Describe the feature request
Assess performance capability without downloading the full model.
Describe scenario use case
For some models, performance may be a blocker. Since model downloads can be quite large, I wonder if there should be a way for web developers to determine the performance class of the user's machine for running a model without downloading it completely first.
I believe this would involve running the model code with zeroed-out weights, which would still require buffer allocations but would allow the web app to catch out-of-memory errors and similar failures. The model architecture would still be needed to generate shaders, but it should be much smaller than the model weights.
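To make the idea concrete, here is a rough sketch of the allocation part only. It assumes the architecture can be reduced to a list of weight tensor shapes; `TensorSpec`, `totalWeightBytes`, and `probeAllocation` are hypothetical names for illustration, not part of any existing library. It only probes zero-filled CPU-side buffers and does not cover GPU buffers or shader compilation, which a real implementation would also need to exercise.

```typescript
// Hypothetical description of one weight tensor: only its shape and element size
// are needed, not the actual weight values.
interface TensorSpec {
  name: string;
  shape: number[];
  bytesPerElement: number; // e.g. 4 for float32, 2 for float16
}

// Size in bytes of a single tensor.
function tensorBytes(spec: TensorSpec): number {
  return spec.shape.reduce((n, d) => n * d, 1) * spec.bytesPerElement;
}

// Total bytes required to hold every weight tensor.
function totalWeightBytes(specs: TensorSpec[]): number {
  return specs.reduce((sum, s) => sum + tensorBytes(s), 0);
}

// Try to allocate a zero-filled buffer per tensor and report the first failure
// instead of throwing, so the app can decide before downloading real weights.
function probeAllocation(
  specs: TensorSpec[],
): { ok: boolean; bytes: number; failedAt?: string } {
  const buffers: ArrayBuffer[] = []; // keep references so all buffers stay allocated during the probe
  let bytes = 0;
  for (const spec of specs) {
    const size = tensorBytes(spec);
    try {
      buffers.push(new ArrayBuffer(size)); // ArrayBuffer contents are zero-initialized
      bytes += size;
    } catch {
      return { ok: false, bytes, failedAt: spec.name };
    }
  }
  return { ok: true, bytes };
}
```

A web app could call `probeAllocation` with the shapes from a small architecture file and skip the full download when `ok` is `false`.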
Originally posted at huggingface/transformers.js#545 (comment)