I came to know that Hugging Face uses optimized ONNX models for inference on CPU. I tried to do something similar with the Keras VGG16 pretrained model using the keras-onnx package (see this GitHub issue), but I couldn't see any performance benefits. Can I know how exactly Hugging Face optimizes models under the hood?