I came to know that Hugging Face uses optimized ONNX models for inference on CPU. I tried to do something similar with the Keras VGG16 pretrained model using the keras-onnx package (see this GitHub issue), but I couldn't see any performance benefits. Can I know how exactly Hugging Face optimizes models under the hood?