Currently the Helsinki-NLP/opus-mt-es-en model takes around 1.5 s per inference with transformers. How can that be reduced? Also, when trying to convert it to ONNX Runtime, I'm getting an error.
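
For context, here is a minimal sketch of the kind of transformers setup in question, plus the optimum-based ONNX export path being attempted (the sample sentence, generation options, and the optimum route are illustrative assumptions, not the exact code):

```python
import torch
from transformers import MarianMTModel, MarianTokenizer
# Requires: pip install optimum[onnxruntime] sentencepiece
from optimum.onnxruntime import ORTModelForSeq2SeqLM

model_name = "Helsinki-NLP/opus-mt-es-en"
tokenizer = MarianTokenizer.from_pretrained(model_name)
model = MarianMTModel.from_pretrained(model_name)
model.eval()  # disable dropout for inference

text = "Hola, ¿cómo estás?"  # placeholder input sentence
inputs = tokenizer(text, return_tensors="pt")

# Two common latency reductions: skip gradient tracking, and use
# greedy decoding (num_beams=1) instead of the beam search that the
# Marian config enables by default.
with torch.no_grad():
    output_ids = model.generate(**inputs, num_beams=1, max_new_tokens=64)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))

# ONNX Runtime path via optimum: export=True converts the PyTorch
# checkpoint to ONNX on the fly (older optimum versions used
# from_transformers=True instead).
ort_model = ORTModelForSeq2SeqLM.from_pretrained(model_name, export=True)
ort_ids = ort_model.generate(**inputs, num_beams=1, max_new_tokens=64)
print(tokenizer.decode(ort_ids[0], skip_special_tokens=True))
```

Beyond greedy decoding and the ONNX export, other typical options are running on GPU, half precision (`model.half()` on CUDA), and batching sentences together rather than translating one at a time.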