What is a good way to speed up test runs utilizing larger spacy models?

Question

I have written some tests that rely on the en_core_web_md model. The model takes ~15 seconds to load into memory on my computer, which makes the tests a pain to run.

Is there a smart way to speed it up?

Answer 1:

The v2.2.[0-5] md models have a minor bug that makes them particularly slow to load (see https://github.com/explosion/spaCy/pull/4990).

You can reformat one file in the model package to improve the load time. In the vocab directory for the model package (e.g., lib/python3.7/site-packages/en_core_web_md/en_core_web_md-2.2.5/vocab):

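import srsly

# Run this from inside the model's vocab directory.
# Convert every key and value to a plain int, then write the file back in place.
orig_data = srsly.read_msgpack("key2row")
new_data = {int(key): int(value) for key, value in orig_data.items()}
srsly.write_msgpack("key2row", new_data)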

In my tests, this nearly halves the loading time (18s to 10s). The remaining time is mostly spent loading strings and lexemes for the model, which is harder to optimize further at this point. So this helps somewhat, but the overall load time is still relatively burdensome for short tests.
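
Since the remaining load time cannot easily be reduced, a complementary approach (not part of the fix above) is to pay that cost only once per test run. The following is a minimal sketch under that assumption; `get_model` and `_MODEL_CACHE` are hypothetical names chosen for illustration, and the optional `loader` argument exists only so the helper can be exercised without a real model.

```python
# Hypothetical helper (not part of spaCy): load each model at most once per process.
_MODEL_CACHE = {}


def get_model(name="en_core_web_md", loader=None):
    """Return a cached pipeline, loading it only on first use."""
    if name not in _MODEL_CACHE:
        if loader is None:
            # Lazy import: test runs that never request a model skip this cost.
            import spacy
            loader = spacy.load
        _MODEL_CACHE[name] = loader(name)
    return _MODEL_CACHE[name]
```

With pytest, the same effect can be had by returning `get_model()` from a fixture declared with `@pytest.fixture(scope="session")`, so all tests in the run share one loaded pipeline. This only helps if the tests do not modify the shared pipeline in ways that affect each other.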

Source: https://stackoverflow.com/questions/60533029/what-is-a-good-way-to-speed-up-test-runs-utilizing-larger-spacy-models

Tags

python

testing

spacy
