Question
I have constructed some tests relying on the en_core_web_md model. The model takes ~15 sec to load into memory on my computer, making the tests a pain to run.
Is there a smart way to speed it up?
Answer 1:
The v2.2.[0-5] md models have a minor bug that makes them particularly slow to load (see https://github.com/explosion/spaCy/pull/4990).
You can reformat one file in the model package to improve the load time.
In the vocab directory for the model package (e.g., lib/python3.7/site-packages/en_core_web_md/en_core_web_md-2.2.5/vocab):
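import srsly

# Run from inside the model's vocab directory; key2row is rewritten in place.
orig_data = srsly.read_msgpack("key2row")
# Rebuild the mapping with integer keys and values.
new_data = {}
for key, value in orig_data.items():
    new_data[int(key)] = int(value)
srsly.write_msgpack("key2row", new_data)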
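To find that vocab directory without searching site-packages by hand, something like the following sketch may help. The helper name `find_key2row` is hypothetical, and the `<package>/<package>-<version>/vocab/key2row` layout is an assumption taken from the example path above, so it may not hold for every model version. It only uses the standard library and does not import the model package itself.

```python
import importlib.util
from pathlib import Path
from typing import Optional


def find_key2row(package_name: str = "en_core_web_md") -> Optional[Path]:
    """Return the path to the package's vocab/key2row file, or None if not found.

    Hypothetical helper: assumes the layout
    <package>/<package>-<version>/vocab/key2row shown in the example path.
    """
    try:
        spec = importlib.util.find_spec(package_name)
    except (ImportError, ValueError):
        return None
    if spec is None or not spec.submodule_search_locations:
        return None
    for root in spec.submodule_search_locations:
        # Look one directory level down for a versioned data directory.
        matches = sorted(Path(root).glob("*/vocab/key2row"))
        if matches:
            return matches[0]
    return None
```

Calling `find_key2row()` returns a `Path` whose parent is the directory to run the reformatting code from, or `None` if the package or file cannot be located.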
In my tests, reformatting this file nearly halves the loading time (from 18s to 10s). The remaining time is mostly spent loading strings and lexemes for the model, which is harder to optimize further at this point. So this improves things a bit, but the overall load time is still relatively burdensome for short tests.
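Since the remaining load time cannot easily be reduced, a complementary option is to make sure the model is loaded at most once per test process. This is an assumption about the test setup, not something the fix above does. The sketch below wraps any loader function in a cache. `load_once` is a hypothetical helper name, and the loader is passed in so the example runs without spaCy installed.

```python
import functools
from typing import Any, Callable


def load_once(loader: Callable[[str], Any]) -> Callable[[str], Any]:
    """Wrap a loader so each model name is loaded at most once per process."""
    # lru_cache with no size limit keeps every loaded object alive
    # and returns the same instance on repeated calls with the same name.
    return functools.lru_cache(maxsize=None)(loader)
```

With spaCy installed, `get_nlp = load_once(spacy.load)` followed by `get_nlp("en_core_web_md")` in each test would pay the load cost only on the first call. This does not shorten the first load and does not carry over between separate test runs. In pytest, a session-scoped fixture is another way to get the same effect.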
Source: https://stackoverflow.com/questions/60533029/what-is-a-good-way-to-speed-up-test-runs-utilizing-larger-spacy-models