What is the workers parameter in Word2Vec in NLP?

天大地大妈咪最大 submitted on 2021-01-29 14:57:52

Question


In the code below, I don't understand the meaning of the workers parameter.

model = Word2Vec(sentences, size=300000, window=2, min_count=5, workers=4)


Answer 1:


workers: use this many worker threads to train the model (faster training on multicore machines).

If your system has 2 cores and you specify workers=2, training runs in two parallel threads.

By default, workers=3 in current gensim releases. Set workers=1 to train in a single thread, i.e. with no parallelization.




Answer 2:


As others have mentioned, workers controls the number of independent threads doing simultaneous training.

In general, you'll never want to use more workers than the number of CPU cores.
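A minimal sketch of that rule, assuming you want to cap a requested value at the machine's core count before passing it to Word2Vec. The helpers pick_workers and train_word2vec are hypothetical names used for illustration, not part of gensim, and the gensim import is deferred so the clamping logic can be tried without gensim installed.

```python
import os


def pick_workers(requested, cpu_count=None):
    """Clamp a requested worker count to the range [1, number of CPU cores].

    `cpu_count` is an optional override (hypothetical parameter, handy for testing);
    when omitted, the value reported by os.cpu_count() is used.
    """
    cores = cpu_count or os.cpu_count() or 1
    return max(1, min(requested, cores))


def train_word2vec(sentences, requested_workers=4):
    """Train a model with a worker count that never exceeds the core count."""
    # Imported lazily: this line requires gensim to be installed.
    from gensim.models import Word2Vec

    return Word2Vec(sentences, min_count=1, workers=pick_workers(requested_workers))
```

On a 2-core machine, pick_workers(4) would return 2, so the model would train with two threads rather than four.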

But further, the gensim Word2Vec implementation faces a bit more thread-to-thread bottlenecking due to issues like the Python "Global Interpreter Lock" ('GIL') and some of its IO/corpus-handling design decisions.

So on systems with a large number of cores, such as more than 16, the optimal workers value for maximum throughput is usually less than the full count of cores – often in the 3-12 range. (The exact number will depend on other aspects of your corpus-handling and chosen metaparameters, and for now is most often discovered through trial-and-error.)
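The trial-and-error search described above could be sketched as a small timing loop. This is an illustration under assumptions, not a gensim utility: time_workers and fastest_setting are hypothetical helpers, and train_fn stands for any callable of your own that runs one training pass with the given worker count.

```python
import time


def time_workers(train_fn, candidates):
    """Run train_fn(workers) once per candidate and return {workers: seconds}."""
    timings = {}
    for workers in candidates:
        start = time.perf_counter()
        train_fn(workers)
        timings[workers] = time.perf_counter() - start
    return timings


def fastest_setting(timings):
    """Return the worker count with the shortest measured wall-clock time."""
    return min(timings, key=timings.get)
```

With gensim installed, train_fn might be something like lambda w: Word2Vec(sentences, workers=w), run on a representative sample of the corpus. A single run per setting is noisy, so repeating each measurement a few times is advisable before settling on a value.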

If your corpus is already in a specific text format, gensim 3.6.0 (the latest release at the time of writing) offers a new input mode that allows better scaling of workers all the way up to the count of CPU cores. See the section of the release notes about the new corpus_file parameter for details.
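As a rough sketch of preparing such a file: corpus_file expects gensim's LineSentence layout, meaning one sentence per line with tokens separated by whitespace. The helper write_line_sentence_file and the sample sentences are hypothetical, and the commented-out training call assumes gensim 3.6.0 or later is installed.

```python
import os
import tempfile


def write_line_sentence_file(sentences, path):
    """Write tokenized sentences as one whitespace-separated sentence per line."""
    with open(path, "w", encoding="utf-8") as f:
        for tokens in sentences:
            f.write(" ".join(tokens) + "\n")


# Toy corpus for illustration only.
sentences = [
    ["the", "quick", "brown", "fox"],
    ["jumps", "over", "the", "lazy", "dog"],
]
corpus_path = os.path.join(tempfile.mkdtemp(), "corpus.txt")
write_line_sentence_file(sentences, corpus_path)

# With gensim >= 3.6.0 installed, training from the file might look like:
#   from gensim.models import Word2Vec
#   model = Word2Vec(corpus_file=corpus_path, min_count=1, workers=os.cpu_count())
```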




Answer 3:


You can use gensim's effective_n_jobs helper to see how many threads a given setting resolves to on your machine.

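from gensim.utils import effective_n_jobs

# Outputs shown are from the answerer's machine, where -1 resolved to 12.
print(effective_n_jobs(1))     # 1
print(effective_n_jobs(-1))    # 12
print(effective_n_jobs(None))  # 1
print(effective_n_jobs(12))    # 12
print(effective_n_jobs(10))    # 10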
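The outputs above are consistent with a joblib-style convention: None means one job, a negative value counts back from the number of CPU cores, and a positive value is returned unchanged. The sketch below is a hypothetical stand-in, resolve_n_jobs, written to illustrate that convention. It is not gensim's source code, and its handling of 0 and of values below -1 is an assumption.

```python
import multiprocessing


def resolve_n_jobs(n_jobs, cpu_count=None):
    """Resolve a requested job count to a concrete number of threads.

    `cpu_count` is an optional override (hypothetical parameter); when omitted,
    the machine's core count is used.
    """
    if cpu_count is None:
        cpu_count = multiprocessing.cpu_count()
    if n_jobs is None:
        return 1  # no preference: a single job
    if n_jobs == 0:
        raise ValueError("n_jobs == 0 has no meaning")  # assumed behavior
    if n_jobs < 0:
        # -1 means all cores, -2 all but one, and so on, never below 1.
        return max(cpu_count + 1 + n_jobs, 1)
    return n_jobs  # positive values pass through, even above the core count
```

Note that a positive request such as 12 passes through unchanged whatever the core count, so the helper reports what a setting resolves to rather than which value trains fastest.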


Source: https://stackoverflow.com/questions/53417258/what-is-workers-parameter-in-word2vec-in-nlp
