Question
In the code below, I don't understand the meaning of the workers parameter:
model = Word2Vec(sentences, size=300000, window=2, min_count=5, workers=4)
Answer 1:
workers: use this many worker threads to train the model (faster training on multicore machines).
If your system has 2 cores and you specify workers=2, training runs on two threads in parallel.
By default, workers=3 in gensim; set workers=1 to train without parallelization.
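As a minimal sketch of the point above, the helper below picks a workers value that never exceeds the number of CPU cores. The names choose_workers and train are hypothetical and not part of gensim; only the Word2Vec call itself comes from the question.

```python
import os


def choose_workers(requested=None):
    """Return a worker count no larger than the number of CPU cores.

    Hypothetical helper for illustration; not part of gensim.
    """
    cores = os.cpu_count() or 1  # os.cpu_count() may return None
    if requested is None:
        return cores
    # Clamp the request to the range [1, cores].
    return max(1, min(requested, cores))


def train(sentences, workers=None):
    # gensim is imported lazily so choose_workers() works without it.
    from gensim.models import Word2Vec
    return Word2Vec(sentences, window=2, min_count=5,
                    workers=choose_workers(workers))
```

Calling train(sentences, workers=4) on a 2-core machine would then train with workers=2.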
Answer 2:
As others have mentioned, workers controls the number of independent threads doing simultaneous training.
In general, you'll never want to use more workers than the number of CPU cores.
Beyond that, the gensim Word2Vec implementation faces somewhat more thread-to-thread bottlenecking, due to issues like the Python "Global Interpreter Lock" (GIL) and some of its IO/corpus-handling design decisions.
So on systems with a large number of cores, such as more than 16, the optimal workers value for maximum throughput is usually less than the full count of cores, often in the 3-12 range. (The exact number depends on other aspects of your corpus handling and chosen metaparameters, and for now is most often found through trial and error.)
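The trial-and-error search mentioned above could be sketched roughly as follows. This is only an illustration: time_workers and fastest are hypothetical helper names, and train_fn stands for whatever function you write that trains a model with a given workers value.

```python
import time


def time_workers(train_fn, candidates):
    """Time train_fn(workers) for each candidate; return {workers: seconds}."""
    timings = {}
    for workers in candidates:
        start = time.perf_counter()
        train_fn(workers)  # e.g. builds a Word2Vec model with this workers value
        timings[workers] = time.perf_counter() - start
    return timings


def fastest(timings):
    """Return the workers value with the shortest measured time."""
    return min(timings, key=timings.get)
```

For example, time_workers(lambda w: Word2Vec(sentences, workers=w), [3, 6, 9, 12]) would time four candidate values on your own corpus. A single run per value is noisy, so repeating the measurement is advisable before settling on a number.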
If your corpus is already in a specific text format, the latest gensim release, 3.6.0, offers a new input mode that allows better scaling of workers all the way up to the count of CPU cores. See the section of the release notes about the new corpus_file parameter for details.
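A rough sketch of using that input mode, assuming the expected text format is one sentence per line with tokens separated by whitespace (gensim's LineSentence format, as far as I understand the release notes). The helper names write_line_sentence_file and train_from_file are hypothetical.

```python
import os
import tempfile


def write_line_sentence_file(sentences, path):
    """Write tokenized sentences one per line, tokens separated by spaces."""
    with open(path, "w", encoding="utf-8") as f:
        for tokens in sentences:
            f.write(" ".join(tokens) + "\n")
    return path


def train_from_file(path, workers):
    # Assumes gensim >= 3.6.0, where the corpus_file parameter was added.
    from gensim.models import Word2Vec
    return Word2Vec(corpus_file=path, window=2, min_count=5, workers=workers)
```

With the corpus on disk in this form, train_from_file(path, workers=os.cpu_count()) is the kind of call where using every core is expected to pay off.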
Answer 3:
You can use effective_n_jobs from gensim.utils to see how many parallel jobs a given value resolves to on your machine.
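from gensim.utils import effective_n_jobs

# Outputs shown are from a 12-core machine.
print(effective_n_jobs(1))     # 1
print(effective_n_jobs(-1))    # 12 (all available cores)
print(effective_n_jobs(None))  # 1
print(effective_n_jobs(12))    # 12
print(effective_n_jobs(10))    # 10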
Source: https://stackoverflow.com/questions/53417258/what-is-workers-parameter-in-word2vec-in-nlp