Multi-Threaded NLP with Spacy pipe
I'm trying to apply the spaCy NLP (Natural Language Processing) pipeline to a big text file like a Wikipedia dump. Here is my code, based on spaCy's documentation example:

```python
from spacy.en import English

input = open("big_file.txt")
big_text = input.read()
input.close()

nlp = English()
out = nlp.pipe([unicode(big_text, errors='ignore')], n_threads=-1)
doc = out.next()
```

spaCy applies all NLP operations like POS tagging, lemmatizing, etc. at once. It is like a pipeline for NLP that takes care of everything you need in one step. Applying the `pipe` method, though, is supposed to make the process a lot faster by multithreading the pipeline.
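For comparison, here is a minimal sketch (not the code above) of how `nlp.pipe` is usually fed: an iterable of many smaller texts rather than one giant string, so the worker threads have separate items to batch over. It assumes the same spaCy 1.x / Python 2 setup as the question; the `read_texts` helper and the one-document-per-line split are hypothetical.

```python
from spacy.en import English

nlp = English()

def read_texts(path):
    # Hypothetical helper: yield one unicode text per line of the file,
    # instead of reading the whole dump into a single string.
    with open(path) as f:
        for line in f:
            yield unicode(line, errors='ignore')

texts = read_texts("big_file.txt")

# pipe() consumes the stream in batches and runs the pipeline components
# (tagger, parser, entity recognizer) over each item.
for doc in nlp.pipe(texts, n_threads=4, batch_size=1000):
    # Each doc comes back fully processed; do something with it here.
    pass
```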