When we train a custom model, I see we have dropout and n_iter parameters to tune, but which deep learning algorithm does spaCy use to train custom models? Also, when adding new entities, how do we make sure the model doesn't forget the old ones?
spaCy has its own deep learning library called thinc, used under the hood for its various NLP models. For most (if not all) tasks, spaCy uses a deep neural network based on CNNs with a few tweaks. Specifically for Named Entity Recognition, spaCy uses:
A transition-based approach borrowed from shift-reduce parsers, described in the paper Neural Architectures for Named Entity Recognition by Lample et al. Matthew Honnibal describes how spaCy uses this in a YouTube video. (A sketch of the idea follows the list below.)
A framework called "Embed. Encode. Attend. Predict", which Honnibal covers later in the same video:
Embed: Words are embedded using Bloom embeddings (a hashing trick inspired by Bloom filters): hashes of the words, rather than the words themselves, serve as keys into the embedding table. This keeps the table compact, at the cost that words can occasionally collide and end up sharing a vector representation.
Encode: The list of word vectors is encoded into a sentence matrix, so that context is taken into account. spaCy uses a CNN for this encoding.
Attend: Decide which parts are more informative given a query, and produce problem-specific representations.
Predict: A final feed-forward layer turns the attended representation into a concrete prediction, such as a label ID. (Both the transition idea and these four steps are sketched in the code below.)
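To make the transition-based idea from the first point concrete, here is a minimal sketch of a greedy, left-to-right decoder with a BILUO-style action set, restricted to one entity type for brevity. The `decode` function, the `PER` label, and the random stand-in scorer are all assumptions for illustration; spaCy's real transition system replaces the scorer with its neural network.

```python
import random
from typing import Callable, List, Tuple

# BILUO-style actions for a single entity type ("PER"), kept small on purpose.
ACTIONS = ["O", "B-PER", "I-PER", "L-PER", "U-PER"]

def decode(tokens: List[str],
           score: Callable[[str, List[str]], List[float]]) -> List[Tuple[int, int, str]]:
    entities: List[Tuple[int, int, str]] = []
    history: List[str] = []
    start = 0
    for i, tok in enumerate(tokens):
        # Only actions that keep the action sequence well-formed are allowed,
        # e.g. I-PER/L-PER may only follow B-PER or I-PER.
        inside = bool(history) and history[-1] in ("B-PER", "I-PER")
        last = i == len(tokens) - 1
        if inside:
            valid = ["L-PER"] if last else ["I-PER", "L-PER"]
        else:
            valid = ["O", "U-PER"] if last else ["O", "B-PER", "U-PER"]
        scores = score(tok, history)                # the neural network's job
        action = max(valid, key=lambda a: scores[ACTIONS.index(a)])
        if action == "B-PER":
            start = i                               # open an entity
        elif action == "U-PER":
            entities.append((i, i + 1, "PER"))      # single-token entity
        elif action == "L-PER":
            entities.append((start, i + 1, "PER"))  # close the open entity
        history.append(action)
    return entities

# With a random stand-in scorer the spans are arbitrary, but always well-formed.
random.seed(0)
print(decode("George visited Paris".split(),
             lambda tok, history: [random.random() for _ in ACTIONS]))
```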
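And here is a compact numpy sketch of the four framework steps chained together, under the same caveat: the sizes, the random weights, and the use of Python's salted built-in `hash()` (a stable hash like MurmurHash would be used in practice) are placeholders, not spaCy's actual code.

```python
import numpy as np

DIM, TABLE_SIZE, N_SEEDS, N_LABELS = 32, 1000, 4, 3
rng = np.random.default_rng(0)
table = rng.normal(size=(TABLE_SIZE, DIM)).astype("float32")   # hashed embeddings
W_enc = rng.normal(size=(3 * DIM, DIM)).astype("float32")      # window-3 CNN weights
query = rng.normal(size=(DIM,)).astype("float32")              # task-specific query
W_out = rng.normal(size=(DIM, N_LABELS)).astype("float32")     # prediction layer

def embed(word: str) -> np.ndarray:
    # Embed: sum several hashed rows instead of storing one row per word;
    # collisions are possible, but sharing *all* rows is very unlikely.
    rows = [hash((seed, word)) % TABLE_SIZE for seed in range(N_SEEDS)]
    return table[rows].sum(axis=0)

def encode(sent: np.ndarray) -> np.ndarray:
    # Encode: a convolution over a window of one word on each side mixes each
    # word with its neighbours, giving a context-aware sentence matrix.
    padded = np.pad(sent, ((1, 1), (0, 0)))
    windows = np.stack([padded[i:i + 3].ravel() for i in range(len(sent))])
    return np.maximum(windows @ W_enc, 0.0)                    # ReLU

def attend(encoded: np.ndarray) -> np.ndarray:
    # Attend: score each word against the query and take a softmax-weighted
    # sum, so the informative words dominate the single summary vector.
    scores = encoded @ query
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ encoded

def predict(summary: np.ndarray) -> int:
    # Predict: a final linear layer maps the summary vector to a label ID.
    return int(np.argmax(summary @ W_out))

sent = np.stack([embed(w) for w in "London is big".split()])
print(predict(attend(encode(sent))))                           # an arbitrary label ID
```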
For a full overview of the model, including the advantages Honnibal claims for this framework, see his talk in the YouTube video mentioned above; the slides are available as well.
Note: This information is based on slides from 2017. The engine might have changed since then.
When fine-tuning a spaCy model with new entity types, you have to make sure the model doesn't forget the representations of previously learned entities; this is known as catastrophic forgetting. The safest option, if possible, is to train a model from scratch on data that covers both the old and the new entities, but that might not be feasible due to a lack of data or resources. A common mitigation is to mix examples of the old entities into the fine-tuning data, as sketched below.
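A minimal sketch of that mitigation, assuming the spaCy v2-style training API (`nlp.update` with raw texts; spaCy v3 expects `Example` objects instead). The training sentences, the new `DRINK` label, and the hyperparameters are made up for illustration.

```python
import random
import spacy

TRAIN_DATA = [
    # The new entity type we want to teach the model:
    ("I ordered a FlatWhite today", {"entities": [(12, 21, "DRINK")]}),
    # A "rehearsal" example with entities the model already knows, mixed in
    # so their representations are less likely to be overwritten:
    ("Apple is opening an office in London",
     {"entities": [(0, 5, "ORG"), (30, 36, "GPE")]}),
]

nlp = spacy.load("en_core_web_sm")
ner = nlp.get_pipe("ner")
ner.add_label("DRINK")

# Disable the other pipes so only the NER weights are updated, and resume
# (rather than restart) training to keep the pretrained weights.
other_pipes = [p for p in nlp.pipe_names if p != "ner"]
with nlp.disable_pipes(*other_pipes):
    optimizer = nlp.resume_training()
    for _ in range(20):                      # n_iter, chosen arbitrarily
        random.shuffle(TRAIN_DATA)
        for text, annotations in TRAIN_DATA:
            nlp.update([text], [annotations], drop=0.35, sgd=optimizer)
```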