What is the difference between lemmatization and stemming?

无人共我 2020-12-07 08:25

When should I use each?

Also, is NLTK's lemmatization dependent on part of speech? Wouldn't it be more accurate if it were?

9 Answers
  • 2020-12-07 08:55

    An example-driven explanation of the differences between lemmatization and stemming:

    Lemmatization handles matching “car” to “cars” along with matching “car” to “automobile”.

    Stemming handles matching “car” to “cars”.

    Lemmatization implies a broader scope of fuzzy word matching that is still handled by the same subsystems. It implies certain techniques for low level processing within the engine, and may also reflect an engineering preference for terminology.

    [...] Taking FAST as an example, their lemmatization engine handles not only basic word variations like singular vs. plural, but also thesaurus operators like having “hot” match “warm”.

    This is not to say that other engines don’t handle synonyms, of course they do, but the low level implementation may be in a different subsystem than those that handle base stemming.

    http://www.ideaeng.com/stemming-lemmatization-0601

  • 2020-12-07 08:58

    Lemmatisation is closely related to stemming. The difference is that a stemmer operates on a single word without knowledge of the context, and therefore cannot discriminate between words which have different meanings depending on part of speech. However, stemmers are typically easier to implement and run faster, and the reduced accuracy may not matter for some applications.

    For instance:

    1. The word "better" has "good" as its lemma. This link is missed by stemming, as it requires a dictionary look-up.

    2. The word "walk" is the base form of the word "walking", and hence this is matched in both stemming and lemmatisation.

    3. The word "meeting" can be either the base form of a noun or a form of a verb ("to meet") depending on the context, e.g., "in our last meeting" or "We are meeting again tomorrow". Unlike stemming, lemmatisation can in principle select the appropriate lemma depending on the context.

    Source: https://en.wikipedia.org/wiki/Lemmatisation
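
    The three cases above can be sketched in a few lines of plain Python. This is only a toy illustration: the suffix rules in `toy_stem`, the `LEMMAS` table, and the part-of-speech labels are all made up for this example and are not taken from NLTK or any other library.

```python
# Toy illustration only: the suffix rules and the tiny lemma table below are
# assumptions invented for this sketch, not any real library's data.

def toy_stem(word):
    """Strip a known suffix without looking at context or a dictionary."""
    for suffix in ("ing", "s"):
        # Keep at least three characters so very short words are left alone.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


# Hypothetical lemma dictionary keyed by (word, part of speech).
LEMMAS = {
    ("better", "adj"): "good",
    ("walking", "verb"): "walk",
    ("meeting", "noun"): "meeting",
    ("meeting", "verb"): "meet",
}


def toy_lemmatize(word, pos):
    """Look the word up for the given part of speech; fall back to the word itself."""
    return LEMMAS.get((word, pos), word)


if __name__ == "__main__":
    cases = [
        ("better", "adj"),
        ("walking", "verb"),
        ("meeting", "noun"),
        ("meeting", "verb"),
    ]
    for word, pos in cases:
        print(word, pos, "stem:", toy_stem(word), "lemma:", toy_lemmatize(word, pos))
```

    The stemmer sees only the string, so it leaves "better" untouched and always cuts "meeting" down to "meet". The lookup needs the part of speech as an extra input, which is what lets it keep the noun "meeting" and the verb "meet" apart.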

  • 2020-12-07 08:59

    Short and dense: http://nlp.stanford.edu/IR-book/html/htmledition/stemming-and-lemmatization-1.html

    The goal of both stemming and lemmatization is to reduce inflectional forms and sometimes derivationally related forms of a word to a common base form.

    However, the two words differ in their flavor. Stemming usually refers to a crude heuristic process that chops off the ends of words in the hope of achieving this goal correctly most of the time, and often includes the removal of derivational affixes. Lemmatization usually refers to doing things properly with the use of a vocabulary and morphological analysis of words, normally aiming to remove inflectional endings only and to return the base or dictionary form of a word, which is known as the lemma.

    From the NLTK docs:

    Lemmatization and stemming are special cases of normalization. They identify a canonical representative for a set of related word forms.
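
    To make the "crude heuristic" versus "vocabulary" contrast concrete, here is a minimal sketch in plain Python. The suffix list `SUFFIXES` and the word table `VOCABULARY` are invented for this example (they are assumptions, not the rules of the Porter stemmer or the contents of WordNet). It shows a suffix chopper stripping a derivational affix and merging two different words, while a vocabulary lookup removes only the inflectional ending.

```python
# Sketch only: SUFFIXES and VOCABULARY are hypothetical, chosen to show the
# effect described in the text, not copied from any real stemmer or lexicon.

# Suffixes are tried in order, longest first.
SUFFIXES = ("izations", "ization", "ing", "es", "s")


def crude_stem(word):
    """Chop the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


# Hypothetical vocabulary mapping inflected forms to their dictionary form.
VOCABULARY = {
    "organizations": "organization",
    "organization": "organization",
    "organs": "organ",
    "organ": "organ",
}


def lemma_lookup(word):
    """Return the dictionary form if the word is known, else the word unchanged."""
    return VOCABULARY.get(word, word)


if __name__ == "__main__":
    for word in ("organizations", "organs"):
        print(word, "stem:", crude_stem(word), "lemma:", lemma_lookup(word))
```

    Both inputs stem to "organ", so a search index built on these stems would treat them as the same term. The lookup keeps "organization" and "organ" as separate canonical forms, at the cost of needing a vocabulary that covers the words it sees.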
