How to turn plural words singular?

旧时难觅i 2020-12-24 14:06

I'm preparing some table names for an ORM, and I want to turn plural table names into singular entity names. My only problem is finding an algorithm that does it reliably. He

13 answers
  • 2020-12-24 14:32

    See also this answer, which recommends using Morpha (or studying the algorithm behind it).

    If you know that the words you want to lemmatize are plural nouns, you can tag them with NNS to get more accurate output.

    Input example:

    $ cat test.txt 
    Types_NNS
    Pies_NNS
    Trees_NNS
    Buses_NNS
    Radii_NNS
    Communities_NNS
    Sheep_NNS
    Fish_NNS
    

    Output example:

    $ cat test.txt | ./morpha -c
    Type
    Pie
    Tree
    Bus
    Radius
    Community
    Sheep
    Fish
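The tagged input shown above can be generated from a plain word list. This is a minimal sketch: the helper name `tag_plural_nouns` is hypothetical, and the only thing taken from the answer is the `Word_NNS` line format.

```python
import sys


def tag_plural_nouns(words):
    """Return one 'Word_NNS' line per word, the input format shown in the answer."""
    return "".join(f"{word}_NNS\n" for word in words)


if __name__ == "__main__":
    # Example words copied from the answer; redirect stdout into a file such as
    # test.txt before piping it through morpha.
    sys.stdout.write(tag_plural_nouns(["Types", "Pies", "Trees", "Buses"]))
```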
    
  • 2020-12-24 14:33

    Those are all general rules (and good ones) but English is not a language for the faint of heart :-).

    My own preference would be to have a transformation engine along with a set of transformations (surprisingly enough) for doing the actual work. You would run through the transformations (from specific to general) and, when a match was found, apply the transformation to the word and stop.

    Regular expressions would be an ideal approach to this due to their expressiveness. An example rule set:

     1. If the word is "fish", return "fish".
     2. If the word is "sheep", return "sheep".
     3. If the word is "radii", return "radius".
     4. If the word ends in "ii", replace that "ii" with "us" (octopii, virii).
     5. If the word ends in "ies", replace that "ies" with "y".
     6. If the word ends in "es", remove that "es".
     7. Otherwise, just remove any trailing "s".
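The seven rules above could be sketched as an ordered list of regular expressions in which the first match wins. The names `RULES` and `singularize` are illustrative, and the sketch assumes lowercase table names.

```python
import re

# Ordered from most specific to most general; the first matching rule is applied.
RULES = [
    (re.compile(r"^fish$"), "fish"),     # rule 1
    (re.compile(r"^sheep$"), "sheep"),   # rule 2
    (re.compile(r"^radii$"), "radius"),  # rule 3
    (re.compile(r"ii$"), "us"),          # rule 4
    (re.compile(r"ies$"), "y"),          # rule 5
    (re.compile(r"es$"), ""),            # rule 6
    (re.compile(r"s$"), ""),             # rule 7
]


def singularize(word):
    """Apply the first rule whose pattern matches; return the word unchanged otherwise."""
    for pattern, replacement in RULES:
        if pattern.search(word):
            return pattern.sub(replacement, word, count=1)
    return word


if __name__ == "__main__":
    for name in ["fish", "radii", "virii", "communities", "buses", "customers"]:
        print(name, "->", singularize(name))
```

Keeping the rules in a plain list makes the ordering explicit, which matters because a general rule placed too early would shadow the specific ones.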
    

    Note the requirement to keep this transformation set up to date. For example, let's say someone adds the table name "types". This would currently be captured by rule #6, and you would get the singular value "typ", which is obviously wrong.

    The solution is to insert a new rule somewhere before #6, something like:

     3.5: If the word is "types", return "type".
    

    for a very specific transformation, or perhaps somewhere later if it can be made more general.
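A minimal sketch of that fix, using a reduced rule list: the specific rule is inserted just before the generic "es" rule so that it is tried first. The helper `apply_rules` and the list layout are assumptions for illustration.

```python
import re


def apply_rules(word, rules):
    """Apply the first (pattern, replacement) pair that matches the word."""
    for pattern, replacement in rules:
        if re.search(pattern, word):
            return re.sub(pattern, replacement, word, count=1)
    return word


# Reduced rule list: only the general suffix rules.
rules = [
    (r"ies$", "y"),
    (r"es$", ""),
    (r"s$", ""),
]

before = apply_rules("types", rules)  # the generic "es" rule fires

# Insert the exception immediately before the generic "es" rule.
es_index = next(i for i, (pattern, _) in enumerate(rules) if pattern == r"es$")
rules.insert(es_index, (r"^types$", "type"))

after = apply_rules("types", rules)  # the specific rule now fires first
```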

    In other words, you'll basically need to keep this transformation table updated as you find all those wondrous exceptions that English has spawned over the centuries.


    The other possibility is to not waste your time with general rules at all.

    Since the number of table names will be relatively limited, just create another table (or some sort of data structure) called singulars, which maps all the relevant plural table names (employees, customers) to singular object names (employee, customer).

    Then every time a table is added to your schema, ensure you add an entry to the singulars "table" so you can singularize it.
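As a sketch of the lookup approach, a plain dictionary can play the role of the singulars "table". The function name `entity_name` is hypothetical; raising on a missing entry is a design choice that surfaces a forgotten mapping immediately rather than guessing.

```python
# Explicit plural-to-singular mapping; extend it whenever a table is added.
SINGULARS = {
    "employees": "employee",
    "customers": "customer",
}


def entity_name(table_name):
    """Look up the singular entity name, failing loudly if no mapping is registered."""
    try:
        return SINGULARS[table_name]
    except KeyError:
        raise KeyError(
            f"no singular registered for table {table_name!r}; add it to SINGULARS"
        ) from None
```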

  • 2020-12-24 14:33

    Consider the Python package "inflect":

    "Correctly generate plurals, singular nouns, ordinals, indefinite articles; convert numbers to words"

    https://pypi.python.org/pypi/inflect
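A short usage sketch, assuming the package is installed (`pip install inflect`). `inflect.engine()` and its `singular_noun` method are part of the library; `singular_noun` returns `False` when it does not treat the word as a plural, so the hypothetical wrapper `to_singular` falls back to the original word in that case.

```python
try:
    import inflect  # third-party package
except ImportError:
    inflect = None


def to_singular(word, engine):
    """Return the singular form, or the word unchanged if the engine reports no plural."""
    result = engine.singular_noun(word)
    return result if result else word


if inflect is not None:
    engine = inflect.engine()
    for name in ["employees", "customers", "categories"]:
        print(name, "->", to_singular(name, engine))
```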

  • 2020-12-24 14:37

    Maybe take a look at the source code of something like the Rails Inflector.

  • 2020-12-24 14:40

    I'm going to try this MorphAdorner: http://morphadorner.northwestern.edu/morphadorner/download/ (Java). It's a collection of different types of NLP processing tools, and you can test them through online examples. For your problem (that is also my problem) there's the Pluralizer tool: http://morphadorner.northwestern.edu/morphadorner/pluralizer/example/

  • 2020-12-24 14:41

    The problem is that this is based on general rules, but English has (figuratively) a billion exceptions... What do you do with words like "fish" or "geese"?

    Also, the rules are for how to turn singular nouns to plurals. The reverse mapping isn't necessarily possible (consider "freebies").
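To illustrate why the reverse mapping is ambiguous, here is a deliberately naive pluralizer covering only two common rules. The function is a sketch and "freeby" is a made-up word; the point is that two different singulars produce the same plural, so the plural alone cannot tell you which one to return.

```python
def pluralize(noun):
    """Naive pluralizer: consonant + 'y' becomes 'ies', everything else gets 's'."""
    vowels = "aeiou"
    if noun.endswith("y") and len(noun) > 1 and noun[-2] not in vowels:
        return noun[:-1] + "ies"
    return noun + "s"


# Two distinct inputs collapse to one plural, so the mapping cannot be inverted.
assert pluralize("freebie") == pluralize("freeby") == "freebies"
assert pluralize("pie") == pluralize("py") == "pies"
```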
