Looking for a database or text file of English words with their different forms

故里飘歌 2021-01-18 08:02

I am working on a project and I need to get the root of a given word (stemming). As you know, stemming algorithms that don't use a dictionary are not accurate. Also I t

1 answer
  •  南笙 (original poster)
    2021-01-18 08:37

    You could download LanguageTool (Disclaimer: I'm the maintainer), which comes with a binary file english.dict. The LanguageTool Wiki describes how to dump that file as a text file:

    java -jar morfologik-tools-1.6.0-standalone.jar fsa_dump -x -d english.dict
    

    For "run", the file contains these entries:

    ran run VBD
    run run NN
    run run VB
    run run VBN
    run run VBP
    running run VBG
    runs run NNS
    runs run VBZ
    

    The first column is the inflected form, the second is the base form, and the third is the part-of-speech tag according to the (slightly extended) Penn Treebank tagset.
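    If you want to use the dump for lookups in both directions (inflected form to base form for stemming, and base form to all of its forms), a minimal Python sketch could look like the one below. It assumes each line of the dump has exactly three whitespace-separated fields in the order described above; the helper names load_dump, base_forms and inflected_forms are hypothetical and not part of LanguageTool. Lookups are case-sensitive.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

# Maps a word to a set of (other word, POS tag) pairs.
Index = Dict[str, Set[Tuple[str, str]]]


def load_dump(lines: Iterable[str]) -> Tuple[Index, Index]:
    """Build lookup tables from dump lines of the form 'inflected base TAG'.

    Assumption: fields are separated by whitespace (spaces or tabs).
    """
    lemmas: Index = defaultdict(set)  # inflected form -> {(base form, tag)}
    forms: Index = defaultdict(set)   # base form -> {(inflected form, tag)}
    for line in lines:
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or unexpected lines instead of failing
        inflected, base, tag = fields
        lemmas[inflected].add((base, tag))
        forms[base].add((inflected, tag))
    return lemmas, forms


def base_forms(word: str, lemmas: Index, pos_prefix: str = "") -> List[str]:
    """Base forms of `word`, optionally restricted to tags starting with pos_prefix."""
    return sorted({base for base, tag in lemmas.get(word, ()) if tag.startswith(pos_prefix)})


def inflected_forms(base: str, forms: Index, pos_prefix: str = "") -> List[str]:
    """All forms listed for `base`, optionally restricted to tags starting with pos_prefix."""
    return sorted({form for form, tag in forms.get(base, ()) if tag.startswith(pos_prefix)})


# Sample taken from the dump excerpt for "run" shown above.
SAMPLE = """\
ran run VBD
run run NN
run run VB
run run VBN
run run VBP
running run VBG
runs run NNS
runs run VBZ
"""

lemmas, forms = load_dump(SAMPLE.splitlines())
print(base_forms("running", lemmas))        # ['run']
print(inflected_forms("run", forms))        # ['ran', 'run', 'running', 'runs']
print(inflected_forms("run", forms, "NN"))  # ['run', 'runs']
```

    To use the real dump instead of the sample, save the command's output to a file and pass the open file to load_dump, e.g. `with open("english_dump.txt", encoding="utf-8") as f: lemmas, forms = load_dump(f)`, where english_dump.txt is a hypothetical file name and UTF-8 is an assumed encoding. Since a form that belongs to several parts of speech can map to more than one base form, the optional pos_prefix argument (e.g. "VB" for verb tags, "NN" for noun tags) narrows the result when you already know the part of speech.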
