I am working on a project and I need to get the root of a given word (stemming). As you know, the stemming algorithms that don't use a dictionary are not accurate. Also I t
You could download LanguageTool (Disclaimer: I'm the maintainer), which comes with a binary file english.dict. The LanguageTool Wiki describes how to dump that file as a text file:
java -jar morfologik-tools-1.6.0-standalone.jar fsa_dump -x -d english.dict
For run, the file will contain this:
ran run VBD
run run NN
run run VB
run run VBN
run run VBP
running run VBG
runs run NNS
runs run VBZ
The first column is the inflected form, the second is the base form, and the third is the part-of-speech tag according to the (slightly extended) Penn Treebank tagset.
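Once the file is dumped, looking up a base form is a plain dictionary lookup. Here is a minimal sketch in Python, assuming each line holds exactly three whitespace-separated columns as in the sample above (the real dump may use a different separator, so check your output first). The function names load_lemmas and stem are made up for this example and are not part of LanguageTool.

```python
from collections import defaultdict


def load_lemmas(lines):
    """Map each inflected form to the set of (base form, POS tag) pairs."""
    lemmas = defaultdict(set)
    for line in lines:
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        inflected, base, tag = parts
        lemmas[inflected].add((base, tag))
    return lemmas


def stem(lemmas, word):
    """Return the sorted base forms known for a word, or [] if it is unknown."""
    return sorted({base for base, _tag in lemmas.get(word, ())})


# Sample taken from the dump output shown above.
sample = """\
ran run VBD
run run NN
run run VB
run run VBN
run run VBP
running run VBG
runs run NNS
runs run VBZ
""".splitlines()

lemmas = load_lemmas(sample)
print(stem(lemmas, "running"))
print(stem(lemmas, "walked"))
```

To use the full dictionary, pass an open file handle for the dumped text file to load_lemmas instead of the sample lines. A word can map to more than one base form, which is why stem returns a list; the POS tag kept alongside each base form lets you pick one if you already know the word's part of speech.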