Separate word lists for nouns, verbs, adjectives, etc

后端 未结 5 1874
眼角桃花
眼角桃花 2021-01-30 02:54

Usually word lists are 1 file that contains everything, but are there separately downloadable noun list, verb list, adjective list, etc?

I need them for English specific

5条回答
  •  太阳男子
    2021-01-30 03:01

    If you download just the database files from wordnet.princeton.edu/download/current-version you can extract the words by running these commands:

    egrep -o "^[0-9]{8}\s[0-9]{2}\s[a-z]\s[0-9]{2}\s[a-zA-Z_]*\s" data.adj | cut -d ' ' -f 5 > conv.data.adj
    egrep -o "^[0-9]{8}\s[0-9]{2}\s[a-z]\s[0-9]{2}\s[a-zA-Z_]*\s" data.adv | cut -d ' ' -f 5 > conv.data.adv
    egrep -o "^[0-9]{8}\s[0-9]{2}\s[a-z]\s[0-9]{2}\s[a-zA-Z_]*\s" data.noun | cut -d ' ' -f 5 > conv.data.noun
    egrep -o "^[0-9]{8}\s[0-9]{2}\s[a-z]\s[0-9]{2}\s[a-zA-Z_]*\s" data.verb | cut -d ' ' -f 5 > conv.data.verb
    

    Or if you only want single words (no underscores)

    egrep -o "^[0-9]{8}\s[0-9]{2}\s[a-z]\s[0-9]{2}\s[a-zA-Z]*\s" data.adj | cut -d ' ' -f 5 > conv.data.adj
    egrep -o "^[0-9]{8}\s[0-9]{2}\s[a-z]\s[0-9]{2}\s[a-zA-Z]*\s" data.adv | cut -d ' ' -f 5 > conv.data.adv
    egrep -o "^[0-9]{8}\s[0-9]{2}\s[a-z]\s[0-9]{2}\s[a-zA-Z]*\s" data.noun | cut -d ' ' -f 5 > conv.data.noun
    egrep -o "^[0-9]{8}\s[0-9]{2}\s[a-z]\s[0-9]{2}\s[a-zA-Z]*\s" data.verb | cut -d ' ' -f 5 > conv.data.verb
    

提交回复
热议问题