Python NLTK: Bigrams, trigrams, fourgrams

被撕碎了的回忆 2020-12-25 14:16

I have this example and I want to know how to get this result. I tokenize my text and then want to collect the bigrams, trigrams, and fourgrams from the tokens.

4 Answers
  •  小蘑菇 (OP)
    2020-12-25 15:00

    Try everygrams:

    from nltk import everygrams
    # All character n-grams of 'hello', from length 1 up to length 5.
    list(everygrams('hello', min_len=1, max_len=5))
    

    [out]:

    [('h',),
     ('e',),
     ('l',),
     ('l',),
     ('o',),
     ('h', 'e'),
     ('e', 'l'),
     ('l', 'l'),
     ('l', 'o'),
     ('h', 'e', 'l'),
     ('e', 'l', 'l'),
     ('l', 'l', 'o'),
     ('h', 'e', 'l', 'l'),
     ('e', 'l', 'l', 'o'),
     ('h', 'e', 'l', 'l', 'o')]
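
Depending on the installed NLTK version, everygrams may yield the n-grams grouped by starting position rather than by length, so the raw order can differ from the listing above. A minimal sketch, assuming only that length-ordered output is wanted, is to sort the result by length:

```python
from nltk import everygrams

# sorted() is stable, so n-grams of equal length keep their
# left-to-right order within each length group.
grams = sorted(everygrams('hello', min_len=1, max_len=5), key=len)
print(grams)
```

The set of n-grams produced is the same either way; only the order in which they are yielded changes.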
    

    The same works with word tokens:

    from nltk import everygrams
    
    list(everygrams('hello word is a fun program'.split(), 1, 5))
    

    [out]:

    [('hello',),
     ('word',),
     ('is',),
     ('a',),
     ('fun',),
     ('program',),
     ('hello', 'word'),
     ('word', 'is'),
     ('is', 'a'),
     ('a', 'fun'),
     ('fun', 'program'),
     ('hello', 'word', 'is'),
     ('word', 'is', 'a'),
     ('is', 'a', 'fun'),
     ('a', 'fun', 'program'),
     ('hello', 'word', 'is', 'a'),
     ('word', 'is', 'a', 'fun'),
     ('is', 'a', 'fun', 'program'),
     ('hello', 'word', 'is', 'a', 'fun'),
     ('word', 'is', 'a', 'fun', 'program')]
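
Since the question asks only for bigrams, trigrams, and fourgrams, it may be easier to keep each size in its own list. The following is a sketch using nltk.ngrams; the variable names tokens and grams_by_size are illustrative, and str.split() stands in for whatever tokenizer is actually used:

```python
from nltk import ngrams

# Whitespace split is a stand-in for a real tokenizer.
tokens = 'hello word is a fun program'.split()

# One list per n-gram size: 2 = bigrams, 3 = trigrams, 4 = fourgrams.
grams_by_size = {n: list(ngrams(tokens, n)) for n in (2, 3, 4)}

for n, grams in grams_by_size.items():
    print(n, grams)
```

If a single flat sequence is enough, everygrams(tokens, min_len=2, max_len=4) should yield the same n-grams without the unigrams.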
    
