Generate bigrams with NLTK

-上瘾入骨i 2021-02-07 07:33

I am trying to produce a bigram list for a given sentence. For example, if I type

    To be or not to be

I want the program to generate

2 Answers
  • 2021-02-07 07:47

    nltk.bigrams() returns an iterator (specifically, a generator) of bigrams. If you want a list, pass the iterator to list(). It also expects a sequence of items to build bigrams from, so split the text into words before passing it in, if you have not already done so (a plain string would be paired up character by character):

    bigrm = list(nltk.bigrams(text.split()))
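
To make the splitting step concrete, here is a minimal pure-Python sketch of what pairing adjacent items means. It is not NLTK's implementation; the helper name `pairwise_bigrams` is made up for illustration, and it assumes whitespace splitting is good enough for the input.

```python
def pairwise_bigrams(items):
    """Return a lazy iterator over each pair of adjacent items."""
    items = list(items)  # materialize so the sequence can be sliced
    # zip stops at the shorter input, so the last item gets no partner
    return zip(items, items[1:])

text = "To be or not to be"

# Split first: the pairs are made of words.
word_bigrams = list(pairwise_bigrams(text.split()))

# No split: a string is iterated character by character.
char_bigrams = list(pairwise_bigrams("To be"))

print(word_bigrams)
print(char_bigrams)
```

The second call shows why the split matters: without it, the pairs are built from single characters, spaces included.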
    

    To print them separated by commas, in Python 3 you could write:

    print(*map(' '.join, bigrm), sep=', ')
    

    On Python 2, for example:

    print ', '.join(' '.join((a, b)) for a, b in bigrm)
    

    Note that you do not need to build a list just to print the bigrams; you can pass the iterator directly.
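
One caveat when using the iterator directly is that it can only be consumed once. The sketch below illustrates this with a plain `zip` over adjacent tokens standing in for `nltk.bigrams()` (an assumption made so the example runs without NLTK installed); the variable names are illustrative.

```python
tokens = "to be or not to be".split()

# Stand-in for nltk.bigrams(tokens): a lazy, single-use iterator of adjacent pairs.
pairs = zip(tokens, tokens[1:])

first_pass = ", ".join(" ".join(pair) for pair in pairs)
second_pass = ", ".join(" ".join(pair) for pair in pairs)  # iterator is already exhausted

print(first_pass)
print(repr(second_pass))
```

If the bigrams are needed more than once, wrapping the iterator in list() avoids the empty second pass.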

  • 2021-02-07 07:52

    The following code produces the bigrams for a given sentence:

    >>> import nltk
    >>> from nltk.tokenize import word_tokenize
    >>> text = "to be or not to be"
    >>> tokens = word_tokenize(text)
    >>> bigrm = nltk.bigrams(tokens)
    >>> print(*map(' '.join, bigrm), sep=', ')
    to be, be or, or not, not to, to be
    