Question
I am trying to produce a list of bigrams for a given sentence. For example, if I type
To be or not to be
I want the program to generate
to be, be or, or not, not to, to be
I tried the following code, but it just gives me
<generator object bigrams at 0x0000000009231360>
This is my code:
import nltk
bigrm = nltk.bigrams(text)
print(bigrm)
So how do I get what I want? I want a list of combinations of the words like above (to be, be or, or not, not to, to be).
Answer 1:
nltk.bigrams() returns an iterator (specifically, a generator) of bigrams. If you want a list, pass the iterator to list(). It also expects a sequence of items to generate bigrams from, so you have to split the text before passing it (if you have not already done so):
bigrm = list(nltk.bigrams(text.split()))
To print them separated by commas, you could do this (in Python 3):
print(*map(' '.join, bigrm), sep=', ')
On Python 2, you could do, for example:
print ', '.join(' '.join((a, b)) for a, b in bigrm)
Note that if you only want to print the bigrams, you do not need to build a list; you can pass the iterator directly.
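To illustrate the two points above (split first, then materialize), here is a minimal sketch that does not import NLTK. The helper bigrams_sketch is a hypothetical stand-in written for this example, not NLTK's implementation; it only mimics the lazy, pair-yielding behaviour described above.

```python
def bigrams_sketch(sequence):
    """Hypothetical stand-in for nltk.bigrams: lazily yield adjacent pairs."""
    for pair in zip(sequence, sequence[1:]):
        yield pair


text = "to be or not to be"
tokens = text.split()  # split into words before building bigrams

gen = bigrams_sketch(tokens)  # a generator object; printing it shows no pairs
bigram_list = list(gen)       # list() consumes the generator and stores the pairs

formatted = ', '.join(' '.join(pair) for pair in bigram_list)
print(formatted)

# A generator can be consumed only once, so a second pass yields nothing.
leftover = list(gen)

# Passing the unsplit string pairs up characters instead of words.
char_pairs = list(bigrams_sketch("to be"))[:2]
```

If the bigrams are needed more than once, keeping the list is the safer choice, since the generator is exhausted after the first pass.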
Answer 2:
The following code produces a bigram list for a given sentence:
>>> import nltk
>>> from nltk.tokenize import word_tokenize
>>> text = "to be or not to be"
>>> tokens = word_tokenize(text)
>>> bigrm = nltk.bigrams(tokens)
>>> print(*map(' '.join, bigrm), sep=', ')
to be, be or, or not, not to, to be
Source: https://stackoverflow.com/questions/37651057/generate-bigrams-with-nltk