Computing N Grams using Python

后端未结

关注

 8  1715

情歌与酒 2020-11-28 06:02

I needed to compute the Unigrams, BiGrams and Trigrams for a text file containing text like:

\"Cystic fibrosis affects 30,000 children and young adults in the US a

8条回答

有刺的猬 (楼主)

2020-11-28 06:19

Though the post is old, I thought to mention my answer here so that most of the ngrams creation logic can be in one post.

There is something by name TextBlob in Python. It creates ngrams very easily similar to NLTK.

Below is the code snippet with its output for easy understanding.

sent = """This is to show the usage of Text Blob in Python"""
blob = TextBlob(sent)
unigrams = blob.ngrams(n=1)
bigrams = blob.ngrams(n=2)
trigrams = blob.ngrams(n=3)

And the output is :

unigrams
[WordList(['This']),
 WordList(['is']),
 WordList(['to']),
 WordList(['show']),
 WordList(['the']),
 WordList(['usage']),
 WordList(['of']),
 WordList(['Text']),
 WordList(['Blob']),
 WordList(['in']),
 WordList(['Python'])]

bigrams
[WordList(['This', 'is']),
 WordList(['is', 'to']),
 WordList(['to', 'show']),
 WordList(['show', 'the']),
 WordList(['the', 'usage']),
 WordList(['usage', 'of']),
 WordList(['of', 'Text']),
 WordList(['Text', 'Blob']),
 WordList(['Blob', 'in']),
 WordList(['in', 'Python'])]

trigrams
[WordList(['This', 'is', 'to']),
 WordList(['is', 'to', 'show']),
 WordList(['to', 'show', 'the']),
 WordList(['show', 'the', 'usage']),
 WordList(['the', 'usage', 'of']),
 WordList(['usage', 'of', 'Text']),
 WordList(['of', 'Text', 'Blob']),
 WordList(['Text', 'Blob', 'in']),
 WordList(['Blob', 'in', 'Python'])]

As simple as that.

There is more to this that are being done by TextBlob. Please have a look at this doc for more details - https://textblob.readthedocs.io/en/dev/

0 讨论(0)

查看其它8个回答