I needed to compute the unigrams, bigrams, and trigrams for a text file containing text like:
"Cystic fibrosis affects 30,000 children and young adults in the US a
Though this post is old, I thought I'd add my answer here so that most of the n-gram creation approaches are collected in one place.
There is a Python library called TextBlob that creates n-grams very easily, similar to NLTK.
Below is a code snippet along with its output.
from textblob import TextBlob

sent = """This is to show the usage of Text Blob in Python"""
blob = TextBlob(sent)
# ngrams(n=...) returns a list of WordList objects, one per n-gram
unigrams = blob.ngrams(n=1)
bigrams = blob.ngrams(n=2)
trigrams = blob.ngrams(n=3)
And the output is:
unigrams
[WordList(['This']),
WordList(['is']),
WordList(['to']),
WordList(['show']),
WordList(['the']),
WordList(['usage']),
WordList(['of']),
WordList(['Text']),
WordList(['Blob']),
WordList(['in']),
WordList(['Python'])]
bigrams
[WordList(['This', 'is']),
WordList(['is', 'to']),
WordList(['to', 'show']),
WordList(['show', 'the']),
WordList(['the', 'usage']),
WordList(['usage', 'of']),
WordList(['of', 'Text']),
WordList(['Text', 'Blob']),
WordList(['Blob', 'in']),
WordList(['in', 'Python'])]
trigrams
[WordList(['This', 'is', 'to']),
WordList(['is', 'to', 'show']),
WordList(['to', 'show', 'the']),
WordList(['show', 'the', 'usage']),
WordList(['the', 'usage', 'of']),
WordList(['usage', 'of', 'Text']),
WordList(['of', 'Text', 'Blob']),
WordList(['Text', 'Blob', 'in']),
WordList(['Blob', 'in', 'Python'])]
As simple as that.
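If you would rather not add a dependency, the idea behind `blob.ngrams` is just a sliding window of width n over the word list. The sketch below is not TextBlob's implementation: the function name `ngrams` is my own, and the regex tokenizer is an assumption that only approximates TextBlob's tokenization. It returns tuples instead of `WordList` objects.

```python
import re


def ngrams(text, n):
    """Return the n-grams of `text` as a list of word tuples.

    Assumption: words are runs of word characters or apostrophes,
    which approximates (but is not identical to) TextBlob's tokenizer.
    """
    words = re.findall(r"[\w']+", text)
    # Slide a window of width n across the word list; if there are
    # fewer than n words, the range is empty and no n-grams are returned.
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


sent = "This is to show the usage of Text Blob in Python"
unigrams = ngrams(sent, 1)
bigrams = ngrams(sent, 2)
trigrams = ngrams(sent, 3)
print(bigrams[:3])  # [('This', 'is'), ('is', 'to'), ('to', 'show')]
```

For this sentence it yields the same 11 unigrams, 10 bigrams, and 9 trigrams as the TextBlob output above.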
TextBlob can do much more than this; have a look at the documentation for details: https://textblob.readthedocs.io/en/dev/
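Since the question is about computing n-grams for a whole text file, you will usually also want to count how often each one occurs. Here is a minimal sketch using the standard library's `collections.Counter`. The function name `count_ngrams`, the lower-casing, and the regex tokenizer are all illustrative assumptions, not part of TextBlob.

```python
import re
from collections import Counter


def count_ngrams(text, n):
    """Count occurrences of each n-gram (a tuple of words) in `text`."""
    # Lower-casing and the regex tokenizer are illustrative assumptions.
    words = re.findall(r"[\w']+", text.lower())
    # zip(words, words[1:], ..., words[n-1:]) produces every window of width n.
    return Counter(zip(*(words[i:] for i in range(n))))


text = "the cat sat on the mat the cat ran"
bigram_counts = count_ngrams(text, 2)
print(bigram_counts.most_common(1))  # [(('the', 'cat'), 2)]
```

For a file, read its contents first (for example `open(path, encoding="utf-8").read()`, where `path` is your file) and call `count_ngrams` with n = 1, 2, and 3.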