n-gram

How to get the probability of bigrams in a text of sentences?

Submitted by 老子叫甜甜 on 2019-12-12 12:25:41
Question: I have a text which has many sentences. How can I use nltk.ngrams to process it? This is my code:

    sequence = nltk.tokenize.word_tokenize(raw)
    bigram = ngrams(sequence, 2)
    freq_dist = nltk.FreqDist(bigram)
    prob_dist = nltk.MLEProbDist(freq_dist)
    number_of_bigrams = freq_dist.N()

However, the code above assumes that all sentences form one sequence. But the sentences are separate, and I suspect the last word of one sentence is unrelated to the first word of the next. How can I create a bigram
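A minimal sketch of the usual fix: count bigrams per sentence, so no bigram spans a sentence boundary. The naive split on '.' and whitespace stands in for nltk.sent_tokenize / nltk.word_tokenize, and `raw` is a hypothetical input; the MLE step is computed by hand here, as nltk.MLEProbDist would.

```python
# Count bigrams sentence by sentence so none crosses a boundary.
# Splitting on '.' is a stand-in for nltk.sent_tokenize; in real use,
# tokenize each sentence with nltk.word_tokenize instead of str.split.
from collections import Counter

raw = "the cat sat on the mat. the dog sat on the cat."

bigram_counts = Counter()
for sentence in raw.split('.'):
    tokens = sentence.split()
    bigram_counts.update(zip(tokens, tokens[1:]))

total = sum(bigram_counts.values())
# Maximum-likelihood estimate, as nltk.MLEProbDist would compute it:
prob = {bg: n / total for bg, n in bigram_counts.items()}
print(total)                  # 10 bigrams; none crosses the '.' boundary
print(prob[('sat', 'on')])    # 0.2
```

Note that ('mat', 'the') never appears as a bigram, even though "mat" ends one sentence and "the" begins the next.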

Generate bigrams with NLTK

Submitted by 不问归期 on 2019-12-12 07:47:36
Question: I am trying to produce a bigram list for a given sentence. For example, if I type "To be or not to be", I want the program to generate: to be, be or, or not, not to, to be. I tried the following code, but it just gives me <generator object bigrams at 0x0000000009231360>. This is my code:

    import nltk
    bigrm = nltk.bigrams(text)
    print(bigrm)

So how do I get what I want? I want a list of word combinations like above (to be, be or, or not, not to, to be).

Answer 1: nltk.bigrams() returns an iterator (a
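Since nltk.bigrams() returns a lazy generator, wrapping it in list() materialises the pairs. A pure-Python equivalent is sketched below so it runs without NLTK; `zip(text, text[1:])` yields the same pairs nltk.bigrams would.

```python
# Materialise bigrams from a tokenised sentence. zip of the token list
# with itself shifted by one is equivalent to list(nltk.bigrams(text)).
text = "To be or not to be".lower().split()

bigrams = list(zip(text, text[1:]))
print([' '.join(pair) for pair in bigrams])
# ['to be', 'be or', 'or not', 'not to', 'to be']
```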

TypeError: 'str' object is not callable in Python

Submitted by 瘦欲@ on 2019-12-12 04:43:19
Question: I have this error in my code and I don't understand how to fix it:

    import nltk
    from nltk.util import ngrams

    def word_grams(words, min=1, max=4):
        s = []
        for n in range(min, max):
            for ngram in ngrams(words, n):
                s.append(' '.join(str(i) for i in ngram))
        return s

    print word_grams('one two three four'.split(' '))

The error occurs at s.append(' '.join(str(i) for i in ngram)):

    TypeError: 'str' object is not callable

Answer 1: The code you posted is correct and works with both Python 2.7 and 3.6 (for 3.6 you have to put
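A sketch of the Python 3 version: the print call is parenthesised, the min/max parameters are renamed (they shadow the built-ins), and a small stdlib loop stands in for nltk.util.ngrams so it runs without NLTK. A common cause of this exact TypeError, for what it's worth, is having rebound the name `str` earlier in the session (e.g. `str = 'something'`), which the posted code itself does not do.

```python
# Python 3 variant of word_grams; the inner loop is a stdlib stand-in
# for nltk.util.ngrams(words, n).
def word_grams(words, min_n=1, max_n=4):
    s = []
    for n in range(min_n, max_n):
        for i in range(len(words) - n + 1):
            s.append(' '.join(str(w) for w in words[i:i + n]))
    return s

print(word_grams('one two three four'.split(' ')))
# ['one', 'two', 'three', 'four', 'one two', 'two three',
#  'three four', 'one two three', 'two three four']
```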

Creating n-grams word cloud using python

Submitted by 家住魔仙堡 on 2019-12-12 04:11:39
Question: I am trying to generate a word cloud using bigrams. I can extract the top 30 discriminative words, but I cannot display the words together when plotting; my word cloud image still looks like a unigram cloud. I used the following script and the scikit-learn package:

    def create_wordcloud(pipeline):
        """
        Create word cloud with top 30 discriminative words for each category
        """
        class_labels = numpy.array(['Arts','Music','News','Politics','Science','Sports','Technology'])
        feature_names
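The usual cause of a unigram-looking cloud is WordCloud.generate(), which re-tokenises its input into single words. A sketch of the alternative: build a bigram-to-count mapping yourself and pass it to WordCloud.generate_from_frequencies(), which renders the dictionary keys verbatim. The tiny `docs` list is a placeholder corpus, and the wordcloud call is left commented since that package is a separate dependency.

```python
# Build "w1 w2" -> count frequencies; multi-word keys survive rendering
# only via generate_from_frequencies, not generate().
from collections import Counter

docs = ["the big tree", "the big tree is green", "big tree house"]

freqs = Counter()
for doc in docs:
    words = doc.split()
    freqs.update(' '.join(bg) for bg in zip(words, words[1:]))

print(freqs.most_common(2))   # [('big tree', 3), ('the big', 2)]

# from wordcloud import WordCloud   # assumes the wordcloud package
# WordCloud().generate_from_frequencies(freqs).to_file('bigrams.png')
```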

How to search a corpus to find frequency of a string?

Submitted by 风格不统一 on 2019-12-12 01:45:21
Question: I'm working on an NLP project and I'd like to search through a corpus of text to find the frequency of a given verb-object pair. The aim is to find which verb-object pair is most likely when given a few different possibilities. For example, given the strings "Swing the stick" and "Eat the stick", I would hope the corpus shows it is much more likely for someone to swing a stick than to eat one. I've been reading about n-grams and corpus linguistics, but I'm struggling to
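A minimal sketch of the idea: count each candidate phrase's occurrences across a corpus and take the most frequent. The three-sentence `corpus` is a toy stand-in for a real corpus (e.g. an NLTK corpus or a web-scale n-gram table), where the counts would be far more meaningful.

```python
# Rank candidate verb-object phrases by raw corpus frequency.
from collections import Counter

corpus = [
    "he decided to swing the stick at the ball",
    "children swing the stick around the yard",
    "do not eat the stick of chalk",
]

candidates = ["swing the stick", "eat the stick"]
counts = Counter({c: sum(sent.count(c) for sent in corpus) for c in candidates})

best, n = counts.most_common(1)[0]
print(best, n)   # swing the stick 2
```

In practice one would normalise by the verb's overall frequency (or use an association measure such as the likelihood ratio) rather than compare raw counts.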

Optimization of an R loop taking 18 hours to run

Submitted by 人走茶凉 on 2019-12-12 00:15:20
Question: I have R code that works and does what I want, but it takes a huge amount of time to run. Here is an explanation of what the code does, along with the code itself. I have a vector of 200000 lines containing street addresses (strings): data. Example:

    > data[150000,]
    address
    "15 rue andre lalande residence marguerite yourcenar 91000 evry france"

And I have a 131x2 matrix of string elements which are 5-grams (parts of words) and the ids of the bags of n-grams (example of a 5-gram bag: ["stack", "tacko", "ackov"

How to interpret Python NLTK bigram likelihood ratios?

Submitted by 自古美人都是妖i on 2019-12-11 17:32:30
Question: I'm trying to figure out how to properly interpret nltk's "likelihood ratio" given the code below (taken from this question).

    import nltk.collocations
    import nltk.corpus
    import collections

    bgm = nltk.collocations.BigramAssocMeasures()
    finder = nltk.collocations.BigramCollocationFinder.from_words(nltk.corpus.brown.words())
    scored = finder.score_ngrams(bgm.likelihood_ratio)

    # Group bigrams by first word in bigram.
    prefix_keys = collections.defaultdict(list)
    for key, scores in scored:
        prefix
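For interpretation: BigramAssocMeasures.likelihood_ratio is Dunning's log-likelihood statistic, G2 = 2 * sum(obs * ln(obs / exp)) over the 2x2 contingency table of a bigram (w1, w2). Higher scores mean stronger evidence that the pair co-occurs more often than independence would predict; the scores rank collocations but are not probabilities. A hand-rolled sketch with hypothetical toy counts:

```python
# Dunning's log-likelihood ratio over the bigram contingency table:
# cells are (w1,w2), (w1,~w2), (~w1,w2), (~w1,~w2).
from math import log

def likelihood_ratio(n_w1w2, n_w1, n_w2, n_total):
    obs = [n_w1w2,
           n_w1 - n_w1w2,
           n_w2 - n_w1w2,
           n_total - n_w1 - n_w2 + n_w1w2]
    row = [n_w1, n_total - n_w1]
    col = [n_w2, n_total - n_w2]
    g2 = 0.0
    for i in range(2):
        for j in range(2):
            o = obs[2 * i + j]
            e = row[i] * col[j] / n_total      # expected under independence
            if o > 0:
                g2 += o * log(o / e)
    return 2 * g2

# A bigram seen 20 times, with w1 occurring 30 times and w2 occurring
# 25 times in a 10000-token corpus, scores far above zero; a pair whose
# co-occurrence exactly matches chance scores ~0.
print(likelihood_ratio(20, 30, 25, 10000))
print(likelihood_ratio(1, 100, 100, 10000))
```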

How to extract the verbs and all corresponding adverbs from a text?

Submitted by 梦想的初衷 on 2019-12-11 16:53:30
Question: Using ngrams in Python, my aim is to find verbs and their corresponding adverbs in an input text. What I have done: Input text: "He is talking weirdly. A horse can run fast. A big tree is there. The sun is beautiful. The place is well decorated. They are talking weirdly. She runs fast. She is talking greatly. Jack runs slow." Code:

    finder2 = BigramCollocationFinder.from_words(wrd for (wrd,tags) in posTagged if tags in('VBG','RB','VBN',))
    scored = finder2.score_ngrams(bigram_measures.raw
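One pitfall in the approach above: filtering the tagged words into a single stream before building bigrams loses track of which adverb sits next to which verb. A sketch of pairing them positionally instead, where `posTagged` is a small hand-written stand-in for nltk.pos_tag output:

```python
# Pair each verb with the adverb immediately following it, using
# adjacency in the original tagged sequence rather than a filtered
# stream of words.
posTagged = [('He', 'PRP'), ('is', 'VBZ'), ('talking', 'VBG'),
             ('weirdly', 'RB'), ('She', 'PRP'), ('runs', 'VBZ'),
             ('fast', 'RB'), ('Jack', 'NNP'), ('runs', 'VBZ'),
             ('slow', 'RB')]

VERB_TAGS = {'VB', 'VBD', 'VBG', 'VBN', 'VBP', 'VBZ'}

pairs = [(w1, w2)
         for (w1, t1), (w2, t2) in zip(posTagged, posTagged[1:])
         if t1 in VERB_TAGS and t2 == 'RB']
print(pairs)   # [('talking', 'weirdly'), ('runs', 'fast'), ('runs', 'slow')]
```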

Extracting n-grams from tweets in Python

Submitted by 主宰稳场 on 2019-12-11 14:56:15
Question: Say that I have 100 tweets. In those tweets, I need to extract: 1) food names, and 2) beverage names. Example tweet: "Yesterday I had a coca cola, and a hot dog for lunch, and some bana split for desert. I liked the coke, but the banana in the banana split dessert was ripe." I have two lexicons at my disposal: one with food names and one with beverage names. Example entries in the food names lexicon: "hot dog", "banana", "banana split". Example entries in the beverage names lexicon: "coke", "cola", "coca cola". What I
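A sketch of longest-match lexicon lookup for this kind of task: generate n-grams up to the longest phrase in the lexicon and match longest-first, so "banana split" is reported rather than its substring "banana", and "coca cola" rather than "cola". The lexicons and tweet are toy stand-ins.

```python
# Longest-first n-gram matching against a phrase lexicon; words consumed
# by a longer match are not re-matched by shorter n-grams.
foods = {"hot dog", "banana", "banana split"}
beverages = {"coke", "cola", "coca cola"}

def find_matches(text, lexicon):
    words = text.lower().split()
    max_n = max(len(p.split()) for p in lexicon)
    found, used = [], set()
    for n in range(max_n, 0, -1):                  # longest n-grams first
        for i in range(len(words) - n + 1):
            if any(j in used for j in range(i, i + n)):
                continue                           # inside a longer match
            gram = ' '.join(words[i:i + n])
            if gram in lexicon:
                found.append(gram)
                used.update(range(i, i + n))
    return found

tweet = "I had a coca cola and a hot dog then a banana split"
print(find_matches(tweet, foods))       # ['hot dog', 'banana split']
print(find_matches(tweet, beverages))   # ['coca cola']
```

Misspellings like "bana split" in the example tweet would additionally need fuzzy matching (e.g. edit distance), which this sketch does not attempt.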

n-grams with Naive Bayes classifier Error

Submitted by 扶醉桌前 on 2019-12-11 11:49:50
Question: I was experimenting with Python NLTK text classification. Here is the code example I am practicing: http://www.laurentluce.com/posts/twitter-sentiment-analysis-using-python-and-nltk/ Here is the code:

    from nltk import bigrams
    from nltk.probability import ELEProbDist, FreqDist
    from nltk import NaiveBayesClassifier
    from collections import defaultdict

    train_samples = {}

    with file('data/positive.txt', 'rt') as f:
        for line in f.readlines():
            train_samples[line] = 'pos'

    with file('data/negative.txt',
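One likely error in the snippet on Python 3: the built-in file() constructor was removed, so open() must be used instead. Below is that fix (commented, since the data files are the tutorial's) plus a sketch of the kind of bigram feature extractor that NLTK's NaiveBayesClassifier.train() expects, a dict of feature names to values; the feature-name format is an assumption for illustration.

```python
# Bigram presence features of the form the NLTK Naive Bayes classifier
# consumes: {feature_name: True, ...}.
def bigram_features(text):
    words = text.lower().split()
    features = {}
    for w1, w2 in zip(words, words[1:]):
        features['contains(%s %s)' % (w1, w2)] = True
    return features

print(bigram_features("this movie was really great"))
# {'contains(this movie)': True, 'contains(movie was)': True,
#  'contains(was really)': True, 'contains(really great)': True}

# Python 3 replacement for the file() calls (paths are the tutorial's):
# with open('data/positive.txt', 'rt') as f:
#     for line in f:
#         train_samples[line.strip()] = 'pos'
```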