nltk

Django webapp (on an Apache2 server) hangs indefinitely when importing nltk in views.py

Submitted by 左心房为你撑大大i on 2021-02-08 06:47:09
Question: To elaborate a little more on the title, I'm having issues importing nltk for use in a Django web app. I've deployed the web app on an Apache2 server. When I import nltk in views.py, the web page refuses to load and eventually times out after a few minutes of loading. I've installed nltk using pip, and I've used pip to install a number of other Python packages that I've been able to reference without issue within Django. I haven't been able to find anything solid to explain why this would be
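
A known cause of exactly this symptom (not confirmed by the truncated question) is mod_wsgi running Django in a Python sub-interpreter, where some C extensions deadlock on import; setting WSGIApplicationGroup %{GLOBAL} in the Apache configuration forces the main interpreter and often resolves the hang. Independently of that, a minimal Python-side mitigation, assuming the hang happens at module load time, is to defer the import into the view (the view name and usage below are hypothetical):

    # views.py -- a sketch: keep nltk out of module import time so Apache
    # workers can start; the import cost is paid on the first request instead.
    from django.http import HttpResponse

    def tokenize_view(request):              # hypothetical view name
        import nltk                          # lazy import, inside the request
        tokens = nltk.word_tokenize(request.GET.get("q", ""))
        return HttpResponse(", ".join(tokens))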

Regular expressions in POS tagged NLTK corpus

Submitted by 荒凉一梦 on 2021-02-08 06:29:14
Question: I'm loading a POS-tagged corpus in NLTK, and I would like to find certain patterns involving POS tags. These patterns can be quite complex, including many different combinations of POS tags. Example input string:

    We/PRP spent/VBD some/DT time/NN reading/NN about/IN the/DT historical/JJ importance/NN of/IN tea/NN in/IN Korea/NNP and/CC China/NNP and/CC then/RB tasted/VBD the/DT most/JJS expensive/JJ green/JJ tea/NN I/PRP have/VBP ever/RB seen/VBN ./.

In this case the POS pattern is
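
Since the pattern itself is cut off above, here is a generic sketch of one workable approach: flatten the (word, tag) pairs back into a single "word/TAG" string and scan it with an ordinary regular expression. The determiner-adjective-noun pattern below is an invented example, not the asker's:

    import re

    # A slice of the tagged example sentence, as (word, tag) pairs.
    tagged = [("We", "PRP"), ("spent", "VBD"), ("some", "DT"), ("time", "NN"),
              ("reading", "NN"), ("about", "IN"), ("the", "DT"),
              ("historical", "JJ"), ("importance", "NN"), ("of", "IN"),
              ("tea", "NN")]

    # Flatten back into "word/TAG word/TAG ..." so a plain regex can scan it.
    flat = " ".join("%s/%s" % (word, tag) for word, tag in tagged)

    # Invented example pattern: determiner, zero or more adjectives, a noun.
    pattern = re.compile(r"\S+/DT(?: \S+/JJ[RS]?)* \S+/NNS?")
    for match in pattern.finditer(flat):
        print(match.group())   # e.g. "the/DT historical/JJ importance/NN"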

NLTK Sentence Tokenizer, custom sentence starters

Submitted by 拈花ヽ惹草 on 2021-02-08 05:29:23
Question: I'm trying to split a text into sentences with the PunktSentenceTokenizer from nltk. The text contains listings starting with bullet points, but these are not recognized as new sentences. I tried to add some parameters, but that didn't work. Is there another way? Here is some example code:

    from nltk.tokenize.punkt import PunktSentenceTokenizer, PunktParameters
    params = PunktParameters()
    params.sent_starters = set(['•'])
    tokenizer = PunktSentenceTokenizer(params)
    tokenizer.tokenize('• I am a
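
Punkt decides boundaries from sentence-final punctuation, so sent_starters alone will generally not force a break before a bullet character. A workaround (an assumption, not a documented Punkt feature) is to pre-split the text on bullets and then tokenize each fragment:

    import re
    from nltk.tokenize.punkt import PunktSentenceTokenizer

    text = "Intro sentence. • First bullet item. • Second bullet item."

    tokenizer = PunktSentenceTokenizer()
    sentences = []
    # Split on bullets first, then let Punkt handle each fragment.
    for chunk in re.split(r"\s*•\s*", text):
        if chunk:
            sentences.extend(tokenizer.tokenize(chunk))
    print(sentences)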

tf-idf on a somewhat large (65k) number of text files

Submitted by 十年热恋 on 2021-02-08 04:45:37
Question: I want to try tf-idf with scikit-learn (or nltk; I'm open to other suggestions). The data I have is a relatively large collection of discussion forum posts (~65k) that we have scraped and stored in MongoDB. Each post has a post title, the date and time of the post, the text of the post message (or a "re:" if it is a reply to an existing post), the user name, a message ID, and whether it is a child or parent post (in a thread you have the original post and then replies to it, or nested replies, forming a tree). I figure
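
At this scale, scikit-learn's TfidfVectorizer handles ~65k documents comfortably and returns a sparse matrix. A minimal sketch, with hypothetical database, collection, and field names, pulling the post text out of MongoDB and vectorizing it:

    from pymongo import MongoClient
    from sklearn.feature_extraction.text import TfidfVectorizer

    # Hypothetical names: adjust to the real database/collection/field.
    client = MongoClient()
    posts = client["forum"]["posts"]
    texts = [doc.get("text", "") for doc in posts.find({}, {"text": 1})]

    vectorizer = TfidfVectorizer(stop_words="english", max_features=50000)
    tfidf = vectorizer.fit_transform(texts)   # sparse (n_posts x n_terms) matrix
    print(tfidf.shape)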

Chunking sentences using the word 'but' with RegEx

Submitted by 十年热恋 on 2021-02-07 20:43:04
Question: I am attempting to chunk sentences using RegEx at the word 'but' (or any other coordinating conjunction). It's not working...

    sentence = nltk.pos_tag(word_tokenize("There are no large collections present but there is spinal canal stenosis."))
    result = nltk.RegexpParser(grammar).parse(sentence)
    DigDug = nltk.RegexpParser(r'CHUNK: {.*<CC>.*}')
    for subtree in DigDug.parse(sentence).subtrees():
        if subtree.label() == 'CHUNK': print(subtree.node())

I need to split the sentence "There are no
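
The pattern above mixes plain-regex syntax into a tag pattern (tags must be written as <TAG> expressions), and NLTK 3.x trees expose .label() rather than .node(). One approach that does work, a sketch of the chunk-then-chink idiom from the NLTK book rather than the asker's final solution, is to chunk every token and then chink out the coordinating conjunction, leaving one CHUNK on each side of 'but':

    import nltk
    from nltk.tokenize import word_tokenize

    sentence = nltk.pos_tag(word_tokenize(
        "There are no large collections present but there is spinal canal stenosis."))

    grammar = r"""
      CHUNK: {<.*>+}   # first chunk every token...
             }<CC>{    # ...then chink the CC, splitting the chunk in two
    """
    parser = nltk.RegexpParser(grammar)
    for subtree in parser.parse(sentence).subtrees():
        if subtree.label() == "CHUNK":
            print(" ".join(word for word, tag in subtree.leaves()))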

Python NLTK WUP Similarity Score not unity for exact same word

Submitted by 人走茶凉 on 2021-02-07 12:52:05
Question: Simple code like the following gives a similarity score of 0.75 in both cases. As you can see, both words are exactly the same. To avoid any confusion I also compared a word with itself. The score refuses to budge from 0.75. What is going on here?

    from nltk.corpus import wordnet as wn
    actual=wn.synsets('orange')[0]
    predicted=wn.synsets('orange')[0]
    similarity=actual.wup_similarity(predicted)
    print similarity
    similarity=actual.wup_similarity(actual)
    print similarity

Answer 1: This is an interesting
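
Since the answer is truncated, here is a hedged diagnostic sketch rather than the full explanation: Wu-Palmer combines the depth of the lowest common subsumer with path lengths, and when a synset has several hypernym paths of different lengths those quantities can come from different paths, so even self-similarity can land below 1.0 in some NLTK versions. Inspecting the paths shows where the numbers come from:

    from nltk.corpus import wordnet as wn

    orange = wn.synsets('orange')[0]

    # Depth varies with which hypernym path is chosen when there are several.
    print(orange.min_depth(), orange.max_depth())
    for path in orange.hypernym_paths():
        print(len(path), " -> ".join(s.name() for s in path))

    # For identical synsets the lowest common hypernym is the synset itself.
    print(orange.lowest_common_hypernyms(orange))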

Extracting sentences using pandas with specific words

Submitted by 别等时光非礼了梦想. on 2021-02-07 10:09:27
Question: I have an Excel file with a text column. All I need to do is extract, for each row, the sentences from the text column that contain specific words. I have tried defining a function:

    import pandas as pd
    from nltk.tokenize import sent_tokenize
    from nltk.tokenize import word_tokenize

    ################# Reading in excel file #####################
    str_df = pd.read_excel("C:\\Users\\HP\Desktop\\context.xlsx")

    ################# Defining a function #####################
    def sentence_finder(text,word):
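
The function body is cut off, so here is a sketch of one way it could continue (the 'text' column name and the search word "tea" are assumptions): tokenize each cell into sentences, keep the sentences containing the word, and apply this across the DataFrame:

    import pandas as pd
    from nltk.tokenize import sent_tokenize, word_tokenize

    def sentence_finder(text, word):
        # Return the sentences in `text` that contain `word` as a token.
        word = word.lower()
        return [s for s in sent_tokenize(str(text))
                if word in (t.lower() for t in word_tokenize(s))]

    str_df = pd.read_excel("context.xlsx")   # path shortened for the sketch
    str_df["matches"] = str_df["text"].apply(sentence_finder, word="tea")
    print(str_df[["text", "matches"]].head())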