nltk

Django webapp (on an Apache2 server) hangs indefinitely when importing nltk in views.py

Submitted by 左心房为你撑大大i on 2021-02-08 06:47:09
Question: To elaborate a little more on the title, I'm having issues importing nltk for use in a Django web app. I've deployed the web app on an Apache2 server. When I import nltk in views.py, the web page refuses to load and eventually times out after a few minutes of loading. I've installed nltk using pip, and I've used pip to install a number of other Python packages that I've been able to reference without issue within Django. I haven't been able to find anything solid to explain why this would be
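
A known cause of exactly this symptom (not confirmed by the truncated question) is mod_wsgi running Django in a Python sub-interpreter, where some C extensions deadlock on import; setting WSGIApplicationGroup %{GLOBAL} in the Apache configuration forces the main interpreter and often resolves the hang. Independently of that, a minimal Python-side mitigation, assuming the hang happens at module load time, is to defer the import into the view (the view name and usage below are hypothetical):

    # views.py -- a sketch: keep nltk out of module import time so Apache
    # workers can start; the import cost is paid on the first request instead.
    from django.http import HttpResponse

    def tokenize_view(request):              # hypothetical view name
        import nltk                          # lazy import, inside the request
        tokens = nltk.word_tokenize(request.GET.get("q", ""))
        return HttpResponse(", ".join(tokens))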

Regular expressions in POS tagged NLTK corpus

Submitted by 荒凉一梦 on 2021-02-08 06:29:14
Question: I'm loading a POS-tagged corpus in NLTK, and I would like to find certain patterns involving POS tags. These patterns can be quite complex, including many different combinations of POS tags. Example input string:

    We/PRP spent/VBD some/DT time/NN reading/NN about/IN the/DT historical/JJ importance/NN of/IN tea/NN in/IN Korea/NNP and/CC China/NNP and/CC then/RB tasted/VBD the/DT most/JJS expensive/JJ green/JJ tea/NN I/PRP have/VBP ever/RB seen/VBN ./.

In this case the POS pattern is
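
Since the pattern itself is cut off above, here is a generic sketch of one workable approach: flatten the (word, tag) pairs back into a single "word/TAG" string and scan it with an ordinary regular expression. The determiner-adjective-noun pattern below is an invented example, not the asker's:

    import re

    # A slice of the tagged example sentence, as (word, tag) pairs.
    tagged = [("We", "PRP"), ("spent", "VBD"), ("some", "DT"), ("time", "NN"),
              ("reading", "NN"), ("about", "IN"), ("the", "DT"),
              ("historical", "JJ"), ("importance", "NN"), ("of", "IN"),
              ("tea", "NN")]

    # Flatten back into "word/TAG word/TAG ..." so a plain regex can scan it.
    flat = " ".join("%s/%s" % (word, tag) for word, tag in tagged)

    # Invented example pattern: determiner, zero or more adjectives, a noun.
    pattern = re.compile(r"\S+/DT(?: \S+/JJ[RS]?)* \S+/NNS?")
    for match in pattern.finditer(flat):
        print(match.group())   # e.g. "the/DT historical/JJ importance/NN"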

NLTK Sentence Tokenizer, custom sentence starters

Submitted by 拈花ヽ惹草 on 2021-02-08 05:29:23
Question: I'm trying to split a text into sentences with the PunktSentenceTokenizer from nltk. The text contains listings starting with bullet points, but these are not recognized as new sentences. I tried to add some parameters, but that didn't work. Is there another way? Here is some example code:

    from nltk.tokenize.punkt import PunktSentenceTokenizer, PunktParameters
    params = PunktParameters()
    params.sent_starters = set(['•'])
    tokenizer = PunktSentenceTokenizer(params)
    tokenizer.tokenize('• I am a
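
Punkt decides boundaries from sentence-final punctuation, so sent_starters alone will generally not force a break before a bullet character. A workaround (an assumption, not a documented Punkt feature) is to pre-split the text on bullets and then tokenize each fragment:

    import re
    from nltk.tokenize.punkt import PunktSentenceTokenizer

    text = "Intro sentence. • First bullet item. • Second bullet item."

    tokenizer = PunktSentenceTokenizer()
    sentences = []
    # Split on bullets first, then let Punkt handle each fragment.
    for chunk in re.split(r"\s*•\s*", text):
        if chunk:
            sentences.extend(tokenizer.tokenize(chunk))
    print(sentences)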

tf-idf on a somewhat large (65k) number of text files

Submitted by 十年热恋 on 2021-02-08 04:45:37
Question: I want to try tf-idf with scikit-learn (or nltk; I'm open to other suggestions). The data I have is a relatively large collection of discussion forum posts (~65k) that we have scraped and stored in MongoDB. Each post has a post title, the date and time of the post, the text of the post message (or a "re:" if it is a reply to an existing post), the user name, a message ID, and whether it is a child or parent post (in a thread you have the original post and then replies to it, or nested replies, forming a tree). I figure
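
At this scale, scikit-learn's TfidfVectorizer handles ~65k documents comfortably and returns a sparse matrix. A minimal sketch, with hypothetical database, collection, and field names, pulling the post text out of MongoDB and vectorizing it:

    from pymongo import MongoClient
    from sklearn.feature_extraction.text import TfidfVectorizer

    # Hypothetical names: adjust to the real database/collection/field.
    client = MongoClient()
    posts = client["forum"]["posts"]
    texts = [doc.get("text", "") for doc in posts.find({}, {"text": 1})]

    vectorizer = TfidfVectorizer(stop_words="english", max_features=50000)
    tfidf = vectorizer.fit_transform(texts)   # sparse (n_posts x n_terms) matrix
    print(tfidf.shape)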

Chunking sentences using the word 'but' with RegEx

Submitted by 十年热恋 on 2021-02-07 20:43:04
Question: I am attempting to chunk sentences using RegEx at the word 'but' (or any other coordinating conjunction). It's not working...

    sentence = nltk.pos_tag(word_tokenize("There are no large collections present but there is spinal canal stenosis."))
    result = nltk.RegexpParser(grammar).parse(sentence)
    DigDug = nltk.RegexpParser(r'CHUNK: {.*<CC>.*}')
    for subtree in DigDug.parse(sentence).subtrees():
        if subtree.label() == 'CHUNK': print(subtree.node())

I need to split the sentence "There are no
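
The pattern above mixes plain-regex syntax into a tag pattern (tags must be written as <TAG> expressions), and NLTK 3.x trees expose .label() rather than .node(). One approach that does work, a sketch of the chunk-then-chink idiom from the NLTK book rather than the asker's final solution, is to chunk every token and then chink out the coordinating conjunction, leaving one CHUNK on each side of 'but':

    import nltk
    from nltk.tokenize import word_tokenize

    sentence = nltk.pos_tag(word_tokenize(
        "There are no large collections present but there is spinal canal stenosis."))

    grammar = r"""
      CHUNK: {<.*>+}   # first chunk every token...
             }<CC>{    # ...then chink the CC, splitting the chunk in two
    """
    parser = nltk.RegexpParser(grammar)
    for subtree in parser.parse(sentence).subtrees():
        if subtree.label() == "CHUNK":
            print(" ".join(word for word, tag in subtree.leaves()))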

Python NLTK WUP Similarity Score not unity for exact same word

Submitted by 人走茶凉 on 2021-02-07 12:52:05
Question: Simple code like the following gives a similarity score of 0.75 in both cases. As you can see, both words are exactly the same. To avoid any confusion I also compared a word with itself. The score refuses to budge from 0.75. What is going on here?

    from nltk.corpus import wordnet as wn
    actual=wn.synsets('orange')[0]
    predicted=wn.synsets('orange')[0]
    similarity=actual.wup_similarity(predicted)
    print similarity
    similarity=actual.wup_similarity(actual)
    print similarity

Answer 1: This is an interesting
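
Since the answer is truncated, here is a hedged diagnostic sketch rather than the full explanation: Wu-Palmer combines the depth of the lowest common subsumer with path lengths, and when a synset has several hypernym paths of different lengths those quantities can come from different paths, so even self-similarity can land below 1.0 in some NLTK versions. Inspecting the paths shows where the numbers come from:

    from nltk.corpus import wordnet as wn

    orange = wn.synsets('orange')[0]

    # Depth varies with which hypernym path is chosen when there are several.
    print(orange.min_depth(), orange.max_depth())
    for path in orange.hypernym_paths():
        print(len(path), " -> ".join(s.name() for s in path))

    # For identical synsets the lowest common hypernym is the synset itself.
    print(orange.lowest_common_hypernyms(orange))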

Extracting sentences using pandas with specific words

Submitted by 别等时光非礼了梦想. on 2021-02-07 10:09:27
Question: I have an Excel file with a text column. All I need to do is extract, for each row, the sentences from the text column that contain specific words. I have tried defining a function:

    import pandas as pd
    from nltk.tokenize import sent_tokenize
    from nltk.tokenize import word_tokenize

    ################# Reading in excel file #####################
    str_df = pd.read_excel("C:\\Users\\HP\Desktop\\context.xlsx")

    ################# Defining a function #####################
    def sentence_finder(text,word):
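
The function body is cut off, so here is a sketch of one way it could continue (the 'text' column name and the search word "tea" are assumptions): tokenize each cell into sentences, keep the sentences containing the word, and apply this across the DataFrame:

    import pandas as pd
    from nltk.tokenize import sent_tokenize, word_tokenize

    def sentence_finder(text, word):
        # Return the sentences in `text` that contain `word` as a token.
        word = word.lower()
        return [s for s in sent_tokenize(str(text))
                if word in (t.lower() for t in word_tokenize(s))]

    str_df = pd.read_excel("context.xlsx")   # path shortened for the sketch
    str_df["matches"] = str_df["text"].apply(sentence_finder, word="tea")
    print(str_df[["text", "matches"]].head())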