nltk

NLTK tokenizer and Stanford CoreNLP tokenizer cannot distinguish two sentences without a space after the period (.)

Submitted by 我的未来我决定 on 2021-02-09 08:17:29
Question: I have two sentences in my dataset:

    w1 = "I am Pusheen the cat.I am so cute." # no space after period
    w2 = "I am Pusheen the cat. I am so cute." # with space after period

When I use the NLTK tokenizers (both word and sent), NLTK cannot split "cat.I" into two tokens. Here is word tokenize:

    >>> nltk.word_tokenize(w1, 'english')
    ['I', 'am', 'Pusheen', 'the', 'cat.I', 'am', 'so', 'cute']
    >>> nltk.word_tokenize(w2, 'english')
    ['I', 'am', 'Pusheen', 'the', 'cat', '.', 'I', 'am', 'so', 'cute']

and sent tokenize >>
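A common workaround (not part of NLTK itself; the function name below is hypothetical) is to pre-insert a space after any period directly followed by an uppercase letter, so the tokenizer can see the sentence boundary. A minimal regex sketch:

```python
import re

def add_space_after_period(text):
    # Insert a space after a period that is immediately followed by an
    # uppercase letter, exposing the sentence boundary to the tokenizer.
    # Caveat: this heuristic also splits abbreviations like "U.S.A.";
    # decimals are unaffected because digits are not matched.
    return re.sub(r'\.(?=[A-Z])', '. ', text)

w1 = "I am Pusheen the cat.I am so cute."
print(add_space_after_period(w1))
# -> "I am Pusheen the cat. I am so cute."
```

Running the repaired string through nltk.word_tokenize or sent_tokenize afterwards then yields the same tokens as w2.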

Python: NLTK ValueError: A Lidstone probability distribution must have at least one bin?

Submitted by 我们两清 on 2021-02-08 11:28:55
Question: For a task, I am to use ConditionalProbDist with LidstoneProbDist as the estimator, adding +0.01 to the sample count for each bin. I thought the following line of code would achieve this, but it produces a ValueError:

    fd = nltk.ConditionalProbDist(fd, nltk.probability.LidstoneProbDist, 0.01)

I'm not sure how to format the arguments within ConditionalProbDist and haven't had much luck finding out how via Python's help feature or Google, so if anyone could set me right, it would be
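For intuition, the Lidstone estimate itself is simple. The sketch below is plain Python with a hypothetical function name, not NLTK's implementation; it shows the formula and why a distribution with zero bins is rejected:

```python
def lidstone_prob(count, total, gamma, bins):
    # Lidstone-smoothed estimate: P = (c + gamma) / (N + gamma * B),
    # where B is the number of bins (sample types). With B == 0 the
    # estimate is ill-defined, which is essentially what NLTK's
    # "must have at least one bin" ValueError guards against.
    if bins < 1:
        raise ValueError("A Lidstone probability distribution must have at least one bin.")
    return (count + gamma) / (total + gamma * bins)

# A word seen 2 times out of 10 tokens, over 5 word types (bins):
print(lidstone_prob(2, 10, 0.01, 5))  # close to the unsmoothed 2/10
```

So whatever the calling convention, the estimator must end up knowing a positive bin count for every condition.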

Find similar texts based on paraphrase detection [closed]

Submitted by 眉间皱痕 on 2021-02-08 10:32:21
(This question was closed as off-topic on Stack Overflow 6 years ago.)

Question: I am interested in finding similar content (text) based on paraphrasing. How do I do this? Are there any specific tools which can do this? In Python, preferably.

Answer 1: I believe the tool you are looking for is Latent Semantic Analysis. Given that my post is going to
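Full LSA needs an SVD library, but a plain bag-of-words cosine similarity is a common first baseline before moving on to LSA or dedicated paraphrase models. A stdlib-only sketch (the function name is hypothetical):

```python
import math
from collections import Counter

def cosine_sim(a, b):
    # Cosine similarity between two bag-of-words term-frequency vectors.
    # 1.0 means identical word distributions; 0.0 means no shared words.
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[t] * vb[t] for t in va)
    na = math.sqrt(sum(c * c for c in va.values()))
    nb = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (na * nb) if na and nb else 0.0

print(cosine_sim("the cat sat on the mat", "a cat sat on a mat"))
```

Note this baseline only captures word overlap; paraphrases with little shared vocabulary are exactly what LSA-style methods try to catch.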

Python NetworkX error: module 'networkx.drawing' has no attribute 'graphviz_layout'

Submitted by 白昼怎懂夜的黑 on 2021-02-08 07:33:32
Question: I am teaching myself Python and NLTK for work, using the book "Natural Language Processing with Python" (www.nltk.org/book). I am stuck on Chapter 4, Section 4, part 8, on NetworkX. When I try to run example 4.15, it should draw a graph, but instead I get the following error message:

    AttributeError: module 'networkx.drawing' has no attribute 'graphviz_layout'

The culprit code line appears to be:

    >>> nx.draw_graphviz(graph, node_size = [16 * graph.degree(n) for n in graph], node_color = [graph

Modify NLTK word_tokenize to prevent tokenization of parenthesis

Submitted by 巧了我就是萌 on 2021-02-08 07:32:48
Question: I have the following main.py:

    #!/usr/bin/env python
    # vim: set noexpandtab tabstop=2 shiftwidth=2 softtabstop=-1 fileencoding=utf-8:
    import nltk
    import string
    import sys
    for token in nltk.word_tokenize(''.join(sys.stdin.readlines())):
        #print token
        if len(token) == 1 and not token in string.punctuation or len(token) > 1:
            print token

The output is the following:

    ./main.py <<< 'EGR1(-/-) mouse embryonic fibroblasts'
    EGR1
    -/-
    mouse
    embryonic
    fibroblasts

I want to slightly change the tokenizer so
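One way to keep a parenthesized group such as (-/-) together as a single token is to bypass word_tokenize for those spans with a small regex tokenizer. This is a stdlib-only sketch, not NLTK's tokenizer, and the function name is hypothetical:

```python
import re

def simple_tokenize(text):
    # Match, in order of preference: a whole parenthesized group like
    # "(-/-)", then a run of word characters, then any single
    # non-space punctuation character.
    pattern = r'\([^)]*\)|\w+|[^\w\s]'
    return re.findall(pattern, text)

print(simple_tokenize('EGR1(-/-) mouse embryonic fibroblasts'))
# -> ['EGR1', '(-/-)', 'mouse', 'embryonic', 'fibroblasts']
```

Because the parenthesized alternative comes first in the pattern, "(-/-)" survives intact instead of being split into '(', '-/-', ')'.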

Django webapp (on an Apache2 server) hangs indefinitely when importing nltk in views.py

Submitted by ﹥>﹥吖頭↗ on 2021-02-08 06:47:49
Question: To elaborate a little more on the title, I'm having issues importing nltk for use in a Django web app. I've deployed the web app on an Apache2 server. When I import nltk in views.py, the web page refuses to load and eventually times out after a few minutes of loading. I've installed nltk using pip. I've used pip to install a number of other Python packages, which I've been able to reference without issue within Django. I haven't been able to find anything solid to explain why this would be
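One frequently cited cause for this symptom (an assumption here, since the question is truncated) is mod_wsgi running the Django app in a Python sub-interpreter, where some C-extension imports can deadlock. mod_wsgi's documented workaround is to force the main interpreter in the Apache virtual-host configuration:

```apache
# Run the Django app in the main Python interpreter rather than a
# sub-interpreter; some extension modules hang when imported elsewhere.
WSGIApplicationGroup %{GLOBAL}
```

If the hang persists after an Apache restart, checking the Apache error log for where the import stalls would be the next step.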