Question:
All I want to do is find the sentiment (positive/negative/neutral) of any given string. While researching, I came across Stanford NLP, but sadly it's in Java. Any ideas on how I can make it work for Python?
Answer 1:
Use py-corenlp
Install Stanford CoreNLP
The latest version at this time (2018-02-21) is 3.9.0:
wget http://nlp.stanford.edu/software/stanford-corenlp-full-2018-01-31.zip
unzip stanford-corenlp-full-2018-01-31.zip
Start the server
cd stanford-corenlp-full-2018-01-31
java -mx5g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer -timeout 10000
Notes:
- timeout is in milliseconds; I set it to 10 sec above. You should increase it if you pass huge blobs to the server.
- There are more options; you can list them with --help.
Install the python package
pip install pycorenlp
(See also the official list).
Use it
from pycorenlp import StanfordCoreNLP

nlp = StanfordCoreNLP('http://localhost:9000')
res = nlp.annotate("I love you. I hate him. You are nice. He is dumb",
                   properties={
                       'annotators': 'sentiment',
                       'outputFormat': 'json',
                       'timeout': 1000,
                   })
for s in res["sentences"]:
    print("%d: '%s': %s %s" % (
        s["index"],
        " ".join([t["word"] for t in s["tokens"]]),
        s["sentimentValue"], s["sentiment"]))
and you will get:
0: 'I love you .': 3 Positive
1: 'I hate him .': 1 Negative
2: 'You are nice .': 3 Positive
3: 'He is dumb': 1 Negative
Notes
- You pass the whole text to the server and it splits it into sentences.
- The sentiment is ascribed to each sentence, not the whole text.
- The average sentiment of tweets is between Neutral (2) and Negative (1); the range is from VeryNegative (0) to VeryPositive (4), which appear to be quite rare.
- You can stop the server either by typing Ctrl-C at the terminal you started it from or using the shell command kill $(lsof -ti tcp:9000). 9000 is the default port; you can change it using the -port option when starting the server.
- Increase timeout (in milliseconds) in server or client if you get timeout errors.
- sentiment is just one annotator; there are many more, and you can request several, separating them by comma: 'annotators': 'sentiment,lemma'.
- Beware that the sentiment model is somewhat idiosyncratic (e.g., the result is different depending on whether you mention David or Bill).
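Since the sentiment is reported per sentence while the question asks for one label per string, here is a minimal sketch of one way to collapse the per-sentence values into a single positive/negative/neutral label. The function name overall_sentiment and the cut-offs (a mean within 0.5 of Neutral = 2 counts as neutral) are my own assumptions, not something CoreNLP or pycorenlp provides; it only relies on the "sentences" and "sentimentValue" keys used in the example above.

```python
def overall_sentiment(res):
    """Collapse per-sentence CoreNLP sentiment values into one label.

    `res` is the parsed JSON dict returned by nlp.annotate(...).
    The averaging rule and thresholds are illustrative assumptions.
    """
    # Cast with int() in case the value arrives as a string in the JSON.
    values = [int(s["sentimentValue"]) for s in res["sentences"]]
    if not values:
        return "neutral"
    mean = sum(values) / len(values)
    # Scale: 0 = VeryNegative ... 2 = Neutral ... 4 = VeryPositive.
    if mean > 2.5:
        return "positive"
    if mean < 1.5:
        return "negative"
    return "neutral"


# Stand-in for a server response, using the four values printed above.
sample = {"sentences": [{"sentimentValue": "3"}, {"sentimentValue": "1"},
                        {"sentimentValue": "3"}, {"sentimentValue": "1"}]}
print(overall_sentiment(sample))  # mean is 2.0, so this prints: neutral
```

Averaging is only one option; depending on your data you may prefer a majority vote or the most extreme sentence.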
PS. I cannot believe that I added a 9th answer, but, I guess, I had to, since none of the existing answers helped me (some of the 8 previous answers have now been deleted, some others have been converted to comments).
Answer 2:
TextBlob is a great package for sentiment analysis, written in Python. See the TextBlob docs for details. Sentiment analysis of any given sentence is carried out by inspecting its words and their corresponding emotional score (sentiment). You can start with
$ pip install -U textblob
$ python -m textblob.download_corpora
The first command installs textblob into your system (or virtualenv); because you pass -U, pip upgrades the package to its latest available version. The second command downloads all the data required, the corpus.
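Once that is installed, a minimal sketch of getting a positive/negative/neutral label could look like the following. TextBlob(text).sentiment.polarity is a float between -1.0 and 1.0; the helper names and the 0.1 threshold for "neutral" are assumptions of mine, so tune them for your data.

```python
def label_from_polarity(polarity, threshold=0.1):
    """Map a polarity score in [-1.0, 1.0] to a coarse label.

    The threshold is an illustrative assumption, not a TextBlob default.
    """
    if polarity > threshold:
        return "positive"
    if polarity < -threshold:
        return "negative"
    return "neutral"


def textblob_sentiment(text):
    """Label `text` using TextBlob's default sentiment analyzer."""
    # Imported here so the helper above is usable without textblob installed.
    from textblob import TextBlob
    return label_from_polarity(TextBlob(text).sentiment.polarity)
```

Call it as textblob_sentiment("I love you"); unlike the CoreNLP route, nothing has to run in a separate server process.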
Answer 3:
I also faced a similar situation. Most of my projects are in Python and the sentiment part is in Java. Luckily, it's quite easy to learn how to use the Stanford CoreNLP jar.
Here is one of my scripts; you can download the jars and run it.
import java.util.List;
import java.util.Properties;

import edu.stanford.nlp.ling.CoreAnnotations;
import edu.stanford.nlp.neural.rnn.RNNCoreAnnotations;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.sentiment.SentimentCoreAnnotations.SentimentAnnotatedTree;
import edu.stanford.nlp.trees.Tree;
import edu.stanford.nlp.util.ArrayCoreMap;
import edu.stanford.nlp.util.CoreMap;

public class Simple_NLP {
    static StanfordCoreNLP pipeline;

    public static void init() {
        Properties props = new Properties();
        props.setProperty("annotators", "tokenize, ssplit, parse, sentiment");
        pipeline = new StanfordCoreNLP(props);
    }

    public static String findSentiment(String tweet) {
        String SentiReturn = "";
        String[] SentiClass = {"very negative", "negative", "neutral", "positive", "very positive"};

        // Sentiment is an integer, ranging from 0 to 4.
        // 0 is very negative, 1 negative, 2 neutral, 3 positive and 4 very positive.
        int sentiment = 2;

        if (tweet != null && tweet.length() > 0) {
            Annotation annotation = pipeline.process(tweet);
            List sentences = annotation.get(CoreAnnotations.SentencesAnnotation.class);
            if (sentences != null && sentences.size() > 0) {
                ArrayCoreMap sentence = (ArrayCoreMap) sentences.get(0);
                Tree tree = sentence.get(SentimentAnnotatedTree.class);
                sentiment = RNNCoreAnnotations.getPredictedClass(tree);
                SentiReturn = SentiClass[sentiment];
            }
        }
        return SentiReturn;
    }
}
Answer 4:
I am facing the same problem. One possible solution is stanford_corenlp_py, which uses Py4j, as pointed out by @roopalgarg.
stanford_corenlp_py
This repo provides a Python interface for calling the "sentiment" and "entitymentions" annotators of Stanford's CoreNLP Java package, current as of v. 3.5.1. It uses py4j to interact with the JVM; as such, in order to run a script like scripts/runGateway.py, you must first compile and run the Java classes creating the JVM gateway.
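For a rough idea of what the Python side of a py4j setup looks like, here is a minimal sketch. JavaGateway and its entry_point attribute are part of py4j itself, but findSentiment is a hypothetical method name borrowed for illustration; I have not checked what the repo's Java entry point actually exposes, so look at its gateway class for the real method names. The sketch assumes the Java gateway server is already running on py4j's default port.

```python
def sentiment_via_gateway(text, entry_point=None):
    """Ask a JVM-side object for the sentiment of `text` through py4j.

    `findSentiment` is a hypothetical method on the Java entry point;
    replace it with whatever the gateway class really provides.
    """
    if entry_point is None:
        # Imported lazily; connects to an already running Java GatewayServer.
        from py4j.java_gateway import JavaGateway
        entry_point = JavaGateway().entry_point
    # py4j forwards this call to the Java object and returns its result.
    return entry_point.findSentiment(text)
```

Passing entry_point explicitly makes it easy to swap in a stub object while the Java side is not running.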