stanford corenlp sentiment training set

陌路散爱 提交于 2019-12-13 08:33:17

问题


I am new to the area of NLP and sentiment analysis in particular. My goal is to train the Stanford CoreNLP sentiment model. I am aware that the sentences provided as training data should be in the following format.

(3 (2 (2 The) (2 Rock)) (4 (3 (2 is) (4 (2 destined) (2 (2 (2 (2 (2 to) (2 (2 be) (2 (2 the) (2 (2 21st) (2 (2 (2 Century) (2 's)) (2 (3 new) (2 (2 ``) (2 Conan)))))))) (2 '')) (2 and)) (3 (2 that) (3 (2 he) (3 (2 's) (3 (2 going) (3 (2 to) (4 (3 (2 make) (3 (3 (2 a) (3 splash)) (2 (2 even) (3 greater)))) (2 (2 than) (2 (2 (2 (2 (1 (2 Arnold) (2 Schwarzenegger)) (2 ,)) (2 (2 Jean-Claud) (2 (2 Van) (2 Damme)))) (2 or)) (2 (2 Steven) (2 Segal))))))))))))) (2 .)))

I am also aware that I can create the sentiment training model with my own training data using the following command.

java -mx8g edu.stanford.nlp.sentiment.SentimentTraining -numHid 25 -trainPath train.txt -devPath     dev.txt -train -model model.ser.gz

My question is, do I have access to the training data set that was used to train the model? If yes, then where can I find it? Also, is there a way I can append new sentences to the original training data set and create the train model?


回答1:


The data is available here: http://nlp.stanford.edu/sentiment/

If you just create a new data set with the same format you can put the files in a directory and set -trainPath to that directory. It will load all files from that directory and train on them.

sample command:

java -Xmx8g edu.stanford.nlp.sentiment.SentimentTraining -train -numHid 25 -trainPath trees/training-data/ -model model.ser.gz


来源:https://stackoverflow.com/questions/42550092/stanford-corenlp-sentiment-training-set

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!