I've been working with Stanford's CoreNLP to perform sentiment analysis on some data I have and I'm working on creating a training model. I know we can create a training
dev.txt should be in the same format as train.txt, just with a different set of sentences. Note that the same sentence should not appear in both dev.txt and train.txt. The development set is used to evaluate the quality of the model you train on the training data.
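If you want to check the no-overlap requirement, something like the following could work. It is a minimal sketch, assuming each non-blank line of train.txt and dev.txt holds one labeled tree in bracketed form, e.g. `(3 (2 It) (4 (2 's) (4 good)))`. The names `SplitOverlapCheck`, `sentenceOf`, and `overlap` are made up for illustration and are not part of CoreNLP.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.StringJoiner;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SplitOverlapCheck {
    // A leaf token is a run of non-space, non-parenthesis characters
    // immediately followed by ')'. Labels are followed by a space, so they are skipped.
    private static final Pattern LEAF = Pattern.compile("([^\\s()]+)\\)");

    // Recover the space-joined tokens of a sentence from one bracketed tree line.
    static String sentenceOf(String treeLine) {
        Matcher m = LEAF.matcher(treeLine);
        StringJoiner words = new StringJoiner(" ");
        while (m.find()) {
            words.add(m.group(1));
        }
        return words.toString();
    }

    // Return the sentences that occur in both the training and the dev lines.
    static Set<String> overlap(List<String> trainLines, List<String> devLines) {
        Set<String> train = new HashSet<>();
        for (String line : trainLines) {
            if (!line.trim().isEmpty()) {
                train.add(sentenceOf(line));
            }
        }
        Set<String> shared = new TreeSet<>();
        for (String line : devLines) {
            if (line.trim().isEmpty()) {
                continue;
            }
            String sentence = sentenceOf(line);
            if (train.contains(sentence)) {
                shared.add(sentence);
            }
        }
        return shared;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java SplitOverlapCheck train.txt dev.txt");
            return;
        }
        Set<String> shared = overlap(
                Files.readAllLines(Paths.get(args[0])),
                Files.readAllLines(Paths.get(args[1])));
        System.out.println(shared.size() + " sentence(s) appear in both files");
        for (String sentence : shared) {
            System.out.println(sentence);
        }
    }
}
```

The comparison is on the token sequence only, so a sentence counts as shared even if the two files label it differently.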
We don't distribute a tool for tagging sentiment data. This class could be helpful in building data: http://nlp.stanford.edu/nlp/javadoc/javanlp/edu/stanford/nlp/sentiment/BuildBinarizedDataset.html
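As a rough illustration of preparing input for that class: as I read its javadoc, it takes a plain-text file of blank-line-separated blocks, where the first line of a block is the sentence label followed by the sentence, and any further lines are a label followed by a sub-phrase of that sentence. Please verify this against the linked page before relying on it. The helper below only builds such a block as a string; `BinarizedInputSketch` and `formatBlock` are hypothetical names, not CoreNLP API.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class BinarizedInputSketch {
    // Build one block: "<label> <sentence>", then one "<label> <phrase>" line
    // per labeled sub-phrase, then a blank line that ends the block.
    // The block layout is an assumption taken from the class javadoc.
    static String formatBlock(int sentenceLabel, String sentence,
                              Map<String, Integer> phraseLabels) {
        StringBuilder sb = new StringBuilder();
        sb.append(sentenceLabel).append(' ').append(sentence).append('\n');
        for (Map.Entry<String, Integer> entry : phraseLabels.entrySet()) {
            sb.append(entry.getValue()).append(' ').append(entry.getKey()).append('\n');
        }
        sb.append('\n');
        return sb.toString();
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the phrases in insertion order.
        Map<String, Integer> phrases = new LinkedHashMap<>();
        phrases.put("good", 3);
        phrases.put("a good day", 3);
        System.out.print(formatBlock(1, "Today is not a good day.", phrases));
    }
}
```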
Here are the sizes of the train, dev, and test sets used for the sentiment model: train=8544, dev=1101, test=2210
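If it helps when sizing your own split, the proportions those numbers imply can be worked out as below. This is just arithmetic on the three sizes quoted above; `SplitSizes` and `percent` are illustrative names.

```java
import java.util.Locale;

public class SplitSizes {
    // Share of the total, as a percentage.
    static double percent(int part, int total) {
        return 100.0 * part / total;
    }

    public static void main(String[] args) {
        int train = 8544;
        int dev = 1101;
        int test = 2210;
        int total = train + dev + test; // 11855 sentences in all

        System.out.println("total = " + total);
        System.out.println(String.format(Locale.ROOT,
                "train %.1f%%, dev %.1f%%, test %.1f%%",
                percent(train, total), percent(dev, total), percent(test, total)));
    }
}
```

That comes to roughly 72% train, 9% dev, and 19% test.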
Here is some sample code for evaluating a model:
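import java.util.List;
import edu.stanford.nlp.sentiment.Evaluate;
import edu.stanford.nlp.sentiment.SentimentModel;
import edu.stanford.nlp.sentiment.SentimentUtils;
import edu.stanford.nlp.trees.Tree;

// load a serialized model from modelPath
SentimentModel model = SentimentModel.loadSerialized(modelPath);
// load the dev trees, with their gold sentiment labels, from devPath
List<Tree> devTrees = SentimentUtils.readTreesWithGoldLabels(devPath);
// evaluate the model on devTrees and print a summary of the results
Evaluate eval = new Evaluate(model);
eval.eval(devTrees);
eval.printSummary();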
You can find what you need to import, and how the pieces fit together, by looking at:
edu/stanford/nlp/sentiment/SentimentTraining.java