Good dataset for sentiment analysis? [closed]


误落风尘

3 Answers

[愿得一人]

2021-01-30 12:26

I started gathering sentiment analysis tools, datasets, and lexicons in one place; it could be useful for you too: https://github.com/laugustyniak/awesome-sentiment-analysis

Open a PR if you want to add something, or just write to me. I have worked a lot with Amazon data (millions of reviews).

逝去的感伤

2021-01-30 12:33
There are many sources for sentiment analysis datasets:
- a huge n-grams dataset from Google: storage.googleapis.com/books/ngrams/books/datasetsv2.html
- http://www.sananalytics.com/lab/twitter-sentiment/
- http://inclass.kaggle.com/c/si650winter11/data
- http://nlp.stanford.edu/sentiment/treebank.html
- or you can look into this global ML dataset repository: https://archive.ics.uci.edu/ml
That said, these corpora will not necessarily improve accuracy on your current dataset, because they may differ substantially from your own data. Apart from changing the split so that less data goes to testing and more to training, you could try other classifiers, or tune the hyperparameters with a semi-automated wrapper such as CVParameterSelection or GridSearch, or even Auto-WEKA if it fits.
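To make the grid search idea concrete, here is a minimal sketch in plain Python. Everything in it is an assumption for illustration: the word lists, the toy validation sentences, the scoring rule, and the two hyperparameters `threshold` and `neg_weight` are hypothetical and do not come from any of the tools named above. It only shows the core loop: try every combination, score it on held-out data, keep the best.

```python
from itertools import product

# Hypothetical toy lexicon and validation data, for illustration only.
POSITIVE = {"good", "great", "love"}
NEGATIVE = {"bad", "terrible", "hate"}
VALIDATION = [
    ("great phone love it", 1),
    ("terrible battery hate it", 0),
    ("good screen but bad battery and terrible support", 0),
    ("good value", 1),
    ("great camera bad strap", 1),
]

def predict(text, threshold, neg_weight):
    """Label a text 1 (positive) if its weighted word score exceeds threshold."""
    words = text.split()
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    return 1 if pos - neg_weight * neg > threshold else 0

def accuracy(threshold, neg_weight):
    """Fraction of validation examples classified correctly."""
    hits = sum(predict(t, threshold, neg_weight) == y for t, y in VALIDATION)
    return hits / len(VALIDATION)

def grid_search(param_grid, score_fn):
    """Score every combination in param_grid; return (best_params, best_score)."""
    names = sorted(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[n] for n in names)):
        params = dict(zip(names, values))
        score = score_fn(**params)
        if score > best_score:  # ties keep the earlier combination
            best_params, best_score = params, score
    return best_params, best_score

param_grid = {"threshold": [-1, 0, 1], "neg_weight": [0.5, 1.0, 2.0]}
best_params, best_score = grid_search(param_grid, accuracy)
print(best_params, best_score)
```

Library wrappers typically add cross-validation around each combination, so the score is averaged over several folds rather than taken from a single held-out set as in this sketch.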

A 50/50 split is quite rare; 80/20 is a common ratio. A better practice is to use 60% for training, 20% for cross-validation, and 20% for testing.
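A 60/20/20 split could be sketched as follows. This is only an illustration under stated assumptions: the function name `train_val_test_split`, the fixed seed, and the placeholder review strings are hypothetical, and the "cross validation" portion is treated here as a single held-out validation set.

```python
import random

def train_val_test_split(items, train=0.6, val=0.2, seed=0):
    """Shuffle a copy of items and cut it into train/validation/test parts."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    n_train = round(len(shuffled) * train)
    n_val = round(len(shuffled) * val)
    return (
        shuffled[:n_train],
        shuffled[n_train:n_train + n_val],
        shuffled[n_train + n_val:],  # whatever remains is the test set
    )

# Placeholder data standing in for labelled reviews.
reviews = [f"review {i}" for i in range(100)]
train_set, val_set, test_set = train_val_test_split(reviews)
print(len(train_set), len(val_set), len(test_set))  # 60 20 20
```

The validation part is what you would use to compare classifiers or hyperparameter settings; the test part stays untouched until the final evaluation.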
灰色年华

2021-01-30 12:43

Here is a list of datasets that give the sentiment of individual words: http://positivewordsresearch.com/sentiment-analysis-resources/
