Question
I've been doing a little comparison of these two packages and am not sure which direction to go in. What I am looking for briefly is:
- Named Entity Recognition (people, places, organizations and such).
- Gender identification.
- A decent training API.
From what I can tell, OpenNLP and Stanford CoreNLP expose pretty similar capabilities. However, Stanford CoreNLP looks like it has a lot more activity whereas OpenNLP has only had a few commits in the last six months.
From what I saw, OpenNLP appears to make it easier to train new models and might be more attractive for that reason alone. However, my question is: what would others start with as the basis for adding NLP features to a Java app? I'm mostly worried about whether OpenNLP is "just mature" or semi-abandoned.
Answer 1:
In full disclosure, I'm a contributor to CoreNLP, so this is a biased answer. But, in my view on your three criteria:
Named Entity Recognition: I think CoreNLP clearly wins here, both on accuracy and ease-of-use. For one, OpenNLP has a model per NER tag, whereas CoreNLP detects all tags with a single Annotator. Furthermore, temporal resolution with SUTime is a nice perk in CoreNLP. Accuracy-wise, my anecdotal experience is that CoreNLP does better on general-purpose text.
Gender identification: I think both tools are somewhat poorly documented on this front. OpenNLP seems to have a GenderModel class; CoreNLP has a gender Annotator.
Training API: I suspect the OpenNLP training API is easier to use for training that goes beyond the off-the-shelf setup. But if all you want to do is, e.g., train a model from a CoNLL file, both should be straightforward. Training tends to be faster with CoreNLP than with other tools I've tried, but I haven't benchmarked it formally, so take that with a grain of salt.
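For illustration, here is a minimal sketch of what "a CoNLL file" can look like when it is produced from already-labelled tokens. It assumes the simple two-column, tab-separated layout (token, then label, with a blank line between sentences) that CoreNLP's CRF-based NER trainer can read when its properties map the columns as `word=0,answer=1`. The class `ConllWriter`, the method name, and the label names are hypothetical and not part of either library.

```java
import java.util.List;

/** Hypothetical helper (not part of CoreNLP or OpenNLP). */
final class ConllWriter {

    /**
     * Renders one sentence as CoNLL-style columns: "token<TAB>label" per line,
     * followed by a blank line that marks the end of the sentence.
     * By convention, "O" labels a token that is outside any entity.
     */
    static String sentence(List<String> tokens, List<String> labels) {
        if (tokens.size() != labels.size()) {
            throw new IllegalArgumentException("tokens and labels must have the same length");
        }
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < tokens.size(); i++) {
            out.append(tokens.get(i)).append('\t').append(labels.get(i)).append('\n');
        }
        out.append('\n'); // blank line separates sentences
        return out.toString();
    }
}
```

Concatenating the output of `sentence(...)` for every training sentence gives a file in that layout; whether a given trainer accepts it as is depends on its configuration, so check the column mapping in the tool's own documentation.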
Answer 2:
A bit late here, but I was recently looking at OpenNLP, based purely on the fact that Stanford CoreNLP is GPL-licensed. If that's OK for your project, then Stanford is often referred to as the benchmark/state of the art for NLP.
That said, the performance for the pre-trained models will depend on your target text as it is very domain specific. If your target text is similar to the data that the models were trained against then you should get decent results, but if not then you will have to train the models yourself and it will depend on the training data.
A strength of OpenNLP is that it is very extensible, is written for easy use with other libraries, and has a good API for integration. Training is very simple with OpenNLP once you have your training data (I wrote about it here; with a pretty lousy generated data set I was able to get OK results identifying foods). It is also very configurable: you can easily configure all the parameters around training, and there is a range of algorithms you can use (perceptron, maximum entropy, and Naive Bayes, which has been added in the snapshot version).
If you find that you do need to train the models yourself, I would consider trying out OpenNLP and seeing how it performs, just for comparison, as with fine-tuning you can get pretty decent results.
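As a rough sketch of the "once you have your training data" step: OpenNLP's name finder is trained on plain text with one sentence per line, whitespace-separated tokens, and entities wrapped in `<START:type> ... <END>` markers. The helper below turns per-token labels into such a line. The class and method names are hypothetical, and the `food` type simply mirrors the food example mentioned above.

```java
import java.util.List;

/** Hypothetical helper (not part of OpenNLP). */
final class OpenNlpNameSampleFormatter {

    /**
     * Builds one training line from the tokens of a sentence and a label per token.
     * "O" marks tokens outside any entity. Adjacent tokens with the same label are
     * merged into one span, so two back-to-back entities of the same type
     * cannot be told apart by this simple sketch.
     */
    static String format(List<String> tokens, List<String> labels) {
        if (tokens.size() != labels.size()) {
            throw new IllegalArgumentException("tokens and labels must have the same length");
        }
        StringBuilder out = new StringBuilder();
        String open = null; // type of the entity span currently open, if any
        for (int i = 0; i < tokens.size(); i++) {
            String label = labels.get(i);
            String type = "O".equals(label) ? null : label;
            if (open != null && !open.equals(type)) {
                out.append(" <END>"); // close the span before a different label
                open = null;
            }
            if (out.length() > 0) {
                out.append(' ');
            }
            if (type != null && open == null) {
                out.append("<START:").append(type).append("> ");
                open = type;
            }
            out.append(tokens.get(i));
        }
        if (open != null) {
            out.append(" <END>"); // sentence ended inside an entity
        }
        return out.toString();
    }
}
```

For the tokens `I ate fried rice today` with `fried rice` labelled as `food`, this produces `I ate <START:food> fried rice <END> today`. A file of such lines is the kind of input the OpenNLP name finder trainer expects; the choice of algorithm and other parameters is configured separately.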
Answer 3:
That depends on your purpose and needs. What I know about these two is that OpenNLP is open source under the Apache license, whereas CoreNLP is GPL-licensed, so it is not as freely usable in closed-source products.
But if you look at accuracy, Stanford CoreNLP has more accurate detection than OpenNLP. I recently compared the two on part-of-speech (POS) tagging, which is the most important part of any NLP task, and in my analysis the winner was CoreNLP. For NER as well, CoreNLP gives more accurate results than OpenNLP. So if you are just starting out you can take up OpenNLP, and later, if needed, you can migrate to Stanford CoreNLP.
Source: https://stackoverflow.com/questions/40025981/opennlp-vs-stanford-corenlp