OpenNLP vs Stanford CoreNLP

囚心锁ツ 2021-02-07 11:25

I've been doing a little comparison of these two packages and am not sure which direction to go in. Briefly, what I am looking for is:

  1. Named Entity Recognition (pe
3 Answers
  • 2021-02-07 11:41

    That depends on your purpose and needs. As far as I know, both are open source, but the licenses differ: OpenNLP is released under the Apache License, while CoreNLP is GPL-licensed.

    But if you look at accuracy, Stanford CoreNLP is more accurate than OpenNLP. I recently compared the two on part-of-speech (POS) tagging, which is the most important part of any NLP task, and in my analysis the winner was CoreNLP.

    For NER as well, CoreNLP gives more accurate results than OpenNLP.

    So if you are just starting out, you can begin with OpenNLP and migrate to Stanford CoreNLP later if needed.

  • 2021-02-07 11:44

    A bit late here, but I recently looked at OpenNLP based purely on the fact that Stanford is GPL-licensed. If that's OK for your project, then Stanford is often referred to as the benchmark/state of the art for NLP.

    That said, the performance for the pre-trained models will depend on your target text as it is very domain specific. If your target text is similar to the data that the models were trained against then you should get decent results, but if not then you will have to train the models yourself and it will depend on the training data.
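    Since results depend so much on the target text, one way to compare the tools is to hand-label a small sample of your own data and score each tool's output against it. The following is a minimal sketch of exact-match, entity-level precision/recall/F1. The tuple layout and the sample annotations are hypothetical, and getting entities out of each library into this shape is left to you.

```python
from typing import Iterable, NamedTuple, Set, Tuple

# An entity is (sentence_index, start_token, end_token_exclusive, type).
Entity = Tuple[int, int, int, str]


class Scores(NamedTuple):
    precision: float
    recall: float
    f1: float


def score_entities(gold: Iterable[Entity], predicted: Iterable[Entity]) -> Scores:
    """Exact-match scoring: a prediction counts only if span and type both match."""
    gold_set: Set[Entity] = set(gold)
    pred_set: Set[Entity] = set(predicted)
    true_positives = len(gold_set & pred_set)
    precision = true_positives / len(pred_set) if pred_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return Scores(precision, recall, f1)


# Hypothetical hand-labelled sample and the output of two tools on it.
gold = [(0, 0, 2, "PERSON"), (0, 5, 6, "LOCATION"), (1, 3, 4, "ORGANIZATION")]
tool_a = [(0, 0, 2, "PERSON"), (0, 5, 6, "LOCATION")]
tool_b = [(0, 0, 1, "PERSON"), (1, 3, 4, "ORGANIZATION"), (1, 6, 7, "LOCATION")]

for name, predicted in (("tool A", tool_a), ("tool B", tool_b)):
    s = score_entities(gold, predicted)
    print(f"{name}: P={s.precision:.2f} R={s.recall:.2f} F1={s.f1:.2f}")
```

    Exact matching is strict: a span that is off by one token counts as both a miss and a false alarm, so it is worth eyeballing the errors as well as the numbers.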

    A strength of OpenNLP is that it is very extensible, is written for easy use with other libraries, and has a good API for integration. Training is very simple with OpenNLP once you have your training data (I wrote about it here - with a pretty lousy generated data set I was able to get OK results identifying foods), and it is very configurable: you can easily set all the training parameters, and there is a range of algorithms to choose from (perceptron, maximum entropy, and, in the snapshot version, Naive Bayes).
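    For reference, the OpenNLP name finder trains on plain text with one whitespace-tokenized sentence per line and entities wrapped in `<START:type> ... <END>` markers. Below is a minimal sketch of producing such lines from token-level annotations. The helper name, the span layout, and the `food` examples are hypothetical, and the spans are assumed not to overlap.

```python
from typing import List, Sequence, Tuple

# A span is (start_token, end_token_exclusive, entity_type).
Span = Tuple[int, int, str]


def to_opennlp_line(tokens: Sequence[str], spans: Sequence[Span]) -> str:
    """Render one tokenized sentence with <START:type> ... <END> markers.

    Assumes the spans are non-empty and do not overlap.
    """
    starts = {start: etype for start, _end, etype in spans}
    ends = {end for _start, end, _etype in spans}
    out: List[str] = []
    for i, token in enumerate(tokens):
        if i in ends:                 # close a span that ended before this token
            out.append("<END>")
        if i in starts:               # open a span that begins at this token
            out.append(f"<START:{starts[i]}>")
        out.append(token)
    if len(tokens) in ends:           # a span that runs to the end of the sentence
        out.append("<END>")
    return " ".join(out)


tokens = "I had grilled salmon and a green salad .".split()
spans = [(2, 4, "food"), (6, 8, "food")]
print(to_opennlp_line(tokens, spans))
# I had <START:food> grilled salmon <END> and a <START:food> green salad <END> .
```

    Writing one such line per sentence to a text file gives you input in the shape the name finder trainer expects.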

    If you find that you do need to train the models yourself, I would consider trying out OpenNLP and seeing how it performs, just for comparison, as with fine-tuning you can get pretty decent results.

  • 2021-02-07 11:45

    In full disclosure, I'm a contributor to CoreNLP, so this is a biased answer. But, in my view on your three criteria:

    1. Named Entity Recognition: I think CoreNLP clearly wins here, both on accuracy and ease-of-use. For one, OpenNLP has a model per NER tag, whereas CoreNLP detects all tags with a single Annotator. Furthermore, temporal resolution with SUTime is a nice perk in CoreNLP. Accuracy-wise, my anecdotal experience is that CoreNLP does better on general-purpose text.
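    One practical consequence of having a model per tag is that you have to combine the outputs yourself and decide what happens when two models claim the same tokens. Here is a minimal sketch of one possible policy (the higher-confidence span wins). It assumes each model reports a confidence per span; the `Span` type and the example values are hypothetical and are not either library's API.

```python
from typing import List, NamedTuple, Sequence


class Span(NamedTuple):
    start: int    # first token index
    end: int      # one past the last token index
    etype: str    # entity type, e.g. "person"
    prob: float   # confidence reported by the model that produced the span


def merge_spans(num_tokens: int, per_model_spans: Sequence[Sequence[Span]]) -> List[str]:
    """Combine spans from several single-type models into one label per token.

    Overlaps are resolved greedily: spans are visited from most to least
    confident, and a span is kept only if all of its tokens are still free.
    """
    labels = ["O"] * num_tokens
    candidates = sorted(
        (span for spans in per_model_spans for span in spans),
        key=lambda s: s.prob,
        reverse=True,
    )
    for span in candidates:
        if all(label == "O" for label in labels[span.start:span.end]):
            for i in range(span.start, span.end):
                labels[i] = span.etype
    return labels


# Hypothetical output of a person model and a location model on
# the tokens ["Paris", "Hilton", "visited", "Paris"].
person_spans = [Span(0, 2, "person", 0.91)]
location_spans = [Span(0, 1, "location", 0.60), Span(3, 4, "location", 0.88)]
print(merge_spans(4, [person_spans, location_spans]))
# ['person', 'person', 'O', 'location']
```

    With a single annotator that emits all tags at once, this reconciliation step is not needed.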

    2. Gender identification. I think both tools are kind of poorly documented on this front. OpenNLP seems to have a GenderModel class; CoreNLP has a gender Annotator.

    3. Training API. I suspect the OpenNLP training API is easier to use for training that isn't off-the-shelf. But if all you want to do is, e.g., train a model from a CoNLL file, both should be straightforward. Training speed tends to be faster with CoreNLP than with other tools I've tried, but I haven't benchmarked it formally, so take that with a grain of salt.
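    For readers who have not seen one, a CoNLL-style file has one token per line with its label in the last column and a blank line between sentences. Below is a minimal sketch of reading such data and turning BIO labels into entity spans, which is a handy starting point for converting between the tools' training formats. The function names and the sample text are hypothetical, and real files may use extra columns or a different tagging scheme.

```python
from typing import Iterable, Iterator, List, Tuple

Sentence = Tuple[List[str], List[str]]  # (tokens, tags)


def read_conll(lines: Iterable[str]) -> Iterator[Sentence]:
    """Yield (tokens, tags) per sentence; blank lines separate sentences."""
    tokens: List[str] = []
    tags: List[str] = []
    for line in lines:
        line = line.strip()
        if line.startswith("-DOCSTART-"):   # document marker used in CoNLL-2003
            continue
        if not line:
            if tokens:
                yield tokens, tags
                tokens, tags = [], []
            continue
        columns = line.split()
        tokens.append(columns[0])           # first column: the token
        tags.append(columns[-1])            # last column: the label
    if tokens:                              # file did not end with a blank line
        yield tokens, tags


def bio_to_spans(tags: List[str]) -> List[Tuple[int, int, str]]:
    """Convert BIO tags to (start, end_exclusive, type) spans."""
    spans: List[Tuple[int, int, str]] = []
    start, current = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel closes a trailing span
        continues = tag.startswith("I-") and tag[2:] == current
        if current is not None and not continues:
            spans.append((start, i, current))
            start, current = None, None
        if tag.startswith("B-") or (tag.startswith("I-") and current is None):
            start, current = i, tag[2:]
    return spans


sample = """\
John B-PER
Smith I-PER
visited O
Berlin B-LOC
. O

Apache B-ORG
released O
it O
"""
for tokens, tags in read_conll(sample.splitlines()):
    print(tokens, bio_to_spans(tags))
```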
