Stanford CoreNLP 4.0.0 no longer splits verbs and pronouns in Spanish

Question

Stanford CoreNLP 3.9.2 very helpfully used to split Spanish verbs from the pronouns fused onto them.

This is the 4.0.0 output:

The previous version shipped more .tagger files, and these are not included in the 4.0.0 distribution.

Is that the cause? Will they be added back?

Answer 1:

Some of the documentation for Stanford CoreNLP 4.0.0 still needs to be updated.

A major change is the new multi-word token (mwt) annotator, which makes tokenization conform to the UD (Universal Dependencies) standard. The new default Spanish pipeline should therefore run tokenize,ssplit,mwt,pos,depparse,ner. It may not currently be possible to run such a pipeline from the server demo, because some modifications are still needed. I will try to send you those modifications soon. We plan to make a new release in early summer to fix issues like this one that we missed.

Unfortunately, it won't split the word in your example, but I think it will do the right thing in many cases. The Spanish mwt model is simply based on a large dictionary of terms, and it was tuned to optimize performance on the Spanish training data.
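A dictionary-based multi-word token expander can be sketched as follows. This toy is only a conceptual illustration and is not CoreNLP's implementation. The dictionary `MWT_DICT` and the helpers `expand_tokens` and `words_only` are hypothetical, and the entries are examples only, not CoreNLP's actual Spanish dictionary. The sketch also shows why a fused verb+pronoun form that is missing from the dictionary stays unsplit.

```python
# Hypothetical toy dictionary: surface token -> UD syntactic words.
# Illustrative entries only; NOT CoreNLP's real Spanish MWT dictionary.
MWT_DICT = {
    "del": ["de", "el"],
    "al": ["a", "el"],
    "dámelo": ["da", "me", "lo"],
    "decirte": ["decir", "te"],
}


def expand_tokens(tokens):
    """Pair each token with the words it expands to.

    Lookup is case-insensitive. Tokens that are not in the dictionary
    expand to themselves, i.e. they are left unsplit.
    """
    result = []
    for token in tokens:
        words = MWT_DICT.get(token.lower())
        # Copy the list so callers cannot mutate the dictionary entries.
        result.append((token, list(words) if words is not None else [token]))
    return result


def words_only(tokens):
    """Flatten the expansion into the word sequence a POS tagger would see."""
    return [word for _, words in expand_tokens(tokens) for word in words]


if __name__ == "__main__":
    print(words_only(["Voy", "al", "mercado"]))  # contraction is split
    print(words_only(["cómpraselo"]))            # not in dictionary: unchanged
```

Because the lookup is purely lexical, coverage depends entirely on which forms the dictionary contains. That matches the answer's caveat that some fused verb+pronoun words will not be split.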

Source: https://stackoverflow.com/questions/61540771/stanford-nlp-core-4-0-0-no-longer-splitting-verbs-and-pronouns-in-spanish

Tags

windows

stanford-nlp
