NLP to find relationship between entities

Submitted by 安稳与你 on 2019-12-03 15:45:25
vpekar

You can extract verbs with their dependants using, for example, the Stanford Parser. You might get "dependency chains" like

"I :: spent :: at :: CERN". 
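As a rough illustration of how such chains can be read off a dependency parse, here is a minimal sketch. It assumes you already have a parse as a list of tokens with a POS tag, a head index and a dependency label; the `Token` type, the `dependency_chains` helper and the hand-written parse are hypothetical, not the Stanford Parser's API. The labels follow Stanford basic-dependency style (`nsubj`, `dobj`, `prep`, `pobj`); newer parsers emit Universal Dependencies labels instead, so the label checks would need adjusting.

```python
from typing import List, NamedTuple


class Token(NamedTuple):
    word: str
    pos: str   # Penn Treebank tag, e.g. "VBD"
    head: int  # index of the governing token, -1 for the root
    dep: str   # dependency label, Stanford basic-dependency style


def children(tokens: List[Token], i: int) -> List[int]:
    """Indices of tokens whose head is token i."""
    return [j for j, t in enumerate(tokens) if t.head == i]


def dependency_chains(tokens: List[Token]) -> List[str]:
    """Build 'subject :: verb :: [prep ::] object' chains for every verb."""
    chains = []
    for i, tok in enumerate(tokens):
        if not tok.pos.startswith("VB"):
            continue
        kids = children(tokens, i)
        subjects = [tokens[j].word for j in kids
                    if tokens[j].dep in ("nsubj", "nsubjpass")]
        prefix = [subjects[0], tok.word] if subjects else [tok.word]
        for j in kids:
            if tokens[j].dep == "dobj":
                chains.append(" :: ".join(prefix + [tokens[j].word]))
            elif tokens[j].dep == "prep":
                # follow the preposition down to its object
                for k in children(tokens, j):
                    if tokens[k].dep == "pobj":
                        chains.append(" :: ".join(
                            prefix + [tokens[j].word, tokens[k].word]))
    return chains


# Hand-written parse of "I spent a year at CERN" (illustrative, not parser output)
sentence = [
    Token("I", "PRP", 1, "nsubj"),
    Token("spent", "VBD", -1, "root"),
    Token("a", "DT", 3, "det"),
    Token("year", "NN", 1, "dobj"),
    Token("at", "IN", 1, "prep"),
    Token("CERN", "NNP", 4, "pobj"),
]
print(dependency_chains(sentence))
# ['I :: spent :: year', 'I :: spent :: at :: CERN']
```

Only the head word of each argument is kept ("year", not "a year"); collecting a token's whole subtree would give fuller phrases.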

It is much harder to recognise that "I spent at CERN", "I visited CERN" and "CERN hosted my visit" (etc.) denote the same kind of event. Explaining how to do this is beyond the scope of an SO question, but you can read up on the literature on paraphrase recognition (here is one overview paper). There is also a related question on SO.

Once you can cluster similar chains, you'd need to find a way to label them. You could simply choose the verb of the most common chain in a cluster.
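A minimal sketch of the labelling step, assuming the clustering itself has already been done by some paraphrase-recognition method (not shown) and that chains are stored as simple records with the verb in a known field. The `Chain` type and `label_cluster` function are hypothetical names for illustration.

```python
from collections import Counter
from typing import List, NamedTuple, Tuple


class Chain(NamedTuple):
    subject: str
    verb: str
    args: Tuple[str, ...]  # remaining dependants, e.g. ("at", "CERN")


def label_cluster(cluster: List[Chain]) -> str:
    """Label a cluster with the verb of its most frequent chain.

    Ties go to the chain seen first (Counter.most_common keeps
    first-encountered order for equal counts).
    """
    most_common_chain, _ = Counter(cluster).most_common(1)[0]
    return most_common_chain.verb


# A hypothetical cluster, assumed to come from a paraphrase-clustering step
cluster = [
    Chain("I", "visit", ("CERN",)),
    Chain("I", "spend", ("at", "CERN")),
    Chain("I", "visit", ("CERN",)),
]
print(label_cluster(cluster))  # visit
```

Lemmatising the verbs before counting (as in the toy data) keeps "visited" and "visits" from being counted as different chains.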

If, however, you have a predefined set of relation types you want to extract and a lot of text manually annotated with these relations, the approach could be very different: e.g., using machine learning to learn to recognise each relation type from the annotated data.
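To make the supervised route concrete, here is a deliberately small sketch: a multinomial Naive Bayes classifier (with add-one smoothing) that predicts a relation label from the words between two entity mentions. The feature choice, the class name and the toy training examples are all assumptions for illustration; real systems use far richer features (dependency paths, entity types) or neural models, and much more annotated data. The sketch assumes the first entity appears before the second in each sentence.

```python
import math
from collections import Counter, defaultdict


def features(sentence, e1, e2):
    """Lower-cased words between the two mentions (assumes e1 precedes e2)."""
    start = sentence.index(e1) + len(e1)
    end = sentence.index(e2, start)
    return sentence[start:end].lower().split()


class NaiveBayesRelationClassifier:
    def fit(self, examples):
        """examples: iterable of (sentence, e1, e2, relation_label)."""
        self.label_counts = Counter()
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for sentence, e1, e2, label in examples:
            words = features(sentence, e1, e2)
            self.label_counts[label] += 1
            self.word_counts[label].update(words)
            self.vocab.update(words)
        return self

    def predict(self, sentence, e1, e2):
        words = features(sentence, e1, e2)
        total = sum(self.label_counts.values())
        best_label, best_score = None, float("-inf")
        for label, count in self.label_counts.items():
            score = math.log(count / total)  # log prior
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for w in words:  # add-one smoothed log likelihoods
                score += math.log((self.word_counts[label][w] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy "manually annotated" data (illustrative only)
train = [
    ("Barack Obama was born in Hawaii", "Barack Obama", "Hawaii", "born_in"),
    ("Marie Curie was born in Warsaw", "Marie Curie", "Warsaw", "born_in"),
    ("Tim Cook works for Apple", "Tim Cook", "Apple", "works_for"),
    ("Sundar Pichai works at Google", "Sundar Pichai", "Google", "works_for"),
]
clf = NaiveBayesRelationClassifier().fit(train)
print(clf.predict("Alan Turing was born in London", "Alan Turing", "London"))
# born_in
```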

Don't know if you're still interested, but CoreNLP has added a new annotator called OpenIE (Open Information Extraction), which should do what you're looking for. Check it out: OpenIE
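One way to try the OpenIE annotator from Python is through the CoreNLP server's HTTP interface. The sketch below assumes a CoreNLP server is already running locally on port 9000 (the server's default) and that its JSON output lists triples under each sentence's `"openie"` key with `subject`, `relation` and `object` fields; please check these against the CoreNLP documentation for your version. The function names are hypothetical, and only the standard library is used.

```python
import json
import urllib.parse
import urllib.request

# Annotators OpenIE depends on, plus openie itself
PROPERTIES = {
    "annotators": "tokenize,ssplit,pos,lemma,depparse,natlog,openie",
    "outputFormat": "json",
}


def openie_request_url(server="http://localhost:9000"):
    """URL with the pipeline properties passed as a query parameter."""
    return server + "/?properties=" + urllib.parse.quote(json.dumps(PROPERTIES))


def parse_openie(response):
    """Pull (subject, relation, object) triples out of a JSON response dict."""
    triples = []
    for sentence in response.get("sentences", []):
        for t in sentence.get("openie", []):
            triples.append((t["subject"], t["relation"], t["object"]))
    return triples


def openie_triples(text, server="http://localhost:9000"):
    """POST raw text to a running CoreNLP server and return its OpenIE triples."""
    req = urllib.request.Request(
        openie_request_url(server), data=text.encode("utf-8"), method="POST")
    with urllib.request.urlopen(req) as resp:
        return parse_openie(json.load(resp))
```

With a server running, `openie_triples("Barack Obama was born in Hawaii.")` should return a list of triples; OpenIE often produces several overlapping triples per sentence, so some deduplication is usually needed.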

Yes, absolutely. This is called relation extraction. Stanford has developed several useful tools for this problem.

Here is their website: http://deepdive.stanford.edu/relation_extraction
Here is a GitHub repository with a Python wrapper for Stanford OpenIE: https://github.com/philipperemy/Stanford-OpenIE-Python

In general, the process works like this:

# extract_entity_relations is an illustrative name, not a specific library function
results = extract_entity_relations("Barack Obama was born in Hawaii.")
print(results)
# [['Barack Obama', 'was born in', 'Hawaii']]

Note that only triples of the form (subject, predicate, object) are extracted.

Similar to the Stanford Parser, you can also use the Google Cloud Natural Language API: you send it a string and get back a dependency tree.

You can test this API first to see if it works well with your corpus: https://cloud.google.com/natural-language/

What you want out of it is a subject-predicate-object (SPO) triplet, where the predicate describes the relationship. You'll need to traverse the dependency graph and write a script to extract the triplet.
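A minimal sketch of such a script, assuming you already have the `tokens` list from an `analyzeSyntax` JSON response, where each token carries `text.content`, `partOfSpeech.tag` and a `dependencyEdge` with `headTokenIndex` and `label` (the root token points to itself). It handles only the simplest patterns (NSUBJ/NSUBJPASS + DOBJ, or + PREP/POBJ) and keeps only head words. The function names and the hand-built tokens are hypothetical; the code makes no API call.

```python
def extract_spo(tokens):
    """Return (subject, predicate, object) triples from analyzeSyntax tokens."""

    def word(i):
        return tokens[i]["text"]["content"]

    def kids(i, *labels):
        # dependants of token i with one of the given labels (skip the self-loop on ROOT)
        return [j for j, t in enumerate(tokens)
                if j != i
                and t["dependencyEdge"]["headTokenIndex"] == i
                and t["dependencyEdge"]["label"] in labels]

    triples = []
    for v, tok in enumerate(tokens):
        if tok["partOfSpeech"]["tag"] != "VERB":
            continue
        subjects = kids(v, "NSUBJ", "NSUBJPASS")
        if not subjects:
            continue
        subj = word(subjects[0])
        for o in kids(v, "DOBJ"):
            triples.append((subj, word(v), word(o)))
        for p in kids(v, "PREP"):  # fold the preposition into the predicate
            for o in kids(p, "POBJ"):
                triples.append((subj, f"{word(v)} {word(p)}", word(o)))
    return triples


def tok(content, tag, head, label):
    """Build one token dict in the assumed response shape (for the demo only)."""
    return {"text": {"content": content}, "partOfSpeech": {"tag": tag},
            "dependencyEdge": {"headTokenIndex": head, "label": label}}


# Hand-built tokens for "I spent a year at CERN" (not real API output)
tokens = [tok("I", "PRON", 1, "NSUBJ"), tok("spent", "VERB", 1, "ROOT"),
          tok("a", "DET", 3, "DET"), tok("year", "NOUN", 1, "DOBJ"),
          tok("at", "ADP", 1, "PREP"), tok("CERN", "NOUN", 4, "POBJ")]
print(extract_spo(tokens))
# [('I', 'spent', 'year'), ('I', 'spent at', 'CERN')]
```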

There are many ways to do relation extraction. As others have mentioned, you need to know about named entity recognition (NER) and coreference resolution. Different techniques require different approaches. Distant supervision is currently the most common approach; to detect relations between entities, the well-known work used FREEBASE as the source of known relations.
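The core labelling idea of distant supervision can be sketched in a few lines: any sentence that mentions both entities of a known knowledge-base fact is taken as a (noisy) training example for that relation. The tiny dictionary below stands in for a knowledge base like Freebase and is purely illustrative, as is the function name; matching is plain substring search, whereas a real pipeline would use NER and entity linking.

```python
def distant_label(sentences, kb):
    """Label sentences using known facts.

    kb maps (entity1, entity2) -> relation. Every sentence containing both
    entities becomes a (sentence, e1, e2, relation) training example.
    """
    labeled = []
    for s in sentences:
        for (e1, e2), rel in kb.items():
            if e1 in s and e2 in s:
                labeled.append((s, e1, e2, rel))
    return labeled


# Stand-in for a knowledge base (illustrative only)
kb = {("Barack Obama", "Hawaii"): "born_in",
      ("Marie Curie", "Warsaw"): "born_in"}
sentences = ["Barack Obama was born in Hawaii.",
             "Barack Obama flew to Hawaii.",
             "Marie Curie moved to Paris."]
for example in distant_label(sentences, kb):
    print(example)
```

The second sentence gets labelled `born_in` even though it says nothing about birth. That noise is the well-known weakness of distant supervision, which is why much of the research builds models that tolerate mislabelled examples.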
