Extracting relations from text

Submitted by 不问归期 on 2020-01-01 03:31:17

Question


I want to extract relations from unstructured text in the form of (SUBJECT, OBJECT, ACTION) triples.

For instance,

"The boy is sitting on the table eating the chicken"

would give me,

(boy,chicken,eat)
(boy,table,LOCATION)

and so on.

A Python program using NLTK could handle a sentence as simple as the one above, but I'd like to know whether any of you have used tools or libraries, preferably open source, to extract relations from a much wider domain, such as a large collection of text documents or the web.


Answer 1:


If your sentences do not get much more complicated than the example you have shown (for instance, with respect to anaphora), the Stanford parser, which is based on a probabilistic context-free grammar, will give good results that you can easily convert into the format you want. There is a demo available online. For your example, it will give something like

nsubj(sitting, boy)

prep_on(sitting, table)

etc.
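
To make the conversion step concrete, here is a minimal sketch in Python that reads dependency lines in the textual form shown above and pairs each verb's subject with its objects and prepositional arguments. It does not call the Stanford parser itself. The function names, the handling of `dobj`, and the choice to use the upper-cased preposition as the relation name are my own assumptions for illustration.

```python
import re
from collections import defaultdict

# Matches lines such as "nsubj(sitting, boy)" or "prep_on(sitting-3, table-6)".
DEP_LINE = re.compile(r"^\s*(\w+)\(\s*([^,\s]+)\s*,\s*([^)\s]+)\s*\)\s*$")


def parse_dependencies(text):
    """Return (label, head, dependent) tuples for every well-formed line."""
    deps = []
    for line in text.splitlines():
        match = DEP_LINE.match(line)
        if match:
            label, head, dependent = match.groups()
            # Drop the "-<index>" suffix that some output formats append.
            head = re.sub(r"-\d+$", "", head)
            dependent = re.sub(r"-\d+$", "", dependent)
            deps.append((label, head, dependent))
    return deps


def to_triples(deps):
    """Build (SUBJECT, OBJECT, ACTION) triples from dependencies sharing a head."""
    subjects = defaultdict(list)
    for label, head, dependent in deps:
        if label == "nsubj":
            subjects[head].append(dependent)

    triples = []
    for label, head, dependent in deps:
        for subject in subjects.get(head, []):
            if label == "dobj":
                # Direct object: the head verb is the action.
                triples.append((subject, dependent, head))
            elif label.startswith("prep_"):
                # Assumption: use the preposition itself as the relation name.
                relation = label[len("prep_"):].upper()
                triples.append((subject, dependent, relation))
    return triples


example = """
nsubj(sitting, boy)
prep_on(sitting, table)
"""
triples = to_triples(parse_dependencies(example))
print(triples)
```

Note that this only links arguments attached to the same head word, so a subject shared between two verbs (as with "sitting" and "eating" in your sentence) would need extra handling, and verbs are not lemmatised.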

If your sentences do get more complicated, you might want to try Boxer, which builds discourse representation structures from C&C parses, which are in turn based on probabilistic combinatory categorial grammars. Those structures may prove more difficult to adapt to the format you want, but they will allow you much more flexibility. There is, again, a demo available online. For your example, the output will look something like

sit(x)

boy(y)

table(z)

agent(x,y)

on(x,z)

etc.
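
As a rough idea of how such predicates could be adapted, the sketch below parses the simplified one- and two-argument form shown above and replaces each variable with the name that introduces it. This is only an illustration on that simplified notation: it is not Boxer's actual output format, and the function names are hypothetical.

```python
import re

# Matches "sit(x)" (one argument) or "agent(x,y)" (two arguments).
PREDICATE = re.compile(r"^\s*(\w+)\(\s*(\w+)\s*(?:,\s*(\w+)\s*)?\)\s*$")


def parse_predicates(text):
    """Split the input into unary predicates (variable names) and binary relations."""
    unary = {}    # variable -> predicate name, e.g. {"x": "sit"}
    binary = []   # (relation, first variable, second variable)
    for line in text.splitlines():
        match = PREDICATE.match(line)
        if not match:
            continue  # skip blank or unrecognised lines
        name, first, second = match.groups()
        if second is None:
            unary[first] = name
        else:
            binary.append((name, first, second))
    return unary, binary


def resolve(unary, binary):
    """Replace variables in binary relations with the names that introduce them."""
    # Variables without a unary predicate are left as they are.
    return [(rel, unary.get(a, a), unary.get(b, b)) for rel, a, b in binary]


example = """
sit(x)
boy(y)
table(z)
agent(x,y)
on(x,z)
"""
unary, binary = parse_predicates(example)
relations = resolve(unary, binary)
print(relations)
```

From relations like these, an `agent` link gives you the subject of an event and a link such as `on` gives you a candidate location, which you can then reorder into your own triple format.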

The Stanford parser is written in Java and is available under the GPL. C&C is written in C++ and Boxer in SWI-Prolog. C&C and Boxer are not released under a genuinely free licence, but you can obtain the source code, modify it, and use it for any non-commercial project.

Neither will give you a characterisation of the relation between "boy" and "table" in your example. You will need much more powerful semantic reasoning tools for this, and I am not sure whether anything like that exists.

Edit

It is now once again possible to obtain the source code for C&C and Boxer, along with a collection of models.



Source: https://stackoverflow.com/questions/19574549/extracting-relations-from-text
