Question
I want to extract relations from unstructured text in the form of (SUBJECT, OBJECT, ACTION) triples.
For instance,
"The boy is sitting on the table eating the chicken"
would give me,
(boy,chicken,eat)
(boy,table,LOCATION)
etc.
A Python program using NLTK could process a simple sentence like the one above.
However, I'd like to know whether any of you have used tools or libraries, preferably open source, to extract relations from a much wider domain, such as a large collection of text documents or the web.
Answer 1:
If your sentences do not get much more complicated than the example you have shown (for instance, with respect to anaphora), the Stanford parser, which is based on a probabilistic context-free grammar, will give good results that you can easily convert into the format you want. There is a demo available online. For your example, it will give something like
nsubj(sitting, boy)
prep_on(sitting, table)
etc.
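To give an idea of what that conversion could look like, here is a minimal sketch in plain Python. It assumes the dependencies have already been written out as text lines in the `relation(head, dependent)` form shown above; the function names `parse_dependencies` and `to_triples` are made up for this illustration and are not part of the Stanford parser. It only handles the `nsubj`, `dobj` and `prep_*` relations, and it puts the verb or the preposition in the ACTION slot.

```python
import re

# Matches lines such as "nsubj(sitting, boy)".
# The textual format is an assumption based on the example output above.
DEP_PATTERN = re.compile(r"^\s*([\w:]+)\(\s*([^,\s]+)\s*,\s*([^)\s]+)\s*\)\s*$")


def parse_dependencies(lines):
    """Turn text lines into (relation, head, dependent) tuples; skip anything else."""
    deps = []
    for line in lines:
        match = DEP_PATTERN.match(line)
        if match:
            deps.append(match.groups())
    return deps


def to_triples(deps):
    """Build (SUBJECT, OBJECT, ACTION) triples from nsubj, dobj and prep_* relations."""
    # Map each head word to the subjects attached to it.
    subjects = {}
    for rel, head, dep in deps:
        if rel == "nsubj":
            subjects.setdefault(head, []).append(dep)

    triples = []
    for rel, head, dep in deps:
        if rel == "dobj":
            # Direct object: the head verb itself is the action.
            for subj in subjects.get(head, []):
                triples.append((subj, dep, head))
        elif rel.startswith("prep_"):
            # Collapsed preposition: use the preposition as the action.
            for subj in subjects.get(head, []):
                triples.append((subj, dep, rel[len("prep_"):]))
    return triples


if __name__ == "__main__":
    lines = ["nsubj(sitting, boy)", "prep_on(sitting, table)"]
    print(to_triples(parse_dependencies(lines)))  # [('boy', 'table', 'on')]
```

Note that this yields the preposition "on" rather than a label such as LOCATION, and that linking "boy" to "chicken" would additionally require following whatever relation the parser puts between "sitting" and "eating", which this sketch does not attempt.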
If your sentences do get more complicated, you might be interested in trying Boxer, which builds discourse representation structures from C&C parses, based on probabilistic combinatory categorial grammars. Those structures may prove more difficult to adapt to the format you want, but will allow you much more flexibility. There is, again, a demo available online. For your example, it will look something like
sit(x)
boy(y)
table(z)
agent(x,y)
on(x,z)
etc.
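Adapting such conditions to your format could go roughly as follows. This is only a sketch under strong assumptions: it reads the flat, simplified `predicate(variable)` lines shown above, which is not Boxer's actual output format, and the names `read_conditions` and `event_triples` are hypothetical. Unary conditions name a variable, `agent` links an event to its subject, and every other binary condition on the same event becomes a triple.

```python
import re

# Matches "sit(x)" as well as "agent(x,y)".
# This flat format is an assumption taken from the simplified example above.
CONDITION = re.compile(r"^\s*(\w+)\(\s*(\w+)\s*(?:,\s*(\w+)\s*)?\)\s*$")


def read_conditions(lines):
    """Split conditions into variable labels (unary) and relations (binary)."""
    labels = {}     # variable -> predicate name, e.g. "y" -> "boy"
    relations = []  # (name, first_variable, second_variable)
    for line in lines:
        match = CONDITION.match(line)
        if not match:
            continue
        name, first, second = match.groups()
        if second is None:
            labels[first] = name
        else:
            relations.append((name, first, second))
    return labels, relations


def event_triples(labels, relations):
    """Build (SUBJECT, OBJECT, ACTION) triples for events that have an agent."""
    # Map each event variable to the label of its agent.
    agents = {event: labels.get(var, var)
              for name, event, var in relations if name == "agent"}
    triples = []
    for name, event, var in relations:
        if name != "agent" and event in agents:
            triples.append((agents[event], labels.get(var, var), name))
    return triples


if __name__ == "__main__":
    lines = ["sit(x)", "boy(y)", "table(z)", "agent(x,y)", "on(x,z)"]
    labels, relations = read_conditions(lines)
    print(event_triples(labels, relations))  # [('boy', 'table', 'on')]
```

As with the dependency output, the third slot holds the relation name "on" as it appears in the structure, not a semantic category.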
The Stanford parser is written in Java and is available under the GPL. C&C is written in C++ and Boxer in SWI Prolog. Those two are not released under a genuinely free licence, but you can obtain the source code, modify it, and use it for any non-commercial project.
Neither will give you a characterisation of the relation between "boy" and "table" in your example. For that, you will need much more powerful semantic reasoning tools, and I am not sure whether anything like this exists.
Edit
It is once again possible to obtain the source code for C&C and Boxer, along with a collection of models.
Source: https://stackoverflow.com/questions/19574549/extracting-relations-from-text