Question
I have been brainstorming ideas for an undergraduate project in the question answering (QA) domain, one that has components of both IR and NLP.
The first idea that came up was, of course, factoid question answering, but that seems to be an already conquered problem (see IBM Watson).
Non-factoid QA seems interesting, so I took it up. We are now in the scoping phase of the project description, so I need to narrow the ambitious goal of answering any question a user asks down to something manageable.
I have made the following decisions so far:
- It will be closed-domain: C++ programming.
- The corpus will consist of just one website (cplusplus or Wikipedia) or just one document (the complete reference).
- We will develop only one module of the entire QA architecture: Passage Retrieval or Answer Extraction.
Our mentor insists that we start by implementing an existing solution, and I am stuck searching for existing implementations. Here is one, but when I read through its environment requirements, they were staggering. There are a lot of libraries and toolkits, but I didn't find any non-factoid QA system that was approachable, even on a very small scale.
Can you suggest a good scope for the project? I wish to continue working on this through my master's, so what would be a good start? We have about 4 months for the project, and it is important that it does not turn into a research project. It should have a tangible output.
Answer 1:
For IR you have Lucene/Solr.
For machine learning and NLP, lots of libraries are available, primarily in Python and Java, at least the user-friendly ones.
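To get a feel for what a passage retrieval module does before setting up Lucene/Solr, here is a minimal sketch in plain Python. It is not how Lucene scores documents; it is a hand-rolled TF-IDF cosine ranking, and the toy passages, the `tokenize` helper, and the `rank_passages` function are all made up for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and keep alphanumeric runs ('+' is kept so 'c++' survives)."""
    return re.findall(r"[a-z0-9+]+", text.lower())


def rank_passages(question, passages):
    """Return (score, passage) pairs sorted by TF-IDF cosine similarity."""
    docs = [Counter(tokenize(p)) for p in passages]
    n = len(docs)
    # Document frequency: in how many passages each term occurs.
    df = Counter(term for doc in docs for term in doc)
    # Smoothed IDF so that every known term gets a positive weight.
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}

    def vectorize(counts):
        # Terms never seen in the corpus are dropped from the vector.
        return {t: tf * idf[t] for t, tf in counts.items() if t in idf}

    def cosine(a, b):
        dot = sum(w * b.get(t, 0.0) for t, w in a.items())
        norm_a = math.sqrt(sum(w * w for w in a.values()))
        norm_b = math.sqrt(sum(w * w for w in b.values()))
        return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

    q_vec = vectorize(Counter(tokenize(question)))
    scored = [(cosine(q_vec, vectorize(d)), p) for d, p in zip(docs, passages)]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


# Toy corpus, invented for this example only.
PASSAGES = [
    "A virtual function is a member function that a derived class can override.",
    "The std::vector container stores elements in contiguous memory.",
    "A destructor is called when an object goes out of scope or is deleted.",
]

ranked = rank_passages("Why would I declare a virtual function?", PASSAGES)
best_score, best_passage = ranked[0]
print(best_passage)
```

Something of this size is enough to build an evaluation set against; swapping in Lucene/Solr later would only change the ranking step.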
Implementing Hoifung's system is pretty ambitious; I'd go for something simpler. Have you looked at his code at all?
The BioNLP challenges from the last few years are another place with lots of material, but those are also relatively complicated tasks.
How about Twitter movie review discovery? I.e., based on X tweets, does this movie suck?
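As a rough idea of how small a first version of the tweet-based verdict could be, here is a sketch in plain Python that votes over tweets with a tiny word lexicon. Everything in it is an assumption for illustration: the word lists, the sample tweets, and the `tweet_polarity` and `movie_verdict` helpers. A real project would fetch tweets and most likely train a classifier instead.

```python
import re

# Tiny hand-made lexicons, purely illustrative.
POSITIVE = {"great", "love", "loved", "amazing", "good", "brilliant"}
NEGATIVE = {"sucks", "suck", "boring", "awful", "bad", "terrible"}


def tweet_polarity(tweet):
    """Return +1, -1, or 0 depending on which lexicon matches more words."""
    words = re.findall(r"[a-z']+", tweet.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return (score > 0) - (score < 0)


def movie_verdict(tweets):
    """Majority vote over the per-tweet polarities."""
    total = sum(tweet_polarity(t) for t in tweets)
    if total > 0:
        return "good"
    if total < 0:
        return "sucks"
    return "undecided"


# Sample tweets, invented for this example only.
TWEETS = [
    "This movie sucks, so boring",
    "Loved it, amazing cast!",
    "Awful plot and terrible acting",
]
verdict = movie_verdict(TWEETS)
print(verdict)
```

A baseline like this gives a tangible output early, and replacing the lexicon with a learned model is a natural next step.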
Source: https://stackoverflow.com/questions/8196284/ir-and-qa-beginner-project-scope