Question
I am working on a project that will involve full-text and semantic searches of articles within the site (if it's not possible to combine the two, the user can select either option). These articles are subscription-based and can only be searched after logging in, so they are not accessible to external search engines or their APIs.
I read about Sphinx for full-text keyword searches (and I intend to use it for that aspect), but I am not sure how to go about building a semantic search engine on top of it. For example, searching for "U.S. President" should list articles that contain references to the actual names of U.S. presidents, e.g. George Washington, Bill Clinton (or William Jefferson Clinton).
My idea is that some sort of tagging system could be used to relate keywords, e.g. relate President to George Washington and President to Bill Clinton, but since the data is really huge and many such relations will exist, I don't know how to take this idea further.
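A minimal sketch of the tagging idea, under stated assumptions: the `relations` table, its columns, and the helper names `expand_query` and `to_boolean_query` are all hypothetical, and an in-memory SQLite database stands in for MySQL so the example is self-contained. It expands a concept into its related names and joins them into an OR query of quoted phrases, which could then be passed to the full-text engine. The `|` operator is an assumption about the target query syntax and may need adapting.

```python
import sqlite3

# Hypothetical schema: one row per (concept, entity) relation.
# SQLite is used here only as a stand-in for MySQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE relations (concept TEXT, entity TEXT)")
conn.executemany(
    "INSERT INTO relations VALUES (?, ?)",
    [
        ("u.s. president", "George Washington"),
        ("u.s. president", "Bill Clinton"),
        ("u.s. president", "William Jefferson Clinton"),
    ],
)


def expand_query(conn, query):
    """Return the original query followed by every entity related to it."""
    rows = conn.execute(
        "SELECT entity FROM relations WHERE concept = ? ORDER BY rowid",
        (query.strip().lower(),),
    ).fetchall()
    return [query] + [entity for (entity,) in rows]


def to_boolean_query(terms):
    """Join the terms as quoted phrases separated by an OR operator."""
    return " | ".join('"{}"'.format(term) for term in terms)


terms = expand_query(conn, "U.S. President")
print(to_boolean_query(terms))
```

Keeping the relations in an indexed table rather than in application code is one way to cope with a large number of relations, since each search only looks up the rows for the concept being queried.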
Please advise me on how to go about building a semantic search engine from scratch (I guess Sphinx can handle the full-text keyword search). Otherwise, please point me to any internet-based resources or to existing software, in any language, that I can integrate into my application.
P.S. My database of choice is MySQL (please advise if another database system is more suitable for the task), and I prefer to program in PHP, but if Python or another language would be more effective for this task, I am willing to learn it.
I already searched at answers.semanticweb.com
Answer 1:
I would use Apache Solr. I think it's more flexible than Sphinx. Solr supports full-text search and, I believe, has add-ons for semantic support (like SIREn). Solr is a standalone search server built on top of Lucene.
Solr supports a SynonymFilter: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#SynonymFilter
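To illustrate what synonym rules could look like for the question's example, here is a rough Python sketch that parses rules written in the two forms Solr's synonyms file uses (a comma-separated list of equivalent terms, and an explicit `a => b, c` mapping). This is only an emulation for illustration, not Solr's implementation; the function name `parse_synonym_rules` and the sample rules are assumptions, and real multi-word synonym handling in Solr has subtleties this ignores.

```python
def parse_synonym_rules(lines):
    """Build a lookup from a lowercase term to the terms it expands to."""
    table = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if "=>" in line:
            # Explicit mapping: terms on the left map to terms on the right.
            left, right = line.split("=>", 1)
            sources = [t.strip().lower() for t in left.split(",") if t.strip()]
            targets = [t.strip() for t in right.split(",") if t.strip()]
        else:
            # Equivalent terms: each term expands to the whole group.
            targets = [t.strip() for t in line.split(",") if t.strip()]
            sources = [t.lower() for t in targets]
        for source in sources:
            expansions = table.setdefault(source, [])
            for target in targets:
                if target not in expansions:
                    expansions.append(target)
    return table


# Hypothetical rules for the example in the question.
rules = [
    "# explicit mapping",
    "U.S. President => George Washington, Bill Clinton",
    "# equivalent terms",
    "Bill Clinton, William Jefferson Clinton",
]
table = parse_synonym_rules(rules)
print(table["u.s. president"])
```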
This post discusses some strategies for optimizing content retrieval: http://www.lucidimagination.com/devzone/technical-articles/optimizing-findability-lucene-and-solr
Answer 2:
This book may be useful for someone reading this thread. I just found it on Amazon.
http://www.amazon.com/E-Librarian-Service-User-Friendly-Libraries-X-media-publishing/dp/3642177425
Source: https://stackoverflow.com/questions/10987883/building-a-fast-semantic-mysql-search-engine-for-private-articles-from-scratch