Building a fast semantic MySQL search engine for private articles from scratch

感情迁移 submitted on 2019-12-09 04:59:11

Question


I am working on a project that involves full-text and semantic search of articles within a site (if the two cannot be combined, the user can select either option). The articles are subscription-based and can only be searched after logging in, so they are not accessible to external search engines or their APIs.

I have read about Sphinx for full-text keyword search (and I intend to use it for that part), but I am not sure how to build a semantic search engine on top of it. For example, searching for "U.S. President" should list articles that mention U.S. presidents by name, e.g. George Washington or Bill Clinton (or William Jefferson Clinton).

I have an idea that some sort of tagging system could be used to relate keywords, e.g. relate President to George Washington and President to Bill Clinton, but since the data set is very large and many such relations will exist, I don't know how to take this idea further.
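To make the tagging idea concrete, here is a minimal sketch of query expansion driven by a relation table. Everything named here is an assumption for illustration: the `term_relations` table, its `concept` and `instance` columns, and the helper functions are hypothetical, and the standard-library `sqlite3` module stands in for MySQL only so that the example runs on its own. The expanded phrase list would then be handed to whatever full-text engine is in use.

```python
import sqlite3


def build_relations(conn, pairs):
    """Create the (hypothetical) relation table and load concept/instance pairs."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS term_relations ("
        "concept TEXT NOT NULL, instance TEXT NOT NULL, "
        "PRIMARY KEY (concept, instance))"
    )
    conn.executemany(
        "INSERT OR IGNORE INTO term_relations (concept, instance) VALUES (?, ?)",
        pairs,
    )


def expand_query(conn, query):
    """Return the original query followed by every instance related to it."""
    key = query.strip().lower()  # concepts are stored lowercased
    rows = conn.execute(
        "SELECT instance FROM term_relations WHERE concept = ? ORDER BY instance",
        (key,),
    ).fetchall()
    return [query] + [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
build_relations(conn, [
    ("u.s. president", "George Washington"),
    ("u.s. president", "Bill Clinton"),
    ("u.s. president", "William Jefferson Clinton"),
])

phrases = expand_query(conn, "U.S. President")
print(phrases)
```

Because the primary key starts with `concept`, the lookup is an index search, so the number of relations in the table should matter less than how many instances a single concept has. A query with no stored relations simply comes back unchanged.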

Please advise me on how to go about building a semantic search engine from scratch (I assume Sphinx can handle the full-text keyword search). Otherwise, please point me to any online resources, or to existing software in any language that I can integrate into my application.

P.S. My database of choice is MySQL (please advise if another database system is more suitable for the task), and I prefer to program in PHP, but I am willing to learn Python or any other language that would be more effective for this task.

I have already searched answers.semanticweb.com.


Answer 1:


I would use Apache Solr, which I think is more flexible than Sphinx. Solr supports full-text search, and I believe there are add-ons for semantic support (such as SIREn). Solr is a standalone search server built on top of Lucene.

Solr supports a SynonymFilter: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#SynonymFilter
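As a rough illustration of how a synonym filter could cover the "U.S. President" case, the sketch below turns a concept-to-names mapping into rule lines in the `lhs => rhs1, rhs2` explicit-mapping style used by Solr synonym files. The `relations` data and the `to_synonym_rules` helper are hypothetical, and the sketch assumes no term contains a comma. Whether multi-word entries and punctuation such as "U.S." match as intended depends on the tokenizer configured for the field, so treat this as a starting point, not a tested configuration.

```python
def to_synonym_rules(relations):
    """Build one explicit-mapping rule per concept.

    Each rule keeps the concept itself on the right-hand side so that
    documents containing the literal phrase still match.
    """
    lines = []
    for concept in sorted(relations):
        targets = [concept] + list(relations[concept])
        lines.append(f"{concept} => {', '.join(targets)}")
    return lines


# Hypothetical relation data, lowercased to pair with a lowercasing analyzer.
relations = {
    "u.s. president": ["george washington", "bill clinton"],
    "bill clinton": ["william jefferson clinton"],
}

rules = to_synonym_rules(relations)
for rule in rules:
    print(rule)
```

The printed lines could be saved to a synonyms file and referenced from the field's analyzer; applying them at query time keeps the index unchanged when the relation list grows.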

This post discusses some strategies for optimizing content retrieval: http://www.lucidimagination.com/devzone/technical-articles/optimizing-findability-lucene-and-solr




Answer 2:


This book may be useful for someone reading this thread. I just found it on Amazon.

http://www.amazon.com/E-Librarian-Service-User-Friendly-Libraries-X-media-publishing/dp/3642177425



Source: https://stackoverflow.com/questions/10987883/building-a-fast-semantic-mysql-search-engine-for-private-articles-from-scratch
