Question
My aim is to build an aggregator of news feeds and blog feeds that makes it easy to search for and track entities in them. I have been looking at several solutions, such as Terrier, Lucene, and SWISH-E.
I could find only two comparison studies of these engines, and one of them is fairly outdated. I want a search engine for a case where the data size is not too large but indexing is frequent, every 30 minutes or so. I feel Terrier is not a good fit here: it works better when the data size is large and the update frequency is low. Can somebody who has worked in the Information Retrieval field offer some advice?
Answer 1:
Lucene is well known and supported, so personally, that would be my first choice.
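To make the workload in the question concrete (a small corpus, refreshed every 30 minutes or so), the core operation is replacing documents by ID in an inverted index. The following is only a toy sketch in plain Python, not Lucene's API; `TinyIndex`, `upsert`, `delete`, and `search` are hypothetical names chosen for illustration, and the tokenizer is deliberately naive.

```python
import re
from collections import defaultdict


class TinyIndex:
    """Toy in-memory inverted index with update-by-id (hypothetical, not Lucene)."""

    def __init__(self):
        self.postings = defaultdict(set)  # term -> set of doc ids
        self.doc_terms = {}               # doc id -> set of terms in that doc

    @staticmethod
    def tokenize(text):
        # Lowercase, then keep runs of ASCII letters and digits as terms.
        return set(re.findall(r"[a-z0-9]+", text.lower()))

    def upsert(self, doc_id, text):
        # Remove the old version first so stale terms do not linger.
        self.delete(doc_id)
        terms = self.tokenize(text)
        self.doc_terms[doc_id] = terms
        for term in terms:
            self.postings[term].add(doc_id)

    def delete(self, doc_id):
        # Drop the doc from every posting list it appears in.
        for term in self.doc_terms.pop(doc_id, set()):
            self.postings[term].discard(doc_id)
            if not self.postings[term]:
                del self.postings[term]

    def search(self, query):
        # AND semantics: ids of docs containing every query term.
        terms = self.tokenize(query)
        if not terms:
            return set()
        result = None
        for term in terms:
            ids = self.postings.get(term, set())
            result = set(ids) if result is None else result & ids
        return result
```

A periodic feed fetch would call `upsert` for each new or changed entry, keyed by something stable such as the entry URL. A real engine adds ranking, persistence, and text analysis on top of this idea.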
Answer 2:
If you are looking for a ready-to-use search engine, check out fastcatsearch.
It was developed for commercial search and has been deployed on many different sites.
It is faster than Lucene and has a web-based management interface that makes it easy to use.
It is hosted on GitHub: https://github.com/fastcatgroup/fastcatsearch
Source: https://stackoverflow.com/questions/1418372/which-open-source-search-engine-should-be-used