How do we create a simple search engine using Lucene, Solr or Nutch?

前端 未结 10 2356
孤城傲影
孤城傲影 2021-02-15 11:49

Our company has thousands of PDF documents. How do we create a simple search engine using Lucene, Solr or Nutch? We\'ll provide a basic Java/JSP web page were people can type

10条回答
  •  执念已碎
    2021-02-15 12:44

    None of the projects in the Lucene family can natively process PDFs, but there are utilities you can drop in and well written examples on how to roll your own.

    Lucene will do pretty much whatever you need it to do, but there is overhead in terms of your time, as Tony said above. Thousands of documents really isn't that many, so you might be able to get away with a lighter weight alternative.

    That said, I would still recommend looking at Solr - it's much, much easier to set up than Lucene, has support for backups, replication, etc., as well as a nifty JSON interface which would fit your use case very well: http://wiki.apache.org/solr/SolJSON

提交回复
热议问题