I have a large set of files (hdf) that I need to enable search for. For Java I would use Lucene for this, as it's a file and document indexing engine. I don't know what t
I'd suggest Sphinx. It's very active, has many more features and seems faster than Lucene.
I haven't done indexing before, but the following may be helpful:
As far as using HDF files goes, I have heard of a module called h5py.
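A rough sketch of how h5py could feed an indexer, assuming the files are HDF5 (h5py does not read HDF4). The helper names `describe_node` and `hdf5_records`, and the `{"path", "text"}` record layout, are made up for illustration; only `h5py.File`, `visititems` and `.attrs` come from h5py itself.

```python
from typing import Dict, List, Mapping


def describe_node(name: str, attrs: Mapping[str, object]) -> Dict[str, str]:
    """Turn one HDF5 object name and its attributes into an indexable record."""
    attr_text = " ".join(f"{key}={value}" for key, value in sorted(attrs.items()))
    # Split the internal path ("run1/temperature") into separate words.
    words = name.replace("/", " ")
    return {"path": name, "text": f"{words} {attr_text}".strip()}


def hdf5_records(filename: str) -> List[Dict[str, str]]:
    """Collect one record per group or dataset in an HDF5 file."""
    import h5py  # third-party; imported here so describe_node works without it

    records: List[Dict[str, str]] = []
    with h5py.File(filename, "r") as h5file:
        # visititems calls the callback with (name, object) for every member;
        # returning None (as list.append does) keeps the traversal going.
        h5file.visititems(
            lambda name, obj: records.append(describe_node(name, dict(obj.attrs)))
        )
    return records
```

The resulting records are plain dictionaries, so they can be handed to whichever search engine you end up choosing.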
I hope this helps.
Elasticsearch can be used to index documents and search them by keyword.
It can also be integrated with graph databases and Hadoop.
Some URLs:
1) https://www.elastic.co/products/elasticsearch
2) https://towardsdatascience.com/getting-started-with-elasticsearch-in-python-c3598e718380
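For a feel of what this looks like from Python, here is a minimal sketch using the official `elasticsearch` client. It assumes a server running at `http://localhost:9200`, a reasonably recent client (the `query=` keyword of `search` is not available in old releases), and input records shaped like `{"path": ..., "text": ...}`. The index name `hdf-files` and both function names are hypothetical.

```python
from typing import Dict, Iterable, Iterator, List

INDEX_NAME = "hdf-files"  # hypothetical index name


def build_actions(
    records: Iterable[Dict[str, str]], index: str = INDEX_NAME
) -> Iterator[dict]:
    """Convert {"path", "text"} records into bulk-index actions."""
    for record in records:
        yield {
            "_index": index,
            "_id": record["path"],  # re-indexing the same path overwrites it
            "_source": {"text": record["text"]},
        }


def index_and_search(
    records: Iterable[Dict[str, str]],
    keyword: str,
    url: str = "http://localhost:9200",  # assumed local server
) -> List[str]:
    """Index the records, then return the ids of documents matching keyword."""
    from elasticsearch import Elasticsearch, helpers  # third-party client

    client = Elasticsearch(url)
    helpers.bulk(client, build_actions(records))
    # Refresh so the documents just written are visible to the search below.
    client.indices.refresh(index=INDEX_NAME)
    response = client.search(index=INDEX_NAME, query={"match": {"text": keyword}})
    return [hit["_id"] for hit in response["hits"]["hits"]]
```

Something like `index_and_search(records, "temperature")` would then return the ids of the matching documents, provided the server is reachable.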
Lupy has been retired and the developers recommend PyLucene instead. As for PyLucene, its mailing list activity may be low, but it is definitely supported. In fact, it just recently became an official Apache subproject.
You may also want to look at a new contender: Whoosh. It's similar to Lucene, but implemented in pure Python.
A popular C++-based information retrieval library that is often used with Python is Xapian: http://xapian.org/
It's incredibly quick and can happily manage large amounts of data; however, it's not quite as easily extensible as Lucene.