Question
I have reader APIs written in Java that read data from the file system. The data is written to the file system by Perl scripts.
To improve the read performance of these APIs, I maintain an in-memory cache (essentially a HashMap inside the Java application itself) holding the last 7 days of data from the file system. When the application starts, a dedicated thread reads the last 7 days of data from the file system and populates the cache. A second thread wakes up every 30 seconds and refreshes the cache (it reads entries from the file system and updates existing cache entries or adds new ones).
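For reference, the cache setup is roughly equivalent to the minimal sketch below. The names `RecentDataCache` and `reader` are hypothetical, the String key and value types are an assumption, and the sketch folds the startup load and the 30-second refresh into one scheduled task for brevity, whereas the real application uses two threads.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Function;

/** Hypothetical sketch of the cache described above; all names are illustrative. */
class RecentDataCache {
    static final Duration WINDOW = Duration.ofDays(7);

    private final Map<String, String> cache = new ConcurrentHashMap<>();
    // Assumed stand-in for the file-system reader: returns entries written since the given instant.
    private final Function<Instant, Map<String, String>> reader;
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();

    RecentDataCache(Function<Instant, Map<String, String>> reader) {
        this.reader = reader;
    }

    /** Runs the first load immediately, then refreshes 30 seconds after each run finishes. */
    void start() {
        scheduler.scheduleWithFixedDelay(this::refresh, 0, 30, TimeUnit.SECONDS);
    }

    /** Reads the last 7 days of entries and adds or overwrites them in the cache. */
    void refresh() {
        try {
            cache.putAll(reader.apply(Instant.now().minus(WINDOW)));
        } catch (RuntimeException e) {
            // An uncaught exception would cancel all future scheduled runs, so log and continue.
            System.err.println("Cache refresh failed: " + e);
        }
    }

    String get(String key) {
        return cache.get(key);
    }

    int size() {
        return cache.size();
    }

    void stop() {
        scheduler.shutdownNow();
    }
}
```

Note that the sketch, like the description above, only adds and updates entries; it does not evict entries older than 7 days.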
Now we want to build search on this data using Elasticsearch (to start with, perhaps only on the data held in the cache, i.e. the last 7 days).
One approach I thought of is:
- Whenever the Java application comes up, the thread that reads the last 7 days of data from the file system also calls the Elasticsearch APIs to create the indexes. This thread first deletes all old indexes from Elasticsearch, so that the data in the cache and the data in Elasticsearch are the same when the application starts.
- Similarly, the thread that wakes up every 30 seconds to refresh the cache also calls the Elasticsearch API to update the Elasticsearch indexes.
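The two steps above can be sketched as follows. `SearchIndex` is a hypothetical interface standing in for whatever Elasticsearch client calls would actually be used; it is not a real Elasticsearch API, and `CacheAndIndexSync` and its method names are likewise illustrative assumptions.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical stand-in for the Elasticsearch client calls; not a real Elasticsearch API. */
interface SearchIndex {
    void deleteAll();

    void indexAll(Map<String, String> docs);
}

/** Illustrative sketch: keeps the in-memory cache and the search index fed from the same batches. */
class CacheAndIndexSync {
    private final Map<String, String> cache = new ConcurrentHashMap<>();
    private final SearchIndex index;

    CacheAndIndexSync(SearchIndex index) {
        this.index = index;
    }

    /** Startup: drop the old index contents, then load the 7-day snapshot into both stores. */
    void initialLoad(Map<String, String> lastSevenDays) {
        index.deleteAll();
        cache.clear();
        cache.putAll(lastSevenDays);
        index.indexAll(lastSevenDays);
    }

    /** Every 30 seconds: apply the same batch of new or changed entries to both stores. */
    void refresh(Map<String, String> changed) {
        cache.putAll(changed);
        // If this call throws, the cache already holds the batch but the index does not.
        index.indexAll(changed);
    }

    /** Returns a snapshot copy of the cache contents. */
    Map<String, String> cacheView() {
        return new HashMap<>(cache);
    }
}
```

The comment in `refresh` marks the spot where the two stores can drift apart.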
I am not sure how to make this system fault tolerant. For example, a call to Elasticsearch from the thread that wakes up every 30 seconds could fail. When that happens, the data in the application cache and the data indexed in Elasticsearch are no longer in sync. How can I prevent this?
Is there a better approach to building search on this data?
Should I use Lucene instead?
Source: https://stackoverflow.com/questions/51166968/how-to-build-elastic-search-indexes-for-data-stored-in-java-application-cache