Is there a way to improve memory performance when using an elasticsearch percolator index?
I have created a separate index for my percolator. I have roughly 1 000 000 user-created saved searches registered in it.
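For context, my setup looks roughly like the following. This is a minimal sketch, assuming a recent Elasticsearch (5.x+) and the official Python client; the index, field, and ID names are all illustrative:

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# A percolator index is an ordinary index whose mapping declares a field
# of type "percolator"; each registered query is indexed as a document.
es.indices.create(
    index="searches",  # hypothetical name for the percolator index
    body={
        "mappings": {
            "properties": {
                "query": {"type": "percolator"},  # holds the saved search
                "body": {"type": "text"},         # field the queries run against
            }
        }
    },
)

# Registering one saved search means indexing one document with a query field.
es.index(
    index="searches",
    id="search-1",
    body={"query": {"match": {"body": "red shoes"}}},
)
```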
There is no resolution to this issue from an Elasticsearch point of view, nor is one likely. I have spoken to the Elasticsearch team directly, and their answer was: "throw more hardware at it".
I have, however, found a way to mitigate the problem by changing how I use the feature. When I analyzed my saved-search data, I discovered that my searches consisted of around 100 000 unique keyword searches combined with various filter permutations, creating over 1 000 000 saved searches.
If I look at the filters, they are categorical dimensions: one with over 300 possible values, another with over 50, and so on. Combined with the 100 000 keyword searches, this gives a solution space of:
100 000 * >300 * >50 * ... ~= > 1 500 000 000
However, if I decompose the searches and index the keyword searches and filters separately in the percolator index, I end up with far fewer searches:
100 000 + >300 + >50 + ... ~= > 100 350
And the component searches themselves are smaller and simpler than the originals.
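To make the decomposition concrete, here is a sketch of how the components could be registered as separate, much smaller percolator queries (same assumptions and hypothetical names as above; a `type` field distinguishes keyword components from filter components):

```python
# Each decomposed component becomes its own small percolator query in a
# component index (mapped like the one above: "query" as a percolator
# field, plus whatever document fields the queries reference).
es.index(
    index="search-components",  # hypothetical component percolator index
    id="kw-42",
    body={
        "type": "keyword",  # scored free-text component
        "query": {"match": {"body": "red shoes"}},
    },
)

es.index(
    index="search-components",
    id="filter-country-DE",
    body={
        "type": "filter",  # one value of one filter dimension
        "query": {"term": {"country": "DE"}},
    },
)
```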
Next I create a second (non-percolator) index listing all 1 000 000 saved searches, with each entry including the IDs of its search components from the percolator index.
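A document in that second index might look like this (again a sketch; `keyword_id` and `filter_ids` are field names I made up):

```python
# The second, non-percolator index maps each of the 1 000 000 saved
# searches to the IDs of its components in the percolator index.
es.index(
    index="saved-searches",  # hypothetical plain index, no percolator field
    id="user-search-987",
    body={
        "keyword_id": "kw-42",
        "filter_ids": ["filter-country-DE", "filter-category-12"],
    },
)
```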
To match a document, I percolate it and then run a second query that filters the saved searches against the keyword and filter results from the percolation. I can even preserve the relevance score, since it comes purely from the keyword searches.
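Put together, the two-step flow could look like the sketch below. Note one simplification: the `terms` filter only requires at least one of a search's `filter_ids` to have matched; requiring all of them would need something like a `terms_set` query or an application-side check.

```python
# Step 1: percolate the incoming document against the component index.
doc = {"body": "red shoes on sale", "country": "DE", "category": 12}
resp = es.search(
    index="search-components",
    body={"query": {"percolate": {"field": "query", "document": doc}}},
)

hits = resp["hits"]["hits"]
# Keep the keyword components' scores so they can be carried over.
keyword_scores = {h["_id"]: h["_score"] for h in hits
                  if h["_source"]["type"] == "keyword"}
matched_filters = [h["_id"] for h in hits if h["_source"]["type"] == "filter"]

# Step 2: fetch the saved searches whose components matched.
resp2 = es.search(
    index="saved-searches",
    body={
        "query": {
            "bool": {
                "filter": [
                    {"terms": {"keyword_id": list(keyword_scores)}},
                    {"terms": {"filter_ids": matched_filters}},
                ]
            }
        },
        "size": 1000,
    },
)

# Each matching saved search inherits the relevance score of its
# keyword component, preserving scoring from the percolation step.
for hit in resp2["hits"]["hits"]:
    print(hit["_id"], keyword_scores[hit["_source"]["keyword_id"]])
```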
This approach should significantly reduce the percolator index's memory footprint while serving the same purpose.
I would like to invite feedback on this approach (I haven't tried it yet but I will keep you posted).
Likewise, if the approach is successful, do you think it would be worth a feature request?