Question
I currently have a single collection with 40 million documents and an index size of 25 GB. The collection gets updated every n minutes, and as a result the number of deleted documents is constantly growing. The data in the collection is an amalgamation of records from more than 1,000 customers, with around 100,000 documents per customer on average.
That being said, I'm trying to get a handle on the growing deleted-document count. Because of the growing index size, both disk space and memory are being used up, and I would like to reduce it to a manageable size.
I have been thinking of splitting the data into multiple cores, one per customer. This would let me manage the smaller indexes more easily and make creating/updating them faster. My concern is that the number of cores might become an issue. Any suggestions on how to address this problem?
Solr: 4.9
Index size: 25 GB
Max doc: 40 million
Doc count: 29 million
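For reference, this is roughly the kind of cleanup available to me today on the single big index (a minimal SolrJ sketch; the URL and core name are placeholders). A commit with expungeDeletes, or a full optimize, does reclaim the space held by deleted documents, but on a 25 GB index both are expensive to run every n minutes:

```java
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.request.AbstractUpdateRequest;
import org.apache.solr.client.solrj.request.UpdateRequest;

public class PurgeDeletes {
    public static void main(String[] args) throws Exception {
        // Placeholder URL/core name for the single large collection.
        HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr/collection1");

        // Option 1: a commit with expungeDeletes=true merges away segments that
        // contain deleted documents, reclaiming some space without a full optimize.
        UpdateRequest commit = new UpdateRequest();
        commit.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);
        commit.setParam("expungeDeletes", "true");
        commit.process(solr);

        // Option 2: a full optimize rewrites the index with zero deleted documents,
        // but is very I/O heavy on a 25 GB index.
        // solr.optimize();

        solr.shutdown();
    }
}
```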
Thanks
Answer 1:
I had a similar sort of issue: multiple customers and a large amount of indexed data.
I implemented it on version 3.4 by creating a separate core for each customer, i.e. one core per customer. Creating a core essentially means creating a separate index, splitting the data much like we do with sharding.
You are splitting one large index into several smaller ones, so every search runs against a smaller index and the response time is faster.
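Core creation can also be scripted. Below is a minimal SolrJ sketch (my own illustration, not code from my setup), assuming the Solr 4.x CoreAdmin API, a placeholder base URL, and an instanceDir that already contains the core's conf/ directory:

```java
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.request.CoreAdminRequest;

public class CreateCustomerCore {
    public static void main(String[] args) throws Exception {
        // CoreAdmin requests go to the Solr root URL, not to a specific core.
        HttpSolrServer adminServer = new HttpSolrServer("http://localhost:8983/solr");

        String customerCore = "customer_1234"; // placeholder core name
        // The instanceDir must already contain conf/solrconfig.xml and conf/schema.xml
        // (e.g. copied from a shared template directory).
        CoreAdminRequest.createCore(customerCore, "/var/solr/" + customerCore, adminServer);

        adminServer.shutdown();
    }
}
```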
I have almost 700 cores created as of now and it is running fine for me. So far I have not faced any issues with managing the cores.
I would suggest going with a combination of cores and sharding. It will help you achieve the following:
A different configuration for each core, with different behavior, without any impact on the other cores.
The ability to perform actions like update, load, etc. on each core independently (see the sketch below).
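As an illustration of that last point, here is a minimal SolrJ sketch (the URL, core name, and field names are placeholders of mine) of updating and cleaning up a single customer's core without touching any other core:

```java
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class RefreshCustomerCore {
    public static void main(String[] args) throws Exception {
        // Each customer core has its own URL; work done here is isolated from other cores.
        HttpSolrServer core = new HttpSolrServer("http://localhost:8983/solr/customer_1234");

        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "record-1");
        doc.addField("customer_id", "customer_1234");
        doc.addField("name", "example record");
        core.add(doc);   // overwrites the old version of record-1, leaving a deleted doc behind
        core.commit();

        // Periodically merge away the deleted documents in just this small core;
        // this is much cheaper than optimizing one 25 GB index.
        core.optimize();

        core.shutdown();
    }
}
```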
Source: https://stackoverflow.com/questions/31691606/solr-multicore-vs-sharding-vs-1-big-collection