Preferred method of indexing bulk data into ElasticSearch?


We use RabbitMQ to pipe data from SQL Server to ES. That way Rabbit takes care of the queuing and processing.

As a note, we can push over 4000 records per second from SQL into Rabbit. We do a bit more processing before putting the data into ES, but we still insert into ES at over 1000 records per second. Pretty damn impressive on both ends. Rabbit and ES are both awesome!
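For illustration, here is a rough sketch of what the consumer side of such a pipeline could look like in Python, assuming the pika and elasticsearch client libraries, a queue named sql_events, an index named records, and JSON messages that carry an id field (all of those names are placeholders, not a description of our actual setup):

```python
import json

import pika
from elasticsearch import Elasticsearch, helpers

es = Elasticsearch("http://localhost:9200")
BATCH_SIZE = 500
buffer = []


def flush(batch):
    # One bulk request per batch instead of one index call per record.
    actions = [
        {"_index": "records", "_id": doc["id"], "_source": doc}
        for doc in batch
    ]
    helpers.bulk(es, actions)


def on_message(channel, method, properties, body):
    buffer.append(json.loads(body))
    if len(buffer) >= BATCH_SIZE:
        flush(buffer)
        buffer.clear()
    # A production consumer would ack only after a successful flush and
    # also flush on a timer so small batches don't sit in memory forever.
    channel.basic_ack(delivery_tag=method.delivery_tag)


connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = connection.channel()
channel.queue_declare(queue="sql_events", durable=True)
channel.basic_consume(queue="sql_events", on_message_callback=on_message)
channel.start_consuming()
```

The batching is the whole point: sending 500 documents per bulk request is what makes the 1000+ records per second possible, rather than one HTTP round trip per record.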

There are a lot of things you can do. You can put your data in RabbitMQ or Redis, but your main problem is staying up to date. Ideally you would look into an event-based application. If SQL Server really is your only data source, you could work with timestamps and a query that checks for updates. Depending on the size of your database, you can also just reindex the complete dataset.
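If you go the query route, the polling step can be fairly small. Here is a sketch using pyodbc; the last_modified column, the dbo.Records table and the connection string are assumptions about your schema, not requirements:

```python
import pyodbc

# Placeholder connection string; adjust driver/server/database to your setup.
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=localhost;DATABASE=mydb;Trusted_Connection=yes;"
)


def fetch_changes(connection, since):
    # Pull only the rows that changed after the last checkpoint.
    cursor = connection.cursor()
    cursor.execute(
        "SELECT id, name, last_modified FROM dbo.Records WHERE last_modified > ?",
        since,
    )
    columns = [col[0] for col in cursor.description]
    return [dict(zip(columns, row)) for row in cursor.fetchall()]


checkpoint = "2019-01-01T00:00:00"
rows = fetch_changes(conn, checkpoint)
if rows:
    # The newest last_modified value becomes the next checkpoint.
    checkpoint = max(row["last_modified"] for row in rows)
```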

Using events or the query-based solution, you can push these updates to Elasticsearch, probably using the bulk API.
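With the Python client, pushing those updates through the bulk API could look roughly like this (the index name, the id field and the use of the bulk helper are assumptions for the example, not a prescription):

```python
from elasticsearch import Elasticsearch, helpers

es = Elasticsearch("http://localhost:9200")


def push_updates(rows):
    # Turn each changed row into a bulk action; "index" overwrites the
    # existing document, so re-sending the same row is harmless.
    actions = (
        {
            "_op_type": "index",
            "_index": "records",
            "_id": row["id"],
            "_source": {k: v for k, v in row.items() if k != "id"},
        }
        for row in rows
    )
    ok, errors = helpers.bulk(es, actions, raise_on_error=False)
    return ok, errors
```

Keying the documents on the database id means an update simply replaces the old version of the document instead of creating a duplicate.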

The good part about a custom solution like this is that you get to think about your mapping, which is important if you really want to do something smart with your data.
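For example, creating the index with an explicit mapping up front might look like this (a sketch against the 8.x Python client; the field names and analyzer choices are made up):

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# Explicit mapping instead of relying on dynamic mapping; tune the field
# types and analyzers to whatever you actually want to search on.
es.indices.create(
    index="records",
    mappings={
        "properties": {
            "name": {"type": "text", "analyzer": "english"},
            "status": {"type": "keyword"},  # exact-match filters/aggregations
            "last_modified": {"type": "date"},
        }
    },
)
```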
