Question
Elasticsearch and command-line programming newbie question.
I have Elasticsearch set up locally on my computer and want to pull documents from a server that runs a different version of Elasticsearch, using the scan and scroll API, and add them to my index. I am having trouble figuring out how to do this with the Elasticsearch bulk API.
Right now in my testing phase I am just pulling a few documents from the server using the following code (which works):
http 'MY-OLD-ES.com:9200/INDEX/TYPE/_search?size=1000' | jq -c '.hits.hits[]' | while IFS= read -r x; do
  id=$(jq -r ._id <<<"$x")
  index=$(jq -r ._index <<<"$x")
  type=$(jq -r ._type <<<"$x")
  doc=$(jq ._source <<<"$x")
  http put "localhost:9200/junk-$index/$type/$id" <<<"$doc"
done
Any tips on how scan and scroll work? (I'm a noob and a bit confused.) So far I know I can open a scroll and get a scroll id, but I'm unclear what to do with the scroll id. If I call
http get 'http://MY-OLD-ES.com:9200/my_index/_search?scroll=1m&search_type=scan&size=10'
I'll receive a scroll id. Can this be piped in and parsed the same way? Additionally, I believe I'll need a while loop to tell it to keep requesting. How exactly should I go about this?
Thanks!
Answer 1:
The scan and scroll documentation explains it pretty clearly. After you get the scroll_id (a long base64-encoded string), you pass it as the body of the next request. With curl, the request would look something like this:
curl -XGET 'http://MY-OLD-ES.com:9200/_search/scroll?scroll=1m' -d '
c2Nhbjs1OzExODpRNV9aY1VyUVM4U0NMd2pjWlJ3YWlBOzExOTpRNV9aY1VyUVM4U0
NMd2pjWlJ3YWlBOzExNjpRNV9aY1VyUVM4U0NMd2pjWlJ3YWlBOzExNzpRNV9aY1Vy
UVM4U0NMd2pjWlJ3YWlBOzEyMDpRNV9aY1VyUVM4U0NMd2pjWlJ3YWlBOzE7dG90YW
xfaGl0czoxOw==
'
Notice that while the first request to open the scroll was to /my_index/_search, the second request to read the data was to /_search/scroll. Each time you call that, passing the ?scroll=1m query string, it refreshes the timeout before the scroll is automatically closed.
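The scroll id can be pulled out of the response with jq, much like the hits in the question. Here is a minimal sketch, assuming curl and jq are installed and using the placeholder host and index from above; the function names open_scroll and next_batch are made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch only: open_scroll and next_batch are illustrative names.

OLD_ES='http://MY-OLD-ES.com:9200'   # source cluster, as in the question

# Open the scroll and print just the scroll id.
# The URL is quoted so the shell does not treat "&" as "run in background".
open_scroll() {
  curl -s -XGET "$OLD_ES/my_index/_search?search_type=scan&scroll=1m&size=10" \
    | jq -r '._scroll_id'
}

# Fetch one batch; the scroll id ($1) is sent as the raw request body.
next_batch() {
  curl -s -XGET "$OLD_ES/_search/scroll?scroll=1m" -d "$1"
}

# Usage (nothing is requested until you call the functions):
#   sid=$(open_scroll)
#   next_batch "$sid" | jq -c '.hits.hits[]'
```

The output of next_batch has the same .hits.hits[] shape as a normal search, so the jq filters from the question should apply unchanged.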
There are two more things to be aware of:
- The size you pass when opening the scroll applies to each shard, so each request returns up to size multiplied by the number of shards in your index. For example, size=10 on an index with 5 shards yields up to 50 documents per request.
- Each request to /_search/scroll returns a new scroll_id, which you must pass on the next call to get the next batch of results. You can't just keep calling with the same scroll_id.
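The scroll is complete when a request to /_search/scroll returns no hits. (The opening scan request itself never returns hits, only the scroll_id, so don't test that one.)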
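Putting the pieces together, the while loop asked about in the question could look roughly like the sketch below. It assumes curl and jq are installed, that both clusters accept the requests shown above (scan search type, raw scroll id as the body, typed documents), and it keeps the junk- index prefix from the question. OLD_ES, NEW_ES, to_bulk and copy_index are hypothetical names chosen for this example.

```shell
#!/usr/bin/env bash
# Sketch: copy every document of one index from the old cluster into the local one.
# OLD_ES, NEW_ES, to_bulk and copy_index are illustrative names, not part of any tool.

OLD_ES='http://MY-OLD-ES.com:9200'
NEW_ES='http://localhost:9200'

# Convert one scroll response (stdin) into _bulk input: for every hit, an
# action line followed by the document source, one JSON object per line.
to_bulk() {
  jq -c '.hits.hits[] | {"index": {"_index": ("junk-" + ._index), "_type": ._type, "_id": ._id}}, ._source'
}

copy_index() {
  local sid page count
  # Open the scroll; with search_type=scan this returns only the scroll id.
  sid=$(curl -s -XGET "$OLD_ES/$1/_search?search_type=scan&scroll=1m&size=100" | jq -r '._scroll_id')
  while :; do
    page=$(curl -s -XGET "$OLD_ES/_search/scroll?scroll=1m" -d "$sid")
    count=$(printf '%s\n' "$page" | jq '.hits.hits | length')
    # Stop when a scroll request comes back with no hits (or no usable response).
    [ "${count:-0}" -gt 0 ] || break
    # --data-binary keeps the newlines _bulk needs; @- reads the body from stdin.
    printf '%s\n' "$page" | to_bulk | curl -s -XPOST "$NEW_ES/_bulk" --data-binary @- >/dev/null
    # Always carry the newest scroll id into the next request.
    sid=$(printf '%s\n' "$page" | jq -r '._scroll_id')
  done
}

# Usage: copy_index my_index
```

The sketch throws away the _bulk response for brevity; in real use it is worth inspecting it, since individual documents can fail while the request as a whole succeeds. Newer Elasticsearch versions also require a Content-Type header on _bulk requests, so the local cluster's version may call for an extra -H option.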
Source: https://stackoverflow.com/questions/28844530/elasticsearch-scan-and-scroll-add-to-new-index