问题
I have a test ElasticSearch 6.0 index populated with millions of records, likely to be in the billions in production. I need to search for a subset of these records, then save this subset of the original set into a secondary index for later searching. I have proven this out via querying ES on Kibana, the challenge is to find appropriate APIs in Java 8 using my Jest client (searchbox.io, version 5.3.3) to do the same. The ElasticSearch cluster is on AWS, so using a transport client is out.
POST _reindex?slices=10&wait_for_completion=false
{ "conflicts": "proceed",
"source":{
"index": "my_source_idx",
"size": 5000,
"query": { "bool": {
"filter": { "bool" : { "must" : [
{ "nested": { "path": "test", "query": { "bool": { "must":[
{ "terms" : { "test.RowKey": ["abc"]} },
{ "range" : { "test.dates" : { "lte": "2018-01-01", "gte": "2010-08-01"} } },
{ "range" : { "test.DatesCount" : { "gte": 2} } },
{ "script" : { "script" : { "id": "my_painless_script",
"params" : {"min_occurs" : 1, "dateField": "test.dates", "RowKey": ["abc"], "fromDate": "2010-08-01", "toDate": "2018-01-01"}}}}
]}}}}
]}}
}}
},
"dest": {
"index": "my_dest_idx"
},
"script": {
"source": <My painless script>
} }
I am aware I can perform a search on the source index, then create and bulk load the response records onto the new index, but I want to be able to do this all in one shot, as I do have a painless script to glean off some information that is pertinent to the queries that will search the secondary index. Performance is a concern, as the application will be chaining subsequent queries together using the destination index to query against. Does anyone know how to do accomplish this using Jest?
回答1:
It appears as if this particular functionality is not yet supported in Jest. The Jest API It has a way to pass in a script (not a query) as a parameter, but I even was having problems with that.
EDIT:
After some hacking with a coworker, we found a way around this...
Step 1) Extend the GenericResultAbstractionAction class with edits to the script:
public class GenericResultReindexActionHack extends GenericResultAbstractAction {
GenericResultReindexActionHack(GenericResultReindexActionHack.Builder builder) {
super(builder);
Map<String, Object> payload = new HashMap<>();
payload.put("source", builder.source);
payload.put("dest", builder.dest);
if (builder.conflicts != null) {
payload.put("conflicts", builder.conflicts);
}
if (builder.size != null) {
payload.put("size", builder.size);
}
if (builder.script != null) {
Script script = (Script) builder.script;
// Note the script parameter needs to be formatted differently to conform to the ES _reindex API:
payload.put("script", new Gson().toJson(ImmutableMap.of("id", script.getIdOrCode(), "params", script.getParams())));
}
this.payload = ImmutableMap.copyOf(payload);
setURI(buildURI());
}
@Override
protected String buildURI() {
return super.buildURI() + "/_reindex";
}
@Override
public String getRestMethodName() {
return "POST";
}
@Override
public String getData(Gson gson) {
if (payload == null) {
return null;
} else if (payload instanceof String) {
return (String) payload;
} else {
// We need to remove the incorrect formatting for the query, dest, and script fields:
// TODO: Need to consider spaces in the JSON
return gson.toJson(payload).replaceAll("\\\\n", "")
.replace("\\", "")
.replace("query\":\"", "query\":")
.replace("\"},\"dest\"", "},\"dest\"")
.replaceAll("\"script\":\"","\"script\":")
.replaceAll("\"}","}")
.replaceAll("},\"script\"","\"},\"script\"");
}
}
public static class Builder extends GenericResultAbstractAction.Builder<GenericResultReindexActionHack , GenericResultReindexActionHack.Builder> {
private Object source;
private Object dest;
private String conflicts;
private Long size;
private Object script;
public Builder(Object source, Object dest) {
this.source = source;
this.dest = dest;
}
public GenericResultReindexActionHack.Builder conflicts(String conflicts) {
this.conflicts = conflicts;
return this;
}
public GenericResultReindexActionHack.Builder size(Long size) {
this.size = size;
return this;
}
public GenericResultReindexActionHack.Builder script(Object script) {
this.script = script;
return this;
}
public GenericResultReindexActionHack.Builder waitForCompletion(boolean waitForCompletion) {
return setParameter("wait_for_completion", waitForCompletion);
}
public GenericResultReindexActionHack.Builder waitForActiveShards(int waitForActiveShards) {
return setParameter("wait_for_active_shards", waitForActiveShards);
}
public GenericResultReindexActionHack.Builder timeout(long timeout) {
return setParameter("timeout", timeout);
}
public GenericResultReindexActionHack.Builder requestsPerSecond(double requestsPerSecond) {
return setParameter("requests_per_second", requestsPerSecond);
}
public GenericResultReindexActionHack build() {
return new GenericResultReindexActionHack(this);
}
}
}
Step 2) Use of this class with a query then requires you to pass in the query as part of the source, then remove the '\n' characters:
ImmutableMap<String, Object> sourceMap = ImmutableMap.of("index", sourceIndex, "query", qb.toString().replaceAll("\\\\n", ""));
ImmutableMap<String, Object> destMap = ImmutableMap.of("index", destIndex);
GenericResultReindexActionHack reindex = new GenericResultReindexActionHack.Builder(sourceMap, destMap)
.waitForCompletion(false)
.conflicts("proceed")
.size(5000L)
.script(reindexScript)
.setParameter("slices", 10)
.build();
JestResult result = handleResult(reindex);
String task = result.getJsonString();
return (task);
Note the reindexScript parameter is of type org.elasticsearch.script.
This is a messy, hack-y way of getting around the limitations of Jest, but it seems to work. I understand that by doing it this way there may be some limitations to what may be acceptable in the input formatting...
来源:https://stackoverflow.com/questions/51072212/reindex-part-of-elasticsearch-index-onto-new-index-via-jest