Reindex part of Elasticsearch index onto new index via Jest

淺唱寂寞╮ submitted on 2019-12-24 18:41:31

Question


I have a test Elasticsearch 6.0 index populated with millions of records, likely to reach billions in production. I need to search for a subset of these records, then save that subset into a secondary index for later searching. I have proven this out by querying ES from Kibana; the challenge is finding the appropriate APIs in Java 8 with my Jest client (searchbox.io, version 5.3.3) to do the same. The Elasticsearch cluster is on AWS, so using a transport client is out.

POST _reindex?slices=10&wait_for_completion=false
{ "conflicts": "proceed",
  "source":{
    "index": "my_source_idx",
    "size": 5000,
    "query": { "bool": {
      "filter": { "bool" : { "must" : [
        { "nested": { "path": "test", "query": { "bool": { "must":[
           { "terms" : { "test.RowKey": ["abc"]} },
           { "range" : { "test.dates" : { "lte": "2018-01-01", "gte": "2010-08-01"} } },
           { "range" : { "test.DatesCount" : { "gte": 2} } },
           { "script" : { "script" : { "id": "my_painless_script", 
              "params" : {"min_occurs" : 1, "dateField": "test.dates", "RowKey": ["abc"], "fromDate": "2010-08-01", "toDate": "2018-01-01"}}}}
        ]}}}}
      ]}}
    }}
  },
  "dest": {
    "index": "my_dest_idx"
  },
  "script": {
    "source": <My painless script>
  } }

I am aware I can perform a search on the source index, then create the new index and bulk load the response records into it, but I want to do this all in one shot, since I have a Painless script that gleans some information pertinent to the queries that will search the secondary index. Performance is a concern, as the application will chain subsequent queries together against the destination index. Does anyone know how to accomplish this using Jest?


Answer 1:


It appears that this particular functionality is not yet supported in Jest. The Jest API has a way to pass in a script (not a query) as a parameter, but I was having problems even with that.

EDIT:

After some hacking with a coworker, we found a way around this...

Step 1) Extend the GenericResultAbstractAction class, changing how the script is serialized:

import com.google.common.collect.ImmutableMap;
import com.google.gson.Gson;
import io.searchbox.action.GenericResultAbstractAction;
import org.elasticsearch.script.Script;

import java.util.HashMap;
import java.util.Map;

public class GenericResultReindexActionHack extends GenericResultAbstractAction {

    GenericResultReindexActionHack(GenericResultReindexActionHack.Builder builder) {
        super(builder);

        Map<String, Object> payload = new HashMap<>();
        payload.put("source", builder.source);
        payload.put("dest", builder.dest);
        if (builder.conflicts != null) {
            payload.put("conflicts", builder.conflicts);
        }
        if (builder.size != null) {
            payload.put("size", builder.size);
        }
        if (builder.script != null) {
            Script script = (Script) builder.script;
            // Note the script parameter needs to be formatted differently to conform
            // to the ES _reindex API, so it is stored here as a JSON string.
            payload.put("script", new Gson().toJson(
                    ImmutableMap.of("id", script.getIdOrCode(), "params", script.getParams())));
        }
        this.payload = ImmutableMap.copyOf(payload);

        setURI(buildURI());
    }

    @Override
    protected String buildURI() {
        return super.buildURI() + "/_reindex";
    }

    @Override
    public String getRestMethodName() {
        return "POST";
    }

    @Override
    public String getData(Gson gson) {
        if (payload == null) {
            return null;
        } else if (payload instanceof String) {
            return (String) payload;
        } else {
            // We need to remove the incorrect formatting (escaped quotes and the
            // surrounding string quotes) for the query, dest, and script fields.
            // TODO: Need to consider spaces in the JSON
            return gson.toJson(payload).replaceAll("\\\\n", "")
                    .replace("\\", "")
                    .replace("query\":\"", "query\":")
                    .replace("\"},\"dest\"", "},\"dest\"")
                    .replaceAll("\"script\":\"", "\"script\":")
                    .replaceAll("\"}", "}")
                    .replaceAll("},\"script\"", "\"},\"script\"");
        }
    }

    public static class Builder extends GenericResultAbstractAction.Builder<GenericResultReindexActionHack, GenericResultReindexActionHack.Builder> {

        private Object source;
        private Object dest;
        private String conflicts;
        private Long size;
        private Object script;

        public Builder(Object source, Object dest) {
            this.source = source;
            this.dest = dest;
        }

        public GenericResultReindexActionHack.Builder conflicts(String conflicts) {
            this.conflicts = conflicts;
            return this;
        }

        public GenericResultReindexActionHack.Builder size(Long size) {
            this.size = size;
            return this;
        }

        public GenericResultReindexActionHack.Builder script(Object script) {
            this.script = script;
            return this;
        }

        public GenericResultReindexActionHack.Builder waitForCompletion(boolean waitForCompletion) {
            return setParameter("wait_for_completion", waitForCompletion);
        }

        public GenericResultReindexActionHack.Builder waitForActiveShards(int waitForActiveShards) {
            return setParameter("wait_for_active_shards", waitForActiveShards);
        }

        public GenericResultReindexActionHack.Builder timeout(long timeout) {
            return setParameter("timeout", timeout);
        }

        public GenericResultReindexActionHack.Builder requestsPerSecond(double requestsPerSecond) {
            return setParameter("requests_per_second", requestsPerSecond);
        }

        public GenericResultReindexActionHack build() {
            return new GenericResultReindexActionHack(this);
        }
    }
}
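To see what the replace chain in getData does to a serialized payload, here is a small sketch that uses only the JDK. The serialized() helper is a hand-built stand-in for what gson.toJson(payload) would emit; its key order, the compact formatting, and the index names src and dst are assumptions made for illustration, as is the class name ReplaceChainDemo.

```java
final class ReplaceChainDemo {

    // Embeds raw text as a JSON string value, the way a serializer escapes a String field.
    static String asJsonString(String raw) {
        return "\"" + raw.replace("\\", "\\\\").replace("\"", "\\\"").replace("\n", "\\n") + "\"";
    }

    // Hand-built stand-in for gson.toJson(payload); key order and index names are assumptions.
    static String serialized(String queryJson, String scriptJson) {
        return "{\"source\":{\"index\":\"src\",\"query\":" + asJsonString(queryJson) + "}"
                + ",\"dest\":{\"index\":\"dst\"}"
                + ",\"script\":" + asJsonString(scriptJson) + "}";
    }

    // The same replace chain as in getData above.
    static String unescape(String json) {
        return json.replaceAll("\\\\n", "")
                .replace("\\", "")
                .replace("query\":\"", "query\":")
                .replace("\"},\"dest\"", "},\"dest\"")
                .replaceAll("\"script\":\"", "\"script\":")
                .replaceAll("\"}", "}")
                .replaceAll("},\"script\"", "\"},\"script\"");
    }

    public static void main(String[] args) {
        String query = "{\"match_all\":{}}";
        // Script params ending in a number survive the chain.
        System.out.println(unescape(serialized(query,
                "{\"id\":\"s1\",\"params\":{\"min_occurs\":1}}")));
        // Script params ending in a string value lose their closing quote.
        System.out.println(unescape(serialized(query,
                "{\"id\":\"s1\",\"params\":{\"fromDate\":\"2010-08-01\"}}")));
    }
}
```

In this sketch the first payload comes out as well-formed JSON, while the second ends in "2010-08-01}}} because the "} replacement also strips the quote that closes the last string value. The chain therefore depends on the shape of the input.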

Step 2) To use this class with a query, pass the query in as part of the source map, then remove the '\n' characters:

ImmutableMap<String, Object> sourceMap = ImmutableMap.of("index", sourceIndex, "query", qb.toString().replaceAll("\\\\n", ""));
ImmutableMap<String, Object> destMap = ImmutableMap.of("index", destIndex);

GenericResultReindexActionHack reindex = new GenericResultReindexActionHack.Builder(sourceMap, destMap)
        .waitForCompletion(false)
        .conflicts("proceed")
        .size(5000L)
        .script(reindexScript)
        .setParameter("slices", 10)
        .build();

JestResult result = handleResult(reindex);
String task = result.getJsonString();
return task;

Note that reindexScript is an object from the org.elasticsearch.script package; the action in Step 1 casts it to Script.
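Because the request is sent with wait_for_completion=false, Elasticsearch replies with a task identifier instead of the reindex result, so the returned string is a small JSON document of the form {"task": "<node_id>:<task_number>"}. That task can be polled later through the Tasks API (GET _tasks/<task_id>). Below is a minimal sketch of pulling the id out of the response using only the JDK; the class name TaskIdSketch and the sample id are hypothetical, and a real JSON parser would be the sturdier choice.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TaskIdSketch {

    // Matches the "task" field of the response and captures its string value.
    private static final Pattern TASK = Pattern.compile("\"task\"\\s*:\\s*\"([^\"]+)\"");

    // Returns the task id if the response contains one.
    static Optional<String> taskId(String responseJson) {
        Matcher m = TASK.matcher(responseJson);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    // Path of the Tasks API endpoint that reports on the given task.
    static String taskStatusPath(String taskId) {
        return "/_tasks/" + taskId;
    }

    public static void main(String[] args) {
        // Hypothetical response body; real ids are generated by the cluster.
        String response = "{\"task\":\"abc123:456\"}";
        taskId(response).map(TaskIdSketch::taskStatusPath).ifPresent(System.out::println);
    }
}
```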

This is a messy, hacky way of getting around the limitations of Jest, but it seems to work. I understand that doing it this way may restrict which input formats are accepted...
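One way to sidestep the string surgery, offered only as a sketch: the getData method in Step 1 returns the payload unchanged when it is already a String, so the request body could be assembled up front from raw JSON fragments and used as the payload. The class ReindexBodySketch and its build method are hypothetical names, and the sketch assumes the query and script arrive as valid JSON text (for example from the query builder's toString()).

```java
final class ReindexBodySketch {

    // Quotes a plain value as a JSON string literal, escaping quotes, backslashes and control characters.
    static String quote(String value) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : value.toCharArray()) {
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.append('"').toString();
    }

    // Assembles a _reindex body; queryJson and scriptJson are inserted verbatim and may be null.
    static String build(String sourceIndex, String queryJson, String destIndex,
                        String scriptJson, String conflicts) {
        StringBuilder body = new StringBuilder("{");
        if (conflicts != null) {
            body.append("\"conflicts\":").append(quote(conflicts)).append(',');
        }
        body.append("\"source\":{\"index\":").append(quote(sourceIndex));
        if (queryJson != null) {
            body.append(",\"query\":").append(queryJson);
        }
        body.append("},\"dest\":{\"index\":").append(quote(destIndex)).append('}');
        if (scriptJson != null) {
            body.append(",\"script\":").append(scriptJson);
        }
        return body.append('}').toString();
    }

    public static void main(String[] args) {
        System.out.println(build("my_source_idx", "{\"match_all\":{}}", "my_dest_idx",
                "{\"id\":\"s1\",\"params\":{\"fromDate\":\"2010-08-01\"}}", "proceed"));
    }
}
```

Since the fragments are never escaped and then unescaped, string values that end an object keep their quotes. The trade-off is that the caller is responsible for passing well-formed JSON for the query and the script.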



Source: https://stackoverflow.com/questions/51072212/reindex-part-of-elasticsearch-index-onto-new-index-via-jest
