Avoid rebuilding index through jdbc-river on elasticsearch

Question

I am using the following:

ElasticSearch – v0.90.9

JDBC connector for MySQL – v5.1.28

ElasticSearch River – v2.3.1

I am able to build and query the indexed data using ElasticSearch. The versions listed above are installed on an Ubuntu 12.04 LTS virtual machine, and ElasticSearch runs as a service that starts automatically after a system reboot.

Suppose there are no indices and I use ElasticSearch River to build a new one by issuing a PUT command. The index is built and everything works fine. The issue is that this index is rebuilt every time I shut down the virtual machine and restart it. Is there a way to prevent this automatic rebuild?

Is there an ElasticSearch River or ElasticSearch setting that I should be aware of to prevent the automatic index rebuild? In my case the rebuild is causing duplicate documents.

Thanks in advance.

Answer 1:

The only way I have found to stop it re-indexing is to delete the river document after it has run.
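As a rough sketch of that cleanup step, the snippet below sends a DELETE request for the river's entry in the `_river` index, using only the Python standard library. The host, port, and the river name `my_jdbc_river` are assumptions for illustration; substitute the name you used when you registered the river.

```python
import urllib.request


def river_url(base_url, river_name):
    """Build the URL of a river's entry in the _river index."""
    return f"{base_url.rstrip('/')}/_river/{river_name}/"


def delete_river(base_url, river_name):
    """Send DELETE for the river and return the HTTP status code."""
    request = urllib.request.Request(river_url(base_url, river_name), method="DELETE")
    with urllib.request.urlopen(request) as response:
        return response.status
```

Calling `delete_river("http://localhost:9200", "my_jdbc_river")` once the import has finished should remove the river definition, so nothing is left to re-run when the node restarts. The index that the river already built is a separate index and is not touched by this request.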

However, if the issue you have is that documents are duplicated, what you need to do is identify an id field, so that a re-run overwrites existing documents instead of adding new ones. There are two ways to do this: either import the data with a field labeled "_id", or identify an id field when you create the mapping for that index, as in the example below.

PUT my_index
{
    "settings": {
        "number_of_shards": 1,
        "number_of_replicas": 3
    },
    "mappings": {
        "my_type": {
            "properties": {
                "field1": { "type": "string", "analyzer": "keyword" }
            },
            "_id": { "path": "field1" }
        }
    }
}
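The first option mentioned above, importing the data with a field labeled "_id", can usually be handled in the river's SQL statement by aliasing the primary key column. The sketch below builds such a river definition as a Python dictionary. The table name, column name, and connection details are hypothetical, and the key names follow the JDBC river README as best I recall, so check them against the documentation for your river version.

```python
import json


def jdbc_river_definition(jdbc_url, user, password, table, id_column):
    """Return a river definition whose SQL aliases id_column to _id."""
    return {
        "type": "jdbc",
        "jdbc": {
            "url": jdbc_url,
            "user": user,
            "password": password,
            # Aliasing the key column to _id gives each row a stable document id,
            # so re-running the river overwrites documents instead of duplicating them.
            "sql": f"select *, {id_column} as _id from {table}",
        },
    }


# Hypothetical connection details and table, for illustration only.
definition = jdbc_river_definition(
    "jdbc:mysql://localhost:3306/test", "user", "secret", "orders", "id"
)
body = json.dumps(definition, indent=4)
```

The resulting `body` is what would be sent with the PUT request that registers the river. With a stable `_id`, a rebuild after a restart still costs time but no longer produces duplicates.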

Source: https://stackoverflow.com/questions/21714588/avoid-rebuilding-index-through-jdbc-river-on-elasticsearch

Tags

ElasticSearch

elasticsearch-jdbc-river