Avoid rebuilding index through jdbc-river on elasticsearch

Submitted by 让人想犯罪 on 2019-12-11 13:39:29

Question


I am using the following:

ElasticSearch – v0.90.9

JDBC connector for MySQL – v5.1.28

ElasticSearch River – v2.3.1

I can build the index and query the indexed data with ElasticSearch. The versions listed above are installed on an Ubuntu 12.04 LTS virtual machine, and ElasticSearch runs as a service that starts automatically after a system reboot.

Suppose there are no indices and I use ElasticSearch River to build a new one by issuing a PUT command: the index is built and everything works fine. The problem is that the index is rebuilt whenever I shut down the virtual machine and start it again. I find this annoying. Is there a way to prevent the automatic rebuild?

Is there an ElasticSearch River or ElasticSearch setting that I should be aware of to prevent the automatic index rebuild? In my case the rebuild is causing duplicates.

Thanks in advance.


Answer 1:


The only way I have found to stop it re-indexing is to delete the river document after it has run.
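As a rough sketch of that cleanup step: river definitions are stored in the `_river` index, so removing one amounts to a DELETE request against `/_river/<river name>/`. The host `http://localhost:9200` and the river name `my_jdbc_river` below are assumptions for illustration, not values from the question.

```python
import urllib.request


def river_delete_request(es_url, river_name):
    """Build the DELETE request that removes a river's documents from the _river index."""
    url = f"{es_url.rstrip('/')}/_river/{river_name}/"
    return urllib.request.Request(url, method="DELETE")


def send(request):
    """Send a prepared request and return (HTTP status, response body)."""
    with urllib.request.urlopen(request) as response:
        return response.status, response.read().decode("utf-8")


# Hypothetical host and river name; adjust to your own setup.
request = river_delete_request("http://localhost:9200", "my_jdbc_river")
```

Calling `send(request)` once the river has finished its run should keep it from being started again when the node restarts.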

However, if your problem is that documents are duplicated, you need to identify an id field. There are two ways to do this: either import the data with a field labeled "_id", or specify the id field in the mapping when you create the index, as in the example below.

PUT my_index
{
    "settings": {
        "number_of_shards": 1,
        "number_of_replicas": 3
    },
    "mappings": {
        "my_type": {
            "properties": {
                "field1": { "type": "string", "analyzer": "keyword" }
            },
            "_id": { "path": "field1" }
        }
    }
}
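
The other option, importing the data with a field labeled "_id", could look roughly like the sketch below. It aliases a primary-key column to `_id` in the river's SQL, so a re-run overwrites existing documents instead of adding new ones. The river name, JDBC URL, credentials, table `orders`, and column `id` are all hypothetical, and the layout of the `jdbc` block (including the `driver` key) is an assumption about the river version in use, so check it against the plugin's README.

```python
import json
import urllib.request


def river_meta_request(es_url, river_name, sql):
    """Build the PUT request that registers a JDBC river with the given SQL."""
    body = {
        "type": "jdbc",
        "jdbc": {
            # Assumed connection settings; replace with your own.
            "driver": "com.mysql.jdbc.Driver",
            "url": "jdbc:mysql://localhost:3306/test",
            "user": "",
            "password": "",
            "sql": sql,
        },
    }
    url = f"{es_url.rstrip('/')}/_river/{river_name}/_meta"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


# Alias the (hypothetical) primary key column to _id so each row maps to one document.
request = river_meta_request(
    "http://localhost:9200",
    "my_jdbc_river",
    "select id as _id, name from orders",
)
```

Passing `request` to `urllib.request.urlopen` would register the river. With a stable `_id`, indexing the same row twice replaces the earlier document rather than creating a duplicate.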


Source: https://stackoverflow.com/questions/21714588/avoid-rebuilding-index-through-jdbc-river-on-elasticsearch
