How to store relational data in elasticsearch

时光怂恿深爱的人放手 提交于 2019-11-28 06:15:47

问题


What are the options to store relational data in elasticsearch. I know the following approaches

  1. Nested object :- I don't want to store data in nested format because I want to update the one document without changing the other document and if I use nested object then there will be repetition of child data in parent documents.

  2. Parent-child :- I don't want to store data in single index, but for using Parent-child data needs to be present in one index(different types). I know this restriction will be removed in future release as mentioned in https://github.com/elastic/elasticsearch/issues/15613 issue, but I want a solution that should work with 5.5 version.

Is there any other approach other then above.


回答1:


Nested Object is a perfect approach for it. There will be no repetition of child objects in parent document if you update the child objects correctly. I'm using the same approach for one of my use case where I need to maintain relational data of Master-Child One-to-Many relationship. I've written a Painless script for Update API to Add & Update existing nested child objects inside parent document without creating duplicates or repetitive entries.

Updated Answer:

Below is the structure of Parent-Child Nested Type document with embedded nested type documents "childs".

{
    "parent_id": 1,
    "parent_name": "ABC",
    "parent_number": 123,
    "parent_addr": "123 6th St. Melbourne, FL 32904"
    "childs": [
      {
        "child_id": 1,
        "child_name": "PQR",
        "child_number": 456,
        "child_age": 10
      },
      {
        "child_id": 2,
        "child_name": "XYZ",
        "child_number": 789,
        "child_age": 12
      },
      {
        "child_id": 3,
        "child_name": "QWE",
        "child_number": 234,
        "child_age": 16
      }

    ]   
}

Mapping would be as below:

PUT parent/
{
  "parent": {
    "mappings": {
      "parent": {
        "properties": {
          "parent_id": {
            "type": "long"
          },
          "parent_name": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          },
          "parent_number": {
            "type": "long"
          },
          "parent_addr": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          },
          "child_tickets": {
            "type": "nested",
            "properties": {
              "child_id": {
                "type": "long"
              },
              "child_name": {
                "type": "text",
                "fields": {
                  "keyword": {
                    "type": "keyword",
                    "ignore_above": 256
                  }
                }
              },
              "child_number": {
                "type": "long"
              },
              "child_age": {
                "type": "long"
              }
            }
          }
        }
      }
    }
  }
}

In RDMS, these both entities(parent, child) are two different tables with One to Many relation between Parent -> Child. Parent's id is foreign key for Child's row. (id is must for both tables)

Now in Elasticsearch, to index the parent document we must have id to index it, in this case it is parent_id. Index Parent document Query(parent_id is the id which i was talking about and have index the document with id(_id) = 1):

POST parent/parent/1
{
    "parent_id": 1,
    "parent_name": "ABC",
    "parent_number": 123,
    "parent_addr": "123 6th St. Melbourne, FL 32904"
}

Now, adding child(s) to the parent. For that you will require child document which should have child id plus parent id. To add a child, parent id is must. Below is the update query to add new childs or update already present childs.

POST parent/parent/1/_update
{
    "script":{
    "lang":"painless",
    "inline":"if (!ctx._source.containsKey(\"childs\")) {
                ctx._source.childs = [];
                ctx._source.childs.add(params.child);
            } else {
                int flag=0;
                for(int i=0;i<ctx._source.childs.size();i++){
                    if(ctx._source.childs[i].child_id==params.child.child_id){
                        ctx._source.childs[i]=params.child;
                        flag++;
                    }
                }
                if(flag==0){
                    ctx._source.childs.add(params.child);
                }
            }",
    "params":{
        "child":{
                "child_id": 1,
                "child_name": "PQR",
                "child_number": 456,
                "child_age": 10
            }
        }
    }
}

Give it a shot. Cheers!

Let me know if you need anything else.




回答2:


There are two more approaches: Denormalization and running multiple queries for joins.

Denormalization will eat up some more space and increase your write time, but you will just need to run one query to retrieve your data, hence, your read time will improve. Since you don't want to store data in a single index, so joining might help you out.



来源:https://stackoverflow.com/questions/46101012/how-to-store-relational-data-in-elasticsearch

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!