How to store date range data in Elasticsearch (AWS) and search for a range?

无人共我 2021-02-03 16:06

I am trying to store hotel room availability in Elasticsearch, and then I need to search for rooms that are available from one date until another. I have come up with two ways.

2 answers
  • 2021-02-03 16:33

    One way to model this is with parent/child documents: room documents are the parents and availability documents are their children. For each room, there is one availability document per date on which the room is available. At query time, we look for parent rooms that have an availability child for every date in the searched interval (the interval can even be disjoint).

    Note that as soon as a room is booked, you need to remove the child document for each booked date.
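
The clean-up on booking can be sketched in Python (this is not part of the original answer). The helper name `bulk_delete_lines` is hypothetical, and the yyyyMMdd ID scheme and the `_type`/`_parent` metadata are assumed to match the sample requests in this answer. The output is meant to be sent as the body of `POST /rooms/_bulk`.

```python
import json
from datetime import date, timedelta


def bulk_delete_lines(room_id, first_day, last_day):
    """Return bulk-API NDJSON deleting one availability child per booked date.

    Assumes child IDs are the date formatted as yyyyMMdd, as in the sample
    data of this answer; first_day and last_day are both inclusive.
    """
    lines = []
    day = first_day
    while day <= last_day:
        action = {
            "delete": {
                "_type": "availability",
                "_id": day.strftime("%Y%m%d"),
                "_parent": room_id,  # children are routed by their parent
            }
        }
        lines.append(json.dumps(action))
        day += timedelta(days=1)
    # Every line of a bulk body, including the last, ends with a newline.
    return "".join(line + "\n" for line in lines)


# Room 233 gets booked for 2016-07-04 and 2016-07-05.
print(bulk_delete_lines(233, date(2016, 7, 4), date(2016, 7, 5)), end="")
```

Deleting by ID keeps the booking path cheap, but it relies on the ID scheme staying in sync with whatever code indexes the availability documents.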

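    Let's try this out. First create the index (the `_parent` mapping below is the pre-6.0 syntax; Elasticsearch 6.0 and later use a `join` field instead):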

    PUT /rooms
    {
      "mappings": {
        "room": {
          "properties": {
            "room_num": {
              "type": "integer"
            }
          }
        },
        "availability": {
          "_parent": {
            "type": "room"
          },
          "properties": {
            "date": {
              "type": "date",
              "format": "date"
            },
            "available": {
              "type": "boolean"
            }
          }
        }
      }
    }
    

    Then add some data:

    POST /rooms/_bulk
    {"index": { "_type": "room", "_id": 233}}
    {"room_num": 233}
    {"index": { "_type": "availability", "_id": "20160701", "_parent": 233}}
    {"date": "2016-07-01"}
    {"index": { "_type": "availability", "_id": "20160702", "_parent": 233}}
    {"date": "2016-07-02"}
    {"index": { "_type": "availability", "_id": "20160704", "_parent": 233}}
    {"date": "2016-07-04"}
    {"index": { "_type": "availability", "_id": "20160705", "_parent": 233}}
    {"date": "2016-07-05"}
    {"index": { "_type": "availability", "_id": "20160707", "_parent": 233}}
    {"date": "2016-07-07"}
    {"index": { "_type": "availability", "_id": "20160708", "_parent": 233}}
    {"date": "2016-07-08"}
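
Writing these lines by hand does not go far beyond a demo, so here is a rough Python sketch that generates the same kind of bulk body. The function name `bulk_index_lines` and its `booked` parameter are hypothetical; the sketch uses the bulk API's `index` action and the same yyyyMMdd IDs as the sample data.

```python
import json
from datetime import date, timedelta


def bulk_index_lines(room_num, start, end, booked=()):
    """Return bulk NDJSON: the room plus one availability child per free date.

    start and end are inclusive; dates listed in `booked` are skipped.
    """
    booked = set(booked)
    lines = [
        json.dumps({"index": {"_type": "room", "_id": room_num}}),
        json.dumps({"room_num": room_num}),
    ]
    day = start
    while day <= end:
        if day not in booked:
            meta = {
                "_type": "availability",
                "_id": day.strftime("%Y%m%d"),
                "_parent": room_num,
            }
            lines.append(json.dumps({"index": meta}))
            lines.append(json.dumps({"date": day.isoformat()}))
        day += timedelta(days=1)
    # The bulk API requires a trailing newline after the last line.
    return "".join(line + "\n" for line in lines)


# Reproduces the sample data: 2016-07-03 and 2016-07-06 are already booked.
body = bulk_index_lines(
    233,
    date(2016, 7, 1),
    date(2016, 7, 8),
    booked=[date(2016, 7, 3), date(2016, 7, 6)],
)
print(body, end="")
```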
    

    Finally, we can start querying. First, let's say we want to find a room that is available on 2016-07-01:

    POST /rooms/room/_search
    {
      "query": {
        "has_child": {
          "type": "availability",
          "query": {
            "term": {
              "date": "2016-07-01"
            }
          }
        }
      }
    }
    => Result: Room 233
    

    Then, let's try searching for a room available from 2016-07-01 to 2016-07-03. Room 233 has no availability document for 2016-07-03, so nothing should match:

    POST /rooms/room/_search
    {
      "query": {
        "bool": {
          "minimum_should_match": 3,
          "should": [
            {
              "has_child": {
                "type": "availability",
                "query": {
                  "term": {
                    "date": "2016-07-01"
                  }
                }
              }
            },
            {
              "has_child": {
                "type": "availability",
                "query": {
                  "term": {
                    "date": "2016-07-02"
                  }
                }
              }
            },
            {
              "has_child": {
                "type": "availability",
                "query": {
                  "term": {
                    "date": "2016-07-03"
                  }
                }
              }
            }
          ]
        }
      }
    }
    => Result: No rooms
    

    However, searching for a room available from 2016-07-01 to 2016-07-02 does yield room 233:

    POST /rooms/room/_search
    {
      "query": {
        "bool": {
          "minimum_should_match": 2,
          "should": [
            {
              "has_child": {
                "type": "availability",
                "query": {
                  "term": {
                    "date": "2016-07-01"
                  }
                }
              }
            },
            {
              "has_child": {
                "type": "availability",
                "query": {
                  "term": {
                    "date": "2016-07-02"
                  }
                }
              }
            }
          ]
        }
      }
    }
    => Result: Room 233
    

    We can also search for disjoint intervals, say from 2016-07-01 to 2016-07-02 and from 2016-07-04 to 2016-07-05:

    POST /rooms/room/_search
    {
      "query": {
        "bool": {
          "minimum_should_match": 4,
          "should": [
            {
              "has_child": {
                "type": "availability",
                "query": {
                  "term": {
                    "date": "2016-07-01"
                  }
                }
              }
            },
            {
              "has_child": {
                "type": "availability",
                "query": {
                  "term": {
                    "date": "2016-07-02"
                  }
                }
              }
            },
            {
              "has_child": {
                "type": "availability",
                "query": {
                  "term": {
                    "date": "2016-07-04"
                  }
                }
              }
            },
            {
              "has_child": {
                "type": "availability",
                "query": {
                  "term": {
                    "date": "2016-07-05"
                  }
                }
              }
            }
          ]
        }
      }
    }
    => Result: Room 233
    

    And so on. The key point is to add one has_child query per date you need to check and to set minimum_should_match to the number of dates you're checking.
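
Since the query body grows by one clause per date, it is easier to build it in code. The following is a minimal Python sketch of that idea; `dates_between` and `availability_query` are hypothetical helper names, and the resulting dictionary is meant to be serialized as the body of `POST /rooms/room/_search`.

```python
import json
from datetime import date, timedelta


def dates_between(start, end):
    """All dates from start to end, both inclusive."""
    return [start + timedelta(days=i) for i in range((end - start).days + 1)]


def availability_query(dates):
    """Build a bool query with one has_child clause per requested date."""
    if not dates:
        raise ValueError("at least one date is required")
    clauses = [
        {
            "has_child": {
                "type": "availability",
                "query": {"term": {"date": d.isoformat()}},
            }
        }
        for d in dates
    ]
    # Every clause has to match, so the minimum equals the clause count.
    return {
        "query": {
            "bool": {
                "minimum_should_match": len(clauses),
                "should": clauses,
            }
        }
    }


# Two disjoint intervals, as in the last example above.
wanted = dates_between(date(2016, 7, 1), date(2016, 7, 2)) + dates_between(
    date(2016, 7, 4), date(2016, 7, 5)
)
query = availability_query(wanted)
print(json.dumps(query, indent=2))
```

Because every `should` clause is required, this matches the same rooms as listing the clauses under `must`.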

    UPDATE

    Another option would be to use a script filter, but with 100 million documents, I'm not certain it would scale that well.

    In this scenario you can keep your original design (preferably the second one, because with the first one, you'll create too many unnecessary fields in your mapping) and the query would look like this:

    POST /rooms/room/_search
    {
      "query": {
        "bool": {
          "filter": {
            "script": {
              "script": {
                "inline": "def dates = doc.availability.sort(false); from = Date.parse('yyyy-MM-dd', from); to = Date.parse('yyyy-MM-dd', to); def days = to - from; def fromIndex = doc.availability.values.indexOf(from.time); def toIndex = doc.availability.values.indexOf(to.time); return fromIndex >= 0 && toIndex >= 0 && days == (toIndex - fromIndex)",
                "params": {
                  "from": "2016-07-01",
                  "to": "2016-07-04"
                }
              }
            }
          }
        }
      }
    }
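
To make the index arithmetic in the script easier to follow, here is the same check as a plain Python sketch. It runs outside Elasticsearch, and `is_available` is a hypothetical helper name. In a sorted list of distinct dates, two dates sit exactly as many positions apart as there are days between them only if every date in between is also in the list. The sketch also rejects a start or end date that is missing from the list.

```python
from datetime import date


def is_available(available, start, end):
    """True if every date from start to end (inclusive) is in `available`."""
    dates = sorted(set(available))
    # A missing endpoint means the room is not free for the whole stay.
    if start not in dates or end not in dates:
        return False
    span = (end - start).days
    # No gaps between start and end <=> index distance equals day distance.
    return dates.index(end) - dates.index(start) == span


# Same dates as the sample data for room 233.
free = [
    date(2016, 7, 1), date(2016, 7, 2), date(2016, 7, 4),
    date(2016, 7, 5), date(2016, 7, 7), date(2016, 7, 8),
]
print(is_available(free, date(2016, 7, 1), date(2016, 7, 2)))
print(is_available(free, date(2016, 7, 1), date(2016, 7, 4)))
```

The missing-endpoint check matters for any lookup that returns -1 for "not found": with these dates, a stay from 2016-06-30 to 2016-07-02 would give 1 - (-1) = 2, which equals the two-day span even though 2016-06-30 is not available.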
    
  • 2021-02-03 16:56

    I am new and just learning ES. What are the disadvantages of this setup/mapping?

    ciao..remco
