Question
I am trying to store hotel room availability in Elasticsearch, and then I need to search for rooms that are available from one date to another. I have come up with two ways to store the availability data:
In the first, the availability dictionary stores every date as a key, and the value for each date is true or false, indicating whether the room is available on that day.
{
"_id": "khg2uo47tyhgjwebu7624787",
"room_type": "garden view",
"hotel_name": "Cool hotel",
"hotel_id": "jytu64r982u0299023",
"room_metadata1": 233,
"room_color": "black",
"availability": {
"2016-07-01": true,
"2016-07-02": true,
"2016-07-03": false,
"2016-07-04": true,
"2016-07-05": true,
"2016-07-06": null,
"2016-07-07": true,
"2016-07-08": true,
----
----
for 365 days
}
}
In the second, the availability array stores only the dates on which the room is available.
{
"_id": "khg2uo47tyhgjwebu7624787",
"room_type": "garden view",
"hotel_name": "Cool hotel",
"hotel_id": "jytu64r982u0299023",
"room_metadata1": 535,
"room_color": "black",
"availability": ["2016-07-01", "2016-07-02", "2016-07-04", "2016-07-05", "2016-07-07", "2016-07-08"] ---for 365 days
}
I want to search for all rooms that are available from from_date till to_date, and the search should look into the availability dictionary or array. My date range may span up to 365 days.
How should I store this availability data so that I can perform the above search easily? I could not find any way to search through a range of dates, so any suggestions?
Please note that the items in availability may not be kept sorted, and I may have more than 100 million records to search through.
Answer 1:
One way to model this would be with parent/child documents. Room documents would be parent documents and availability documents would be their child documents. For each room, there would be one availability document per date the room is available. Then, at query time, we can query for parent rooms which have one availability child document for each date in the searched interval (even disjoint ones).
Note that you'll need to make sure that as soon as a room is booked, you remove the corresponding child documents for each booked date.
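As a rough illustration of that clean-up step, the sketch below builds a bulk request body that deletes one availability child per booked night. The function name `booking_delete_actions` is hypothetical, the check-out date is assumed to be exclusive, and the child `_id` format (`yyyyMMdd`) and `_parent` metadata simply mirror the example data used in this answer.

```python
import json
from datetime import date, timedelta


def booking_delete_actions(room_id, check_in, check_out):
    """Build bulk 'delete' action lines for every night in [check_in, check_out)."""
    lines = []
    day = check_in
    while day < check_out:  # assumption: the check-out day itself stays available
        lines.append(json.dumps({
            "delete": {
                "_type": "availability",
                "_id": day.strftime("%Y%m%d"),  # same id scheme as the example data
                "_parent": room_id,             # routes the delete to the parent's shard
            }
        }))
        day += timedelta(days=1)
    # The bulk API expects newline-delimited JSON terminated by a newline.
    return "\n".join(lines) + "\n"


body = booking_delete_actions(233, date(2016, 7, 1), date(2016, 7, 3))
```

The resulting string would be sent as the body of a `POST /rooms/_bulk` request.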
Let's try this out. First create the index:
PUT /rooms
{
"mappings": {
"room": {
"properties": {
"room_num": {
"type": "integer"
}
}
},
"availability": {
"_parent": {
"type": "room"
},
"properties": {
"date": {
"type": "date",
"format": "date"
},
"available": {
"type": "boolean"
}
}
}
}
}
Then add some data:
POST /rooms/_bulk
{"index": { "_type": "room", "_id": 233}}
{"room_num": 233}
{"index": { "_type": "availability", "_id": "20160701", "_parent": 233}}
{"date": "2016-07-01"}
{"index": { "_type": "availability", "_id": "20160702", "_parent": 233}}
{"date": "2016-07-02"}
{"index": { "_type": "availability", "_id": "20160704", "_parent": 233}}
{"date": "2016-07-04"}
{"index": { "_type": "availability", "_id": "20160705", "_parent": 233}}
{"date": "2016-07-05"}
{"index": { "_type": "availability", "_id": "20160707", "_parent": 233}}
{"date": "2016-07-07"}
{"index": { "_type": "availability", "_id": "20160708", "_parent": 233}}
{"date": "2016-07-08"}
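With up to 365 dates per room, a bulk body like this would normally be generated rather than typed. Here is a minimal sketch, assuming the bulk API's `index` action and the same type names and `_id` scheme as the example; the function name `availability_bulk_body` is hypothetical.

```python
import json
from datetime import date


def availability_bulk_body(room_id, available_dates):
    """Return an NDJSON bulk body: one room document plus one child per available date."""
    lines = [
        json.dumps({"index": {"_type": "room", "_id": room_id}}),
        json.dumps({"room_num": room_id}),
    ]
    for day in available_dates:
        # Action line: the child is tied to its room through _parent.
        lines.append(json.dumps({
            "index": {
                "_type": "availability",
                "_id": day.strftime("%Y%m%d"),
                "_parent": room_id,
            }
        }))
        # Source line: the date in yyyy-MM-dd form, matching the mapping's "date" format.
        lines.append(json.dumps({"date": day.isoformat()}))
    return "\n".join(lines) + "\n"


body = availability_bulk_body(233, [date(2016, 7, 1), date(2016, 7, 2)])
```

Note that the example's child IDs contain only the date. With several rooms it may be safer to include the room ID in the child `_id` as well, since two documents of the same type that share an `_id` and land on the same shard would overwrite each other.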
Finally, we can start querying. First, let's say we want to find a room that is available on 2016-07-01:
POST /rooms/room/_search
{
"query": {
"has_child": {
"type": "availability",
"query": {
"term": {
"date": "2016-07-01"
}
}
}
}
}
=> result: room 233
Then, let's try searching for a room available from 2016-07-01 to 2016-07-03:
POST /rooms/room/_search
{
"query": {
"bool": {
"minimum_should_match": 3,
"should": [
{
"has_child": {
"type": "availability",
"query": {
"term": {
"date": "2016-07-01"
}
}
}
},
{
"has_child": {
"type": "availability",
"query": {
"term": {
"date": "2016-07-02"
}
}
}
},
{
"has_child": {
"type": "availability",
"query": {
"term": {
"date": "2016-07-03"
}
}
}
}
]
}
}
}
=> Result: No rooms (room 233 has no availability document for 2016-07-03)
However, searching for a room available from 2016-07-01 to 2016-07-02 does yield room 233:
POST /rooms/room/_search
{
"query": {
"bool": {
"minimum_should_match": 2,
"should": [
{
"has_child": {
"type": "availability",
"query": {
"term": {
"date": "2016-07-01"
}
}
}
},
{
"has_child": {
"type": "availability",
"query": {
"term": {
"date": "2016-07-02"
}
}
}
}
]
}
}
}
=> Result: Room 233
We can also search for disjoint intervals, say from 2016-07-01 to 2016-07-02 plus from 2016-07-04 to 2016-07-05:
POST /rooms/room/_search
{
"query": {
"bool": {
"minimum_should_match": 4,
"should": [
{
"has_child": {
"type": "availability",
"query": {
"term": {
"date": "2016-07-01"
}
}
}
},
{
"has_child": {
"type": "availability",
"query": {
"term": {
"date": "2016-07-02"
}
}
}
},
{
"has_child": {
"type": "availability",
"query": {
"term": {
"date": "2016-07-04"
}
}
}
},
{
"has_child": {
"type": "availability",
"query": {
"term": {
"date": "2016-07-05"
}
}
}
}
]
}
}
}
=> Result: Room 233
And so on... The key point is to add one has_child query per date you need to check availability for and set minimum_should_match to the number of dates you're checking.
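Since a 365-day range means 365 clauses, the query body is easier to generate than to write by hand. The sketch below does that for a single contiguous range with both end dates inclusive; the function name `availability_query` is hypothetical and the output follows the same structure as the queries above.

```python
from datetime import date, timedelta


def availability_query(from_date, to_date):
    """Build a bool query requiring one availability child per date in [from_date, to_date]."""
    days = (to_date - from_date).days + 1  # both end dates are included
    clauses = [
        {
            "has_child": {
                "type": "availability",
                "query": {
                    "term": {"date": (from_date + timedelta(days=i)).isoformat()}
                },
            }
        }
        for i in range(days)
    ]
    return {
        "query": {
            "bool": {
                # Every date must match, so require all of the should clauses.
                "minimum_should_match": len(clauses),
                "should": clauses,
            }
        }
    }


query = availability_query(date(2016, 7, 1), date(2016, 7, 3))
```

For disjoint intervals, the clause lists of each interval could be concatenated before setting minimum_should_match. Because every clause is required, putting the same clauses under `must` instead of `should` would be another way to express the query.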
UPDATE
Another option would be to use a script filter, but with 100 million documents, I'm not certain it would scale that well.
In this scenario you can keep your original design (preferably the second one, because with the first one, you'll create too many unnecessary fields in your mapping) and the query would look like this:
POST /rooms/room/_search
{
"query": {
"bool": {
"filter": {
"script": {
"script": {
"inline": "def dates = doc.availability.sort(false); from = Date.parse('yyyy-MM-dd', from); to = Date.parse('yyyy-MM-dd', to); def days = to - from; def fromIndex = doc.availability.values.indexOf(from.time); def toIndex = doc.availability.values.indexOf(to.time); return fromIndex >= 0 && toIndex >= 0 && days == (toIndex - fromIndex)",
"params": {
"from": "2016-07-01",
"to": "2016-07-04"
}
}
}
}
}
}
}
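The idea behind the script is that, in a sorted list of distinct dates, the positions of the from and to dates differ by exactly the number of days between them only when every day in between is present as well. The plain-Python sketch below illustrates that check outside Elasticsearch (it is not script code for the cluster); the function name `available_for_range` is hypothetical, and it treats a missing from or to date explicitly as "not available".

```python
from bisect import bisect_left
from datetime import date


def available_for_range(availability, from_date, to_date):
    """Return True if every date in [from_date, to_date] appears in availability."""
    dates = sorted(set(availability))  # the input may be unsorted or contain duplicates
    i = bisect_left(dates, from_date)
    j = bisect_left(dates, to_date)
    # Both end dates must actually be present in the list.
    if i == len(dates) or dates[i] != from_date:
        return False
    if j == len(dates) or dates[j] != to_date:
        return False
    # With no gaps, the index distance equals the number of days between the two dates.
    return (j - i) == (to_date - from_date).days


room_233 = [date(2016, 7, d) for d in (1, 2, 4, 5, 7, 8)]
print(available_for_range(room_233, date(2016, 7, 1), date(2016, 7, 4)))  # False
print(available_for_range(room_233, date(2016, 7, 4), date(2016, 7, 5)))  # True
```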
Answer 2:
I am new and just learning ES. What are the disadvantages of this setup/mapping?
ciao..remco
Source: https://stackoverflow.com/questions/37824365/how-to-store-date-range-data-in-elastic-search-aws-and-search-for-a-range