How to handle nulls in an Elasticsearch index

微笑、不失礼 提交于 2021-02-05 07:17:15

问题


I have a SQL table that I am exporting to Elasticsearch.

One of the columns is a numeric field that is nullable, with nulls in some of the records.

When we try to index the table, we get this error:

One of the ETL (BigQuery -> ElasticSearch) jobs for Table : MLS has been ES Failed Chunk of 10000 from index 20000 possibly due to incompatible objects.

Failing BigQuery Table: MLS

Stack Trace of the error:

Traceback (most recent call last): File "/Users/asif/zodiacbackend/zodiacbackend/tasks.py", line 205, in insertIntoES helpers.bulk(es, doc_generator(dataframe,table)) File "/Users/asif/zodiacbackend/env/lib/python3.7/site-packages/elasticsearch/helpers/actions.py", line 300, in bulk for ok, item in streaming_bulk(client, actions, *args, **kwargs): File "/Users/asif/zodiacbackend/env/lib/python3.7/site-packages/elasticsearch/helpers/actions.py", line 230, in streaming_bulk **kwargs File "/Users/asif/zodiacbackend/env/lib/python3.7/site-packages/elasticsearch/helpers/actions.py", line 158, in _process_bulk_chunk raise BulkIndexError("%i document(s) failed to index." % len(errors), errors) elasticsearch.helpers.errors.BulkIndexError: ('2 document(s) failed to index.', [{'index': {'_index': 'mls', '_type': 'mls', '_id': 'b100qHABEFI45Lp-z3Om', 'status': 400, 'error': {'type': 'illegal_argument_exception', 'reason': 'mapper [Lot_Size_Sq_Ft] of different type, current_type [text], merged_type [long]'}, 'data': { 'Lot_Size_Sq_Ft': Decimal('13504')}}}]) 

How do I get the system to recognize nulls?


回答1:


User WittyID, missed some important things like:

  1. values of null_value must be of the same data-type, of your field, so in his example, he declared integer field but defined NULL as null_values, would throw json_parse_exception and this is mentioned as important in the official link as below:

The null_value needs to be the same data type as the field. For instance, a long field cannot have a string null_value.

  1. The null_value only influences how data is indexed, it doesn’t modify the _source document, so in your source document, whatever you passed, will be stored, not the one mentioned in null_values param and at query time also you need to use the value null_value param..

In short, null isn't recognized in ES, hence you can define your custom values for null and then use it to index and query the null values.It's easy to explain the entire thing using the below example, which anybody can try:

Create index

{
  "mappings": {
    "properties": {
      "my_signed_integer": {
        "type":"integer",
        "null_value": -1 --> note we defining `null` values as `-1`.
      }
    }
  }
}

Index doc

  1. store null integer docs

    { "my_number" : null }

If you get this doc from ES it would e returned as below:

{
   "_index": "so-6053847",
   "_type": "_doc",
   "_id": "1",
   "_version": 1,
   "_seq_no": 0,
   "_primary_term": 1,
   "found": true,
   "_source": {
      "my_number": null. --> As explained earlier, in source its stored as `null`.
   }
}
  1. Index non-negative value

    { "my_number" : 10 }

Search query to fetch integer which had null values

{
  "query": {
    "term": {
      "my_signed_integer": -1 -->notice same `null_value`, you need to mention
    }
  }
}

Result:

 "hits": [
         {
            "_index": "so-6053847",
            "_type": "_doc",
            "_id": "1",
            "_score": 1.0,
            "_source": {
               "my_signed_integer": null --> notice it shows `null`, not `-1`
            }
         }
      ]

Search query for other numbers(not null) ie in our case 10

{
  "query": {
    "term": {
      "my_signed_integer": 10
    }
  }
}

Result

"hits": [
         {
            "_index": "so-6053847",
            "_type": "_doc",
            "_id": "2",
            "_score": 1.0,
            "_source": {
               "my_signed_integer": 10 -->source matches the indexed value for this doc
            }
         }
      ]



回答2:


You're dealing with a common ES head scratcher. Elasticsearch doesn't index null values (not just the numeric nulls). You need to specify in your index mapping how you want any detected null values to be indexed. Something like this:

  "mappings": {
    "properties": {
      "nullable_numeric": {
        "type":       "integer",
        "null_value": -1 
      },
      "nullable_text": {
       "type":        "text",
       "null_value":  "NULL"
    }
  }

Once you do this, ES would know how to properly index those fields. Note, you don't need to change your raw data, just let ES know how to index nulls for search....which by the way, won't affect the docs when you query ES.



来源:https://stackoverflow.com/questions/60538475/how-to-handle-nulls-in-an-elasticsearch-index

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!