问题
I have a SQL table that I am exporting to Elasticsearch.
One of the columns is a numeric field that is nullable, with nulls in some of the records.
When we try to index the table, we get this error:
One of the ETL (BigQuery -> ElasticSearch) jobs for Table : MLS has been ES Failed Chunk of 10000 from index 20000 possibly due to incompatible objects.
Failing BigQuery Table: MLS
Stack Trace of the error:
Traceback (most recent call last): File "/Users/asif/zodiacbackend/zodiacbackend/tasks.py", line 205, in insertIntoES helpers.bulk(es, doc_generator(dataframe,table)) File "/Users/asif/zodiacbackend/env/lib/python3.7/site-packages/elasticsearch/helpers/actions.py", line 300, in bulk for ok, item in streaming_bulk(client, actions, *args, **kwargs): File "/Users/asif/zodiacbackend/env/lib/python3.7/site-packages/elasticsearch/helpers/actions.py", line 230, in streaming_bulk **kwargs File "/Users/asif/zodiacbackend/env/lib/python3.7/site-packages/elasticsearch/helpers/actions.py", line 158, in _process_bulk_chunk raise BulkIndexError("%i document(s) failed to index." % len(errors), errors) elasticsearch.helpers.errors.BulkIndexError: ('2 document(s) failed to index.', [{'index': {'_index': 'mls', '_type': 'mls', '_id': 'b100qHABEFI45Lp-z3Om', 'status': 400, 'error': {'type': 'illegal_argument_exception', 'reason': 'mapper [Lot_Size_Sq_Ft] of different type, current_type [text], merged_type [long]'}, 'data': { 'Lot_Size_Sq_Ft': Decimal('13504')}}}])
How do I get the system to recognize nulls?
回答1:
User WittyID, missed some important things like:
- values of
null_value
must be of the same data-type, of your field, so in his example, he declaredinteger
field but definedNULL
asnull_values
, would throwjson_parse_exception
and this is mentioned asimportant
in the official link as below:
The null_value needs to be the same data type as the field. For instance, a long field cannot have a string null_value.
- The
null_value
only influences how data is indexed, it doesn’t modify the _source document, so in your source document, whatever you passed, will be stored, not the one mentioned innull_values
param and at query time also you need to use the valuenull_value
param..
In short, null
isn't recognized in ES, hence you can define your custom values for null
and then use it to index and query the null
values.It's easy to explain the entire thing using the below example, which anybody can try:
Create index
{
"mappings": {
"properties": {
"my_signed_integer": {
"type":"integer",
"null_value": -1 --> note we defining `null` values as `-1`.
}
}
}
}
Index doc
store
null
integer docs{ "my_number" : null }
If you get this doc from ES it would e returned as below:
{
"_index": "so-6053847",
"_type": "_doc",
"_id": "1",
"_version": 1,
"_seq_no": 0,
"_primary_term": 1,
"found": true,
"_source": {
"my_number": null. --> As explained earlier, in source its stored as `null`.
}
}
Index non-negative value
{ "my_number" : 10 }
Search query to fetch integer which had null
values
{
"query": {
"term": {
"my_signed_integer": -1 -->notice same `null_value`, you need to mention
}
}
}
Result:
"hits": [
{
"_index": "so-6053847",
"_type": "_doc",
"_id": "1",
"_score": 1.0,
"_source": {
"my_signed_integer": null --> notice it shows `null`, not `-1`
}
}
]
Search query for other numbers(not null) ie in our case 10
{
"query": {
"term": {
"my_signed_integer": 10
}
}
}
Result
"hits": [
{
"_index": "so-6053847",
"_type": "_doc",
"_id": "2",
"_score": 1.0,
"_source": {
"my_signed_integer": 10 -->source matches the indexed value for this doc
}
}
]
回答2:
You're dealing with a common ES head scratcher. Elasticsearch doesn't index null values (not just the numeric nulls). You need to specify in your index mapping how you want any detected null values to be indexed. Something like this:
"mappings": {
"properties": {
"nullable_numeric": {
"type": "integer",
"null_value": -1
},
"nullable_text": {
"type": "text",
"null_value": "NULL"
}
}
Once you do this, ES would know how to properly index those fields. Note, you don't need to change your raw data, just let ES know how to index nulls for search....which by the way, won't affect the docs when you query ES.
来源:https://stackoverflow.com/questions/60538475/how-to-handle-nulls-in-an-elasticsearch-index