Question
I have many documents in MongoDB, and mongo-connector inserts that data into Elasticsearch. Is there a way to add an extra field to each document before it is inserted into Elasticsearch? Does mongo-connector support this?
UPDATE
Based on your UPDATE 3, I created a mapping like this. Is it correct?
PUT my_index2
{
  "mappings": {
    "my_type2": {
      "transform": {
        "script": {
          "inline": "if (ctx._source.geopoint.alt) ctx._source.geopoint.remove('alt')",
          "lang": "groovy"
        }
      },
      "properties": {
        "geopoint": {
          "type": "geo_point"
        }
      }
    }
  }
}
ERROR
This is the error I keep getting when I try to insert your mapping:
{
  "error": {
    "root_cause": [
      {
        "type": "script_parse_exception",
        "reason": "Value must be of type String: [script]"
      }
    ],
    "type": "mapper_parsing_exception",
    "reason": "Failed to parse mapping [my_type2]: Value must be of type String: [script]",
    "caused_by": {
      "type": "script_parse_exception",
      "reason": "Value must be of type String: [script]"
    }
  },
  "status": 400
}
UPDATE 2
Now the mapping is accepted and the response comes back with acknowledged: true. But when I try to insert JSON data like the document below, it throws an error.
PUT my_index2/my_type2/1
{
  "geopoint": {
    "lon": 48.845877,
    "lat": 8.821861,
    "alt": 0.0
  }
}
ERROR FOR UPDATE 2
{
  "error": {
    "root_cause": [
      {
        "type": "mapper_parsing_exception",
        "reason": "failed to parse"
      }
    ],
    "type": "mapper_parsing_exception",
    "reason": "failed to parse",
    "caused_by": {
      "type": "illegal_argument_exception",
      "reason": "failed to execute script",
      "caused_by": {
        "type": "script_exception",
        "reason": "scripts of type [inline], operation [mapping] and lang [groovy] are disabled"
      }
    }
  },
  "status": 400
}
ERROR 1 FOR UPDATE 2
After adding script.inline: true, I tried to insert the data again but got the following error.
{
  "error": {
    "root_cause": [
      {
        "type": "parse_exception",
        "reason": "field must be either [lat], [lon] or [geohash]"
      }
    ],
    "type": "mapper_parsing_exception",
    "reason": "failed to parse",
    "caused_by": {
      "type": "parse_exception",
      "reason": "field must be either [lat], [lon] or [geohash]"
    }
  },
  "status": 400
}
Answer 1:
mongo-connector aims at synchronizing a MongoDB database with another target system, such as ES, Solr or another MongoDB instance. Synchronizing means 1:1 replication, so there's no way that I know of for mongo-connector to enrich documents during the replication (and that's not its intent either).
However, in ES 5 we'll soon be able to use ingest nodes in which we'll be able to define processing pipelines whose goal is to enrich documents before they get indexed.
UPDATE
There's probably a way by modifying the formatters.py file.
In transform_value I would add a case to handle Geopoint:
    if isinstance(value, dict):
        return self.format_document(value)
    elif isinstance(value, list):
        return [self.transform_value(v) for v in value]
    # handle Geopoint class
    elif isinstance(value, Geopoint):
        return self.format_document({'lat': value['lat'], 'lon': value['lon']})
    ...
UPDATE 2
Let's try another approach by modifying the transform_element function (on line 104):
    def transform_element(self, key, value):
        try:
            # add these next two lines
            if key == 'GeoPoint':
                value = {'lat': value['lat'], 'lon': value['lon']}
            # do not modify the initial code below
            new_value = self.transform_value(value)
            yield key, new_value
        except ValueError as e:
            LOG.warn("Invalid value for key: %s as %s"
                     % (key, str(e)))
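To try the idea outside of mongo-connector, here is a minimal, self-contained sketch. GeoTrimmingFormatter is a hypothetical stand-in that only mimics the three method names mentioned above (format_document, transform_element, transform_value); it is not the real formatters.py class, and the field name geopoint is an assumption taken from the sample document in the question.

```python
import logging

LOG = logging.getLogger(__name__)


class GeoTrimmingFormatter:
    """Hypothetical stand-in for a mongo-connector document formatter."""

    # Assumed name of the field holding the geo point; adjust to your data.
    GEO_KEY = "geopoint"

    def transform_value(self, value):
        # Recurse into nested documents and lists; leave scalars alone.
        if isinstance(value, dict):
            return self.format_document(value)
        if isinstance(value, list):
            return [self.transform_value(v) for v in value]
        return value

    def transform_element(self, key, value):
        try:
            if key == self.GEO_KEY:
                # Keep only the keys that a geo_point object accepts.
                value = {"lat": value["lat"], "lon": value["lon"]}
            yield key, self.transform_value(value)
        except (KeyError, TypeError, ValueError) as e:
            LOG.warning("Invalid value for key: %s as %s", key, e)

    def format_document(self, document):
        # Build a new dict so the input document is left untouched.
        result = {}
        for key, value in document.items():
            for new_key, new_value in self.transform_element(key, value):
                result[new_key] = new_value
        return result


doc = {"name": "station",
       "geopoint": {"lon": 48.845877, "lat": 8.821861, "alt": 0.0}}
formatted = GeoTrimmingFormatter().format_document(doc)
```

Here formatted keeps name and a geopoint holding only lat and lon, while doc still carries alt. The same hook looks like a reasonable place to add an extra field before indexing, but that has not been verified against the actual mongo-connector code.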
UPDATE 3
Another thing you might try is to add a transform. The reason I've not mentioned it before is that it was deprecated in ES 2.0, but in ES 5.0 you'll have ingest nodes and you'll be able to take care of it at ingest time using a remove processor.
You can define your mapping like this:
PUT my_index2
{
  "mappings": {
    "my_type2": {
      "transform": {
        "script": "ctx._source.geopoint.remove('alt'); ctx._source.geopoint.remove('valid')"
      },
      "properties": {
        "geopoint": {
          "type": "geo_point"
        }
      }
    }
  }
}
Note: make sure to enable dynamic scripting by adding script.inline: true to elasticsearch.yml, then restart your ES node.
What is going to happen is that the alt field will still be visible in the stored _source but it will not be indexed, and hence, no error should occur.
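The difference between the stored _source and what gets indexed can be pictured with a small Python sketch. This only illustrates the behaviour described above and does not reproduce how Elasticsearch implements it; apply_transform and its unsupported parameter are hypothetical names.

```python
import copy


def apply_transform(source, unsupported=("alt", "valid")):
    """Return (stored_source, indexed_view) for one document.

    The stored source is returned unchanged; the indexed view is a copy
    with the unsupported geopoint keys removed.
    """
    indexed = copy.deepcopy(source)
    geopoint = indexed.get("geopoint")
    if isinstance(geopoint, dict):
        for key in unsupported:
            # pop with a default so a missing key is not an error
            geopoint.pop(key, None)
    return source, indexed


stored, indexed = apply_transform(
    {"geopoint": {"lon": 48.845877, "lat": 8.821861, "alt": 0.0}}
)
```

After this call, stored still contains alt, while indexed holds only lon and lat.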
With ES 5, you'd simply create a pipeline with a remove processor, like this:
PUT _ingest/pipeline/geo-pipeline
{
  "description": "remove unsupported altitude field",
  "processors": [
    {
      "remove": {
        "field": "geopoint.alt"
      }
    }
  ]
}
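A pipeline is applied when it is named on the index request, for example through the pipeline query parameter (PUT my_index2/my_type2/1?pipeline=geo-pipeline). Whether mongo-connector can pass that parameter is not covered here. As a rough picture of what the remove processor does to the document before it is stored, here is a plain-Python sketch; remove_field is a hypothetical helper, not an Elasticsearch API, and raising KeyError on a missing path is a choice made for this sketch only.

```python
def remove_field(document, dotted_path):
    """Delete the field addressed by a dotted path, in place.

    Raises KeyError if any part of the path is missing.
    """
    parts = dotted_path.split(".")
    parent = document
    # Walk down to the object that holds the final key.
    for part in parts[:-1]:
        parent = parent[part]
    del parent[parts[-1]]
    return document


doc = {"geopoint": {"lon": 48.845877, "lat": 8.821861, "alt": 0.0}}
remove_field(doc, "geopoint.alt")
```

Unlike the transform approach, the field is dropped before the document is stored, so alt no longer appears in _source either.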
Source: https://stackoverflow.com/questions/36772351/does-mongo-connector-supports-adding-fields-before-inserting-to-elasticsearch