Sync MongoDB to Elasticsearch

纵饮孤独 submitted on 2021-02-11 12:52:23

Question


I am looking for a way to sync collections in MongoDB with Elasticsearch (ES). The goal is to have MongoDB as the primary data source and use ES as a full-text search engine. (The business logic of my project is written in Python.)

Several approaches are available online.

  • Mongo-connector
  • River plugin
  • logstash-input-mongodb (Logstash plugin, see similar question)
  • Transporter

However, most of the suggestions are several years old, and I could not find any solution that supports the current version of ES (7.4.0). Is anyone using such a setup? Do you have any suggestions?

I have also thought about dropping MongoDB as the primary data source and just using ES for both storing and searching. However, I have read that ES should not be used as a primary data source.


Edit

Thank you @gurdeep.sabarwal. I followed your approach. However, I have not managed to sync MongoDB to ES. My configuration looks like this:

input {
    jdbc {
#        jdbc_driver_library => "/usr/share/logstash/mongodb-driver-3.11.0-source.jar"
        jdbc_driver_library => "/usr/share/logstash/mongojdbc1.5.jar"
#        jdbc_driver_library => "/usr/share/logstash/mongodb-driver-3.11.1.jar"

#        jdbc_driver_class => "mongodb.jdbc.MongoDriver"
#        jdbc_driver_class => "Java::com.mongodb.MongoClient"
#        jdbc_driver_class => "Java::com.dbschema.MongoJdbcDriver"
        jdbc_driver_class => "com.dbschema.MongoJdbcDriver"

        jdbc_connection_string => "jdbc:mongodb://<myserver>:27017/<mydb>"
        jdbc_user => "user"
        jdbc_password => "pw"
        statement => "db.getCollection('mycollection').find({})"
    }
}

output {
    elasticsearch {
        hosts => ["http://localhost:9200/"]
        index => "myindex"
    }
}

This brings me a bit closer to my goal. However, I get the following error:

Error: Java::com.dbschema.MongoJdbcDriver not loaded. Are you sure you've included the correct jdbc driver in :jdbc_driver_library?
Exception: LogStash::ConfigurationError

Since it did not work, I also tried the commented-out variants, but did not succeed.


Answer 1:


  1. Download https://dbschema.com/jdbc-drivers/MongoDbJdbcDriver.zip
  2. Unzip it and copy all the files to the path (~/logstash-7.4.2/logstash-core/lib/jars/)
  3. Modify the config file (mongo-logstash.conf) as shown below
  4. Run: ~/logstash-7.4.2/bin/logstash -f mongo-logstash.conf
  5. Success, please try it!

PS: this is my first answer on Stack Overflow :-)

input {
  jdbc{
    # NOT THIS # jdbc_driver_class => "Java::mongodb.jdbc.MongoDriver"
    jdbc_driver_class => "com.dbschema.MongoJdbcDriver"
    jdbc_driver_library => "mongojdbc1.5.jar"
    jdbc_user => "" #no user and pwd
    jdbc_password => ""
    jdbc_connection_string => "jdbc:mongodb://127.0.0.1:27017/db1"
    statement => "db.t1.find()"
  }
}

output {
    #stdout { codec => dots }
    stdout { }
}
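The pipeline above only prints to stdout for verification; to actually index into Elasticsearch, the output block can be swapped for an elasticsearch output (the host and index name here are placeholders):

```conf
output {
    elasticsearch {
        hosts => ["http://localhost:9200"]
        index => "myindex"
    }
}
```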



Answer 2:


For the ELK stack, I have implemented the 1st and 2nd approaches. While doing research I came across multiple approaches, so you could pick any one, but my personal choice is the 1st or the 2nd, because they give you lots of options for customization.

If you need code, let me know; I can share a snippet of it. I don't want to make the answer too long!

1. Use the DbSchema JDBC jar (https://dbschema.com) to stream data from MongoDB to Elasticsearch.

a. The DbSchema JDBC jar is open source.

b. You can write native MongoDB queries or aggregation queries directly in Logstash.

Your pipeline may look like the following:

input {
  jdbc{
    jdbc_user => "user"
    jdbc_password => "pass"
    jdbc_driver_class => "Java::com.dbschema.MongoJdbcDriver"
    jdbc_driver_library => "mongojdbc1.2.jar"
    jdbc_connection_string => "jdbc:mongodb://user:pass@host1:27060/cdcsmb"
    statement => "db.product.find()"
  }
}
output {
  stdout {
    codec => rubydebug 
  }
  elasticsearch {
    hosts => "localhost:9200"
    index => "target_index"
    document_type => "document_type"
    document_id => "%{id}"
  }
}
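As an illustration of point (b), the `statement` can also hold an aggregation pipeline instead of a plain find; the collection and field names in this fragment are hypothetical:

```conf
statement => "db.product.aggregate([{ '$match': { 'status': 'active' } }])"
```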

2. Use the UnityJDBC jar (http://unityjdbc.com) to stream data from MongoDB to Elasticsearch.

a. You have to pay for the UnityJDBC jar.

b. You can write SQL-format queries in Logstash to get data from MongoDB.

Your pipeline may look like the following:

input {
  jdbc{
    jdbc_user => "user"
    jdbc_password => "pass"
    jdbc_driver_class => "Java::mongodb.jdbc.MongoDriver"
    jdbc_driver_library => "mongodb_unityjdbc_full.jar"
    jdbc_connection_string => "jdbc:mongodb://user:pass@host1:27060/cdcsmb"
    statement=> "SELECT * FROM employee WHERE status = 'active'" 
  }
}
output {
  stdout {
    codec => rubydebug 
  }
  elasticsearch {
    hosts => "localhost:9200"
    index => "target_index"
    document_type => "document_type"
    document_id => "%{id}"
  }
}

3. Use the logstash-input-mongodb plugin (https://github.com/phutchins/logstash-input-mongodb) to stream data from MongoDB to Elasticsearch.

a. Open source, kind of.

b. You get very few options for customization: it will dump the entire collection, and you cannot write queries or aggregation queries, etc.
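For completeness, an input block for this plugin might look roughly like the following; the option names are taken from the plugin's README as I recall them, and the URI, paths, and collection name are placeholders, so treat this as an unverified sketch:

```conf
input {
    mongodb {
        uri => "mongodb://localhost:27017/mydb"
        placeholder_db_dir => "/opt/logstash-mongodb/"
        placeholder_db_name => "logstash_sqlite.db"
        collection => "mycollection"
        batch_size => 5000
    }
}
```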

4. You can write your own program in Python or Java that connects to MongoDB and indexes the data in Elasticsearch, and then use cron to schedule it.
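A minimal Python sketch of this cron-driven approach, assuming the pymongo and elasticsearch client libraries are installed; the URIs, database, collection, and index names are hypothetical placeholders:

```python
def doc_to_action(doc, index="myindex"):
    """Convert one MongoDB document into an Elasticsearch bulk action.

    The Mongo _id becomes the ES document _id, so re-running the job
    updates existing documents instead of duplicating them.
    """
    doc = dict(doc)                       # don't mutate the caller's dict
    doc_id = str(doc.pop("_id"))          # ObjectId is not JSON-serializable
    return {"_index": index, "_id": doc_id, "_source": doc}


def sync(mongo_uri="mongodb://localhost:27017",
         db="mydb", collection="mycollection", index="myindex"):
    """Bulk-copy every document from a Mongo collection into an ES index."""
    # Imports are local so the pure helper above stays dependency-free.
    from pymongo import MongoClient
    from elasticsearch import Elasticsearch, helpers

    es = Elasticsearch("http://localhost:9200")
    coll = MongoClient(mongo_uri)[db][collection]
    actions = (doc_to_action(d, index) for d in coll.find({}))
    helpers.bulk(es, actions)


if __name__ == "__main__":
    sync()
```

Because the Mongo `_id` is reused as the ES `_id`, re-running the job from cron upserts rather than duplicates; a production job would additionally need to handle deletions, e.g. via a change stream or a periodic full reindex.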

5. You can use the Node.js Mongoosastic npm package (https://www.npmjs.com/package/mongoosastic); the only overhead is that it commits each change to both Mongo and ES to keep them in sync.




Answer 3:


Monstache seems like a good option too, as it supports the latest versions of both Elasticsearch and MongoDB: https://github.com/rwynn/monstache



Source: https://stackoverflow.com/questions/58342818/sync-mongodb-to-elasticsearch
