Question
I'm trying to find an easy way to sync data from MongoDB 4.x to Elasticsearch 6.x. My use case is partial text search, which is supported by Elasticsearch but not by MongoDB. MongoDB is the primary database for my applications.
All the solutions I found seem outdated and only support older versions of MongoDB / Elasticsearch. These include mongodb-connector and mongodb river.
What is the best tool to use so that any changes (CRUD) to the data in MongoDB are automatically synced to Elasticsearch?
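For reference, this is the kind of partial match I need; a rough sketch of an Elasticsearch query (the users index and name field are made-up names, not an index that exists yet):
$ curl -X POST -H "Content-Type: application/json" "http://localhost:9200/users/_search?pretty" -d '{ "query": { "match_phrase_prefix": { "name": "joh" } } }'
MongoDB can only approximate this with unanchored regex queries, which cannot use a regular index efficiently.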
Answer 1:
There is a tool called Monstache for migrating MongoDB data to Elasticsearch in real time. It supports the latest MongoDB versions.
It is a sync daemon written in Go that continuously indexes your MongoDB collections into Elasticsearch. Monstache gives you the ability to use Elasticsearch to do complex searches and aggregations of your MongoDB data and easily build realtime Kibana visualizations and dashboards.
It is available at https://github.com/rwynn/monstache. Note that Monstache requires MongoDB to run as a replica set.
Below is the initial setup to configure and use it.
Step 1:
C:\Program Files\MongoDB\Server\4.0\bin>mongod --smallfiles --oplogSize 50 --replSet test
Step 2:
C:\Program Files\MongoDB\Server\4.0\bin>mongo
MongoDB shell version v4.0.2
connecting to: mongodb://127.0.0.1:27017
MongoDB server version: 4.0.2
Server has startup warnings:
2019-01-18T16:56:44.931+0530 I CONTROL [initandlisten]
2019-01-18T16:56:44.931+0530 I CONTROL [initandlisten] ** WARNING: Access control is not enabled for the database.
2019-01-18T16:56:44.931+0530 I CONTROL [initandlisten] ** Read and write access to data and configuration is unrestricted.
2019-01-18T16:56:44.931+0530 I CONTROL [initandlisten]
2019-01-18T16:56:44.931+0530 I CONTROL [initandlisten] ** WARNING: This server is bound to localhost.
2019-01-18T16:56:44.931+0530 I CONTROL [initandlisten] ** Remote systems will be unable to connect to this server.
2019-01-18T16:56:44.931+0530 I CONTROL [initandlisten] ** Start the server with --bind_ip <address> to specify which IP
2019-01-18T16:56:44.931+0530 I CONTROL [initandlisten] ** addresses it should serve responses from, or with --bind_ip_all to
2019-01-18T16:56:44.931+0530 I CONTROL [initandlisten] ** bind to all interfaces. If this behavior is desired, start the
2019-01-18T16:56:44.931+0530 I CONTROL [initandlisten] ** server with --bind_ip 127.0.0.1 to disable this warning.
2019-01-18T16:56:44.931+0530 I CONTROL [initandlisten]
MongoDB Enterprise test:PRIMARY>
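If the prompt does not show PRIMARY (for example, on a brand-new data directory the replica set has never been initialized), you may need to initialize it once. A minimal sketch, assuming the single-node replica set named test started in Step 1:
C:\Program Files\MongoDB\Server\4.0\bin>mongo --eval "rs.initiate()"
After a few seconds, reconnect and the prompt should switch to test:PRIMARY.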
Step 3: Verify the replication.
MongoDB Enterprise test:PRIMARY> rs.status();
{
    "set" : "test",
    "date" : ISODate("2019-01-18T11:39:00.380Z"),
    "myState" : 1,
    "term" : NumberLong(2),
    "syncingTo" : "",
    "syncSourceHost" : "",
    "syncSourceId" : -1,
    "heartbeatIntervalMillis" : NumberLong(2000),
    "optimes" : {
        "lastCommittedOpTime" : {
            "ts" : Timestamp(1547811537, 1),
            "t" : NumberLong(2)
        },
        "readConcernMajorityOpTime" : {
            "ts" : Timestamp(1547811537, 1),
            "t" : NumberLong(2)
        },
        "appliedOpTime" : {
            "ts" : Timestamp(1547811537, 1),
            "t" : NumberLong(2)
        },
        "durableOpTime" : {
            "ts" : Timestamp(1547811537, 1),
            "t" : NumberLong(2)
        }
    },
    "lastStableCheckpointTimestamp" : Timestamp(1547811517, 1),
    "members" : [
        {
            "_id" : 0,
            "name" : "localhost:27017",
            "health" : 1,
            "state" : 1,
            "stateStr" : "PRIMARY",
            "uptime" : 736,
            "optime" : {
                "ts" : Timestamp(1547811537, 1),
                "t" : NumberLong(2)
            },
            "optimeDate" : ISODate("2019-01-18T11:38:57Z"),
            "syncingTo" : "",
            "syncSourceHost" : "",
            "syncSourceId" : -1,
            "infoMessage" : "",
            "electionTime" : Timestamp(1547810805, 1),
            "electionDate" : ISODate("2019-01-18T11:26:45Z"),
            "configVersion" : 1,
            "self" : true,
            "lastHeartbeatMessage" : ""
        }
    ],
    "ok" : 1,
    "operationTime" : Timestamp(1547811537, 1),
    "$clusterTime" : {
        "clusterTime" : Timestamp(1547811537, 1),
        "signature" : {
            "hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
            "keyId" : NumberLong(0)
        }
    }
}
MongoDB Enterprise test:PRIMARY>
Step 4:
Download Monstache from https://github.com/rwynn/monstache/releases.
Unzip the download and adjust your PATH variable to include the path to the folder for your platform.
Go to cmd and type "monstache -v" to verify the installation:
# 4.13.1
Monstache uses the TOML format for its configuration. Create a configuration file for the migration named config.toml.
Step 5:
My config.toml:
# Connection settings
mongo-url = "mongodb://127.0.0.1:27017/?replicaSet=test"
elasticsearch-urls = ["http://localhost:9200"]

# Copy these collections in full at startup; further changes are tailed from the oplog
direct-read-namespaces = [ "admin.users" ]

# Compress requests to Elasticsearch and report indexing statistics
gzip = true
stats = true
index-stats = true

# Bulk indexing limits
elasticsearch-max-conns = 4
elasticsearch-max-seconds = 5
elasticsearch-max-bytes = 8000000

# Do not delete Elasticsearch indexes when collections/databases are dropped in MongoDB
dropped-collections = false
dropped-databases = false

# Remember progress so monstache can resume where it left off after a restart
resume = true
resume-write-unsafe = true
resume-name = "default"

# GridFS file indexing is not needed here
index-files = false
file-highlighting = false

verbose = true
exit-after-direct-reads = false
index-as-update = true
index-oplog-time = true
Step 6:
D:\15-1-19>monstache -f config.toml
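Once monstache is running, you can check that the data has reached Elasticsearch. A quick sanity check (using curl if available, or just open the URLs in a browser), assuming the default behaviour where the index name follows the MongoDB namespace (here admin.users):
D:\15-1-19>curl "http://localhost:9200/_cat/indices?v"
D:\15-1-19>curl "http://localhost:9200/admin.users/_count?pretty"
The document count should match the number of documents in the MongoDB collection.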
Answer 2:
If you work with Docker, you can follow this tutorial:
https://github.com/ziedtuihri/Monstache_Elasticsearch_Mongodb
Monstache is a sync daemon written in Go that continuously indexes your MongoDB collections into Elasticsearch. Monstache gives you the ability to use Elasticsearch to do complex searches and aggregations of your MongoDB data and easily build realtime Kibana visualizations and dashboards.
Documentation for Monstache:
https://rwynn.github.io/monstache-site/
GitHub:
https://github.com/rwynn/monstache
docker-compose.yml
version: '2.3'
networks:
  test:
    driver: bridge
services:
  db:
    image: mongo:3.0.2
    expose:
      - "27017"
    container_name: mongodb
    volumes:
      - ./mongodb:/data/db
      - ./mongodb_config:/data/configdb
    ports:
      - "27018:27017"
    command: mongod --smallfiles --replSet rs0
    networks:
      - test
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:6.8.7
    container_name: elasticsearch
    volumes:
      - ./elastic:/usr/share/elasticsearch/data
      - ./elastic/elasticsearch.yml:/usr/share/elasticsearch/config/elasticsearch.yml
    ports:
      - 9200:9200
    command: elasticsearch -Enetwork.host=_local_,_site_ -Enetwork.publish_host=_local_
    healthcheck:
      test: "wget -q -O - http://localhost:9200/_cat/health"
      interval: 1s
      timeout: 30s
      retries: 300
    ulimits:
      nproc: 65536
      nofile:
        soft: 65536
        hard: 65536
      memlock:
        soft: -1
        hard: -1
    networks:
      - test
  monstache:
    image: rwynn/monstache:rel4
    expose:
      - "8080"
    ports:
      - "8080:8080"
    container_name: monstache
    command: -mongo-url=mongodb://db:27017 -elasticsearch-url=http://elasticsearch:9200 -direct-read-namespace=Product_DB.Product -direct-read-split-max=2
    links:
      - elasticsearch
      - db
    depends_on:
      db:
        condition: service_started
      elasticsearch:
        condition: service_healthy
    networks:
      - test
replicaset.sh
#!/bin/bash
# this configuration is so important
echo "Starting replica set initialize"
until mongo --host 192.168.144.2 --eval "print(\"waited for connection\")"
do
sleep 2
done
echo "Connection finished"
echo "Creating replica set"
mongo --host 192.168.144.2 <<EOF
rs.initiate(
  {
    _id : 'rs0',
    members: [
      { _id : 0, host : "db:27017", priority : 1 }
    ]
  }
)
EOF
echo "replica set created"
1) Run this command in a terminal: $ sysctl -w vm.max_map_count=262144
(If you work on a server, I don't know whether this is necessary.)
2) Run in a terminal: $ docker-compose build
3) Run in a terminal: $ docker-compose up -d
Don't bring your containers down yet.
$ docker ps
Find the ID of the MongoDB container, then:
$ docker inspect id_of_mongo_image
Copy the IPAddress, set it in replicaset.sh, and run replicaset.sh:
$ ./replicaset.sh
In the terminal you should see => replica set created
$ docker-compose down
4) Run in a terminal: $ docker-compose up
Finally, everything should be running.
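To confirm that the pipeline works end to end, you can, for example, check the monstache logs and list the Elasticsearch indices (these commands are only a sanity check and assume the container names and port mapping from the docker-compose.yml above):
$ docker logs monstache
$ curl "http://localhost:9200/_cat/indices?v"
An index corresponding to the Product_DB.Product namespace should appear once the direct read has completed.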
Replication in MongoDB
A replica set is a group of mongod instances that maintain the same data set. A replica set contains several data bearing nodes and optionally one arbiter node. Of the data bearing nodes, one and only one member is deemed the primary node, while the other nodes are deemed secondary nodes.
The primary node receives all write operations. A replica set can have only one primary capable of confirming writes with { w: "majority" } write concern; although in some circumstances, another mongod instance may transiently believe itself to also be primary.
To view the replica set configuration, use rs.conf().
The replica set is what allows Monstache to index your MongoDB collections into Elasticsearch with real-time synchronization.
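For example, against the Docker setup above (the MongoDB container is named mongodb in docker-compose.yml):
$ docker exec -it mongodb mongo --eval "printjson(rs.conf())"
This prints the member list, which should contain the single db:27017 member created by replicaset.sh.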
Source: https://stackoverflow.com/questions/56661890/mongodb-4-x-real-time-sync-to-elasticsearch-6-x