How to load data into MongoDB running on the host from inside a Docker container running on the same machine?


Question


I am running a PyTorch Docker container with the following command on an Ubuntu 18.02 machine:

# Run the PyTorch container image
docker run -it -v /home/ubuntu/Downloads/docker_work/test_py_app/app:/workspace/app -p 8881:8888 -p 5002:5002 --gpus all --rm nvcr.io/nvidia/pytorch:20.08-py3

On the same machine I have a MongoDB instance running with the following connection details:

database_name = 'data_analytics'
collection_name = 'TestDB'
server = 'localhost'
mongodb_port = 27017

I'm running the code below outside Docker on the local machine to test it, and it works perfectly fine, creating/updating the existing collection:

import pandas as pd
import os
import json
import pymongo
from pymongo import MongoClient
import glob


def dataframe_cleaner(csv_path):
    # Read the CSV and clean up the column names before loading into MongoDB
    df = pd.read_csv(csv_path)
    df.columns = df.columns.str.replace('[#,@,&,.]', '')  # strip special characters (regex)
    df.columns = df.columns.str.replace(' ', '_')          # replace spaces with underscores
    df.columns = [x.lower() for x in df.columns]           # lowercase all column names
    return df


def mongo_loader(dataframe, db_name, collection_name, server, mongodb_port):
    client = MongoClient(server, int(mongodb_port))
    db = client[db_name]

    # Convert the dataframe into a list of dicts, one document per row
    records = json.loads(dataframe.T.to_json()).values()
    # Note: the TestDB collection is hard-coded here (collection_name is not used)
    # and insert() is deprecated in favour of insert_one()/insert_many()
    db.TestDB.insert(records)
    return True

csv_path = '/home/ubuntu/Downloads/test.csv'


database_name = 'data_analytics'
collection_name = 'TestDB'
server = 'localhost'
mongodb_port = 27017

df = dataframe_cleaner(csv_path)
criteria = mongo_loader(df, database_name, collection_name, server, mongodb_port)

As per the suggestion here, I've changed server = 'localhost' to server = 'host.docker.internal' and run the same code inside the Docker container to read a CSV file and push the data to the MongoDB instance running on the host machine outside the container, but to no avail; I still get the following error:

/opt/conda/lib/python3.6/site-packages/ipykernel_launcher.py:30: DeprecationWarning: insert is deprecated. Use insert_one or insert_many instead.
---------------------------------------------------------------------------
ServerSelectionTimeoutError               Traceback (most recent call last)
<ipython-input-4-458f690221ff> in <module>
     41 
     42 df = dataframe_cleaner(csv_path)
---> 43 criteria = mongo_loader(df, database_name, collection_name, server, mongodb_port)
     44 
     45 #if criteria is True:

<ipython-input-4-458f690221ff> in mongo_loader(dataframe, db_name, collection_name, server, mongodb_port)
     28 
     29     records = json.loads(dataframe.T.to_json()).values()
---> 30     db.TestDB.insert(records)
     31     return True
     32 

/opt/conda/lib/python3.6/site-packages/pymongo/collection.py in insert(self, doc_or_docs, manipulate, check_keys, continue_on_error, **kwargs)
   3292             write_concern = WriteConcern(**kwargs)
   3293         return self._insert(doc_or_docs, not continue_on_error,
-> 3294                             check_keys, manipulate, write_concern)
   3295 
   3296     def update(self, spec, document, upsert=False, manipulate=False,

/opt/conda/lib/python3.6/site-packages/pymongo/collection.py in _insert(self, docs, ordered, check_keys, manipulate, write_concern, op_id, bypass_doc_val, session)
    647         blk.ops = [(message._INSERT, doc) for doc in gen()]
    648         try:
--> 649             blk.execute(write_concern, session=session)
    650         except BulkWriteError as bwe:
    651             _raise_last_error(bwe.details)

/opt/conda/lib/python3.6/site-packages/pymongo/bulk.py in execute(self, write_concern, session)
    526                 self.execute_no_results(sock_info, generator)
    527         else:
--> 528             return self.execute_command(generator, write_concern, session)
    529 
    530 

/opt/conda/lib/python3.6/site-packages/pymongo/bulk.py in execute_command(self, generator, write_concern, session)
    356 
    357         client = self.collection.database.client
--> 358         with client._tmp_session(session) as s:
    359             client._retry_with_session(
    360                 self.is_retryable, retryable_bulk, s, self)

/opt/conda/lib/python3.6/contextlib.py in __enter__(self)
     79     def __enter__(self):
     80         try:
---> 81             return next(self.gen)
     82         except StopIteration:
     83             raise RuntimeError("generator didn't yield") from None

/opt/conda/lib/python3.6/site-packages/pymongo/mongo_client.py in _tmp_session(self, session, close)
   1827             return
   1828 
-> 1829         s = self._ensure_session(session)
   1830         if s:
   1831             try:

/opt/conda/lib/python3.6/site-packages/pymongo/mongo_client.py in _ensure_session(self, session)
   1814             # Don't make implicit sessions causally consistent. Applications
   1815             # should always opt-in.
-> 1816             return self.__start_session(True, causal_consistency=False)
   1817         except (ConfigurationError, InvalidOperation):
   1818             # Sessions not supported, or multiple users authenticated.

/opt/conda/lib/python3.6/site-packages/pymongo/mongo_client.py in __start_session(self, implicit, **kwargs)
   1764 
   1765         # Raises ConfigurationError if sessions are not supported.
-> 1766         server_session = self._get_server_session()
   1767         opts = client_session.SessionOptions(**kwargs)
   1768         return client_session.ClientSession(

/opt/conda/lib/python3.6/site-packages/pymongo/mongo_client.py in _get_server_session(self)
   1800     def _get_server_session(self):
   1801         """Internal: start or resume a _ServerSession."""
-> 1802         return self._topology.get_server_session()
   1803 
   1804     def _return_server_session(self, server_session, lock):

/opt/conda/lib/python3.6/site-packages/pymongo/topology.py in get_server_session(self)
    486                             any_server_selector,
    487                             self._settings.server_selection_timeout,
--> 488                             None)
    489                 elif not self._description.readable_servers:
    490                     self._select_servers_loop(

/opt/conda/lib/python3.6/site-packages/pymongo/topology.py in _select_servers_loop(self, selector, timeout, address)
    215                 raise ServerSelectionTimeoutError(
    216                     "%s, Timeout: %ss, Topology Description: %r" %
--> 217                     (self._error_message(selector), timeout, self.description))
    218 
    219             self._ensure_opened()

ServerSelectionTimeoutError: host.docker.internal:27017: [Errno -2] Name or service not known, Timeout: 30s, Topology Description: <TopologyDescription id: 601a3b8e6563d1163530d9c1, topology_type: Single, servers: [<ServerDescription ('host.docker.internal', 27017) server_type: Unknown, rtt: None, error=AutoReconnect('host.docker.internal:27017: [Errno -2] Name or service not known',)>]>

Kindly help!


Answer 1:


Aakash, it's not clear to me whether the MongoDB server is running as a Docker container or as a standard application on the Docker host.

Docker runs multiple networks, possibly with different drivers, so you have to attach the PyTorch container to a network that has access to your MongoDB instance's network.

If MongoDB is running as an application on the host machine, add the --network="host" flag to your PyTorch docker run command:

docker run -it -v /home/ubuntu/Downloads/docker_work/test_py_app/app:/workspace/app -p 8881:8888 -p 5002:5002 --gpus all --network="host" --rm nvcr.io/nvidia/pytorch:20.08-py3

This will instruct Docker to bind the PyTorch container to the host's real network interface(s) and give it access to MongoDB via localhost:27017. Note that with --network="host" the -p port mappings are ignored, because the container already shares the host's network stack.
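A quick way to verify the connection from inside the container with host networking is a sketch like the one below; it assumes MongoDB is listening on the default port 27017 on the host and that pymongo is available in the container (your traceback shows it installed under /opt/conda):

from pymongo import MongoClient

# With --network="host" the container shares the host's network stack,
# so 'localhost' here refers to the host machine itself
client = MongoClient('localhost', 27017, serverSelectionTimeoutMS=5000)
client.admin.command('ping')  # raises ServerSelectionTimeoutError if MongoDB is unreachable
print('Connected to MongoDB on the host')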

If MongoDB is running as a Docker container, make sure that when you run it you map its port to the host, or run the PyTorch container on the same virtual network as the MongoDB container.

To simply expose the port, make sure a -p 27017:27017 flag is present on the MongoDB docker run command.
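For example, a minimal run command (the container name and image tag below are placeholders, and any volume mounts you need are omitted):

docker run -d --name mongodb -p 27017:27017 mongo:4.4

With the port published this way, the MongoDB server is reachable on port 27017 of the host's address.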

To use the same virtual network, check the Networks key in the output of the docker inspect MONGO_CONTAINER_ID command and pass the same name as --network="name" when starting your PyTorch container.
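A minimal sketch of that approach, assuming a user-defined network called my-net (replace MONGO_CONTAINER_ID and the network name with your own values):

# See which networks the MongoDB container is attached to
docker inspect MONGO_CONTAINER_ID --format '{{json .NetworkSettings.Networks}}'

# Or create a shared network, attach the running MongoDB container to it,
# and start the PyTorch container on that same network
docker network create my-net
docker network connect my-net MONGO_CONTAINER_ID
docker run -it --network="my-net" --gpus all --rm nvcr.io/nvidia/pytorch:20.08-py3

On a user-defined network like this, containers can resolve each other by container name, so the Python code would connect to the MongoDB container's name (rather than localhost) as the server.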

For more information, take a look at the docker network manual.



Source: https://stackoverflow.com/questions/66022448/how-to-load-data-in-mongodb-running-in-host-from-inside-a-docker-running-on-the
