Question
I am running a PyTorch Docker container with the following command on an Ubuntu 18.04 machine:
# Run Pytorch container image
docker run -it -v /home/ubuntu/Downloads/docker_work/test_py_app/app:/workspace/app -p 8881:8888 -p 5002:5002 --gpus all --rm nvcr.io/nvidia/pytorch:20.08-py3
On the same machine I have a MongoDB instance running with the following details:
database_name = 'data_analytics'
collection_name = 'TestDB'
server = 'localhost'
mongodb_port = 27017
When I run the code below outside Docker, directly on the local machine, it works perfectly fine, creating/updating the existing collection:
import pandas as pd
import os
import json
import pymongo
from pymongo import MongoClient
import glob
def dataframe_cleaner(csv_path):
df = pd.read_csv(csv_path)
df.columns = df.columns.str.replace('[#,@,&,.]', '')
df.columns = df.columns.str.replace(' ', '_')
df.columns = [x.lower() for x in df.columns]
return df
def mongo_loader(dataframe, db_name, collection_name, server, mongodb_port):
client = MongoClient(server, int(mongodb_port))
db = client[db_name]
# print(db)
records = json.loads(dataframe.T.to_json()).values()
db.TestDB.insert(records)
return True
csv_path = '/home/ubuntu/Downloads/test.csv'
database_name = 'data_analytics'
collection_name = 'TestDB'
server = 'localhost'
mongodb_port = 27017
df = dataframe_cleaner(csv_path)
criteria = mongo_loader(df, database_name, collection_name, server, mongodb_port)
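As an aside, the traceback below also flags a DeprecationWarning on insert. A minimal sketch of the same loader using the non-deprecated pymongo API (insert_many), with everything else unchanged, would look like this; it does not address the connectivity problem itself:
def mongo_loader_v2(dataframe, db_name, collection_name, server, mongodb_port):
    # Same logic as mongo_loader above, but using insert_many instead of the
    # deprecated Collection.insert, and using the collection_name argument
    # instead of hard-coding TestDB.
    client = MongoClient(server, int(mongodb_port))
    db = client[db_name]
    records = list(json.loads(dataframe.T.to_json()).values())
    db[collection_name].insert_many(records)
    return True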
As per the suggestion here, I've updated server = 'localhost' to server = 'host.docker.internal' and am running the same code inside the Docker container to read a CSV file and push the data to the MongoDB instance outside the container on the same host machine, but to no avail; I still get the same error:
/opt/conda/lib/python3.6/site-packages/ipykernel_launcher.py:30: DeprecationWarning: insert is deprecated. Use insert_one or insert_many instead.
---------------------------------------------------------------------------
ServerSelectionTimeoutError Traceback (most recent call last)
<ipython-input-4-458f690221ff> in <module>
41
42 df = dataframe_cleaner(csv_path)
---> 43 criteria = mongo_loader(df, database_name, collection_name, server, mongodb_port)
44
45 #if criteria is True:
<ipython-input-4-458f690221ff> in mongo_loader(dataframe, db_name, collection_name, server, mongodb_port)
28
29 records = json.loads(dataframe.T.to_json()).values()
---> 30 db.TestDB.insert(records)
31 return True
32
/opt/conda/lib/python3.6/site-packages/pymongo/collection.py in insert(self, doc_or_docs, manipulate, check_keys, continue_on_error, **kwargs)
3292 write_concern = WriteConcern(**kwargs)
3293 return self._insert(doc_or_docs, not continue_on_error,
-> 3294 check_keys, manipulate, write_concern)
3295
3296 def update(self, spec, document, upsert=False, manipulate=False,
/opt/conda/lib/python3.6/site-packages/pymongo/collection.py in _insert(self, docs, ordered, check_keys, manipulate, write_concern, op_id, bypass_doc_val, session)
647 blk.ops = [(message._INSERT, doc) for doc in gen()]
648 try:
--> 649 blk.execute(write_concern, session=session)
650 except BulkWriteError as bwe:
651 _raise_last_error(bwe.details)
/opt/conda/lib/python3.6/site-packages/pymongo/bulk.py in execute(self, write_concern, session)
526 self.execute_no_results(sock_info, generator)
527 else:
--> 528 return self.execute_command(generator, write_concern, session)
529
530
/opt/conda/lib/python3.6/site-packages/pymongo/bulk.py in execute_command(self, generator, write_concern, session)
356
357 client = self.collection.database.client
--> 358 with client._tmp_session(session) as s:
359 client._retry_with_session(
360 self.is_retryable, retryable_bulk, s, self)
/opt/conda/lib/python3.6/contextlib.py in __enter__(self)
79 def __enter__(self):
80 try:
---> 81 return next(self.gen)
82 except StopIteration:
83 raise RuntimeError("generator didn't yield") from None
/opt/conda/lib/python3.6/site-packages/pymongo/mongo_client.py in _tmp_session(self, session, close)
1827 return
1828
-> 1829 s = self._ensure_session(session)
1830 if s:
1831 try:
/opt/conda/lib/python3.6/site-packages/pymongo/mongo_client.py in _ensure_session(self, session)
1814 # Don't make implicit sessions causally consistent. Applications
1815 # should always opt-in.
-> 1816 return self.__start_session(True, causal_consistency=False)
1817 except (ConfigurationError, InvalidOperation):
1818 # Sessions not supported, or multiple users authenticated.
/opt/conda/lib/python3.6/site-packages/pymongo/mongo_client.py in __start_session(self, implicit, **kwargs)
1764
1765 # Raises ConfigurationError if sessions are not supported.
-> 1766 server_session = self._get_server_session()
1767 opts = client_session.SessionOptions(**kwargs)
1768 return client_session.ClientSession(
/opt/conda/lib/python3.6/site-packages/pymongo/mongo_client.py in _get_server_session(self)
1800 def _get_server_session(self):
1801 """Internal: start or resume a _ServerSession."""
-> 1802 return self._topology.get_server_session()
1803
1804 def _return_server_session(self, server_session, lock):
/opt/conda/lib/python3.6/site-packages/pymongo/topology.py in get_server_session(self)
486 any_server_selector,
487 self._settings.server_selection_timeout,
--> 488 None)
489 elif not self._description.readable_servers:
490 self._select_servers_loop(
/opt/conda/lib/python3.6/site-packages/pymongo/topology.py in _select_servers_loop(self, selector, timeout, address)
215 raise ServerSelectionTimeoutError(
216 "%s, Timeout: %ss, Topology Description: %r" %
--> 217 (self._error_message(selector), timeout, self.description))
218
219 self._ensure_opened()
ServerSelectionTimeoutError: host.docker.internal:27017: [Errno -2] Name or service not known, Timeout: 30s, Topology Description: <TopologyDescription id: 601a3b8e6563d1163530d9c1, topology_type: Single, servers: [<ServerDescription ('host.docker.internal', 27017) server_type: Unknown, rtt: None, error=AutoReconnect('host.docker.internal:27017: [Errno -2] Name or service not known',)>]>
Kindly help!
Answer 1:
Aakash, it's not clear to me whether the MongoDB server is running as a Docker container or as a standard application on the Docker host.
Docker runs multiple networks, possibly with different drivers, so you have to attach the PyTorch container to a network that can reach your MongoDB instance.
If MongoDB is running as an application on the host machine, add a --network="host" flag to your PyTorch run command:
docker run -it -v /home/ubuntu/Downloads/docker_work/test_py_app/app:/workspace/app -p 8881:8888 -p 5002:5002 --gpus all --network="host" --rm nvcr.io/nvidia/pytorch:20.08-py3
This instructs Docker to bind the PyTorch container to the host's real network interfaces, giving it access to Mongo via localhost:27017 (note that with host networking the -p port mappings are ignored, since the container shares the host's network stack).
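With host networking in place, a quick way to confirm that the database is reachable from inside the container is a minimal connection check such as the sketch below (the 5-second timeout is just an illustrative value):
from pymongo import MongoClient
from pymongo.errors import ServerSelectionTimeoutError

# Sketch: run this inside the container started with --network="host".
# Assumes mongod on the host is listening on localhost:27017.
client = MongoClient('localhost', 27017, serverSelectionTimeoutMS=5000)
try:
    client.admin.command('ping')  # cheap round trip to the server
    print('MongoDB is reachable')
except ServerSelectionTimeoutError as exc:
    print('MongoDB is not reachable:', exc)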
If MongoDB is running as a Docker container, make sure that you mapped its port to the outside world when you ran it, or that you run PyTorch on the same virtual network as it.
To simply expose the port, make sure a -p 27017:27017 flag exists on the docker run command for the Mongo container.
To use the same virtual network, check the Networks key in the output of docker inspect MONGO_CONTAINER_ID and pass the same name as --network="name" when you run the PyTorch container.
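If you go the shared-network route, the only thing that changes in the Python code is the host name; a minimal sketch, assuming a hypothetical Mongo container named mongo_db attached to the same user-defined network (Docker's embedded DNS resolves container names on user-defined networks, not on the default bridge):
from pymongo import MongoClient

# 'mongo_db' is a hypothetical container name; replace it with the actual
# name shown by `docker ps` for your MongoDB container.
client = MongoClient('mongo_db', 27017)
db = client['data_analytics']
print(db.list_collection_names())  # simple check that the connection works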
For more information, take a look at the docker network manual.
Source: https://stackoverflow.com/questions/66022448/how-to-load-data-in-mongodb-running-in-host-from-inside-a-docker-running-on-the