pymongo bulk_write performs very slowly

Submitted by 孤者浪人 on 2021-02-11 17:11:28

Question


We have a dataframe of almost 100,000 records which I want to upsert into a MongoDB collection.

My sample code is shown below.

To keep it simple, the code below generates the data in a for loop and appends it to lstValues.

In the actual application, we receive this data from external CSV files, which we load into a pandas dataframe.

We receive almost 98,000 records from these external CSV files. Also, our original MongoDB collection already contains almost 1,00,00,00 records, and it keeps on increasing.
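For context, a minimal sketch of how such a CSV load might produce the list of dicts used later in the code (the file contents and column values here are made up for illustration):

```python
import io
import pandas as pd

# Stand-in for one of the external CSV files (contents are hypothetical).
csv_text = (
    "StudId,Name,Grade,Address,Phone,Std\n"
    "s1,xyz0,A,abc,0123,M1\n"
    "s2,xyz1,B,def,4567,M2\n"
)

df = pd.read_csv(io.StringIO(csv_text), dtype=str)

# One dict per row, in the same shape as the templates built below.
lstValues = df.to_dict("records")
print(len(lstValues))
```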

Below I have used just a few fields: StudId, Name, Grade, Address, Phone and Std. But in the real application we have almost 200 such fields.

You can see I am using the bulk_write function to batch-update my collection, with a batch size of 1000 records. But still, the code below takes almost 20 minutes or more to upsert these records. We have an external application which does the same thing in almost 4 minutes. The goal here is to prove Python's ability to perform this type of batch operation with MongoDB. Am I doing something wrong in the code below, or is this the best Python can do with such a huge dataset?

Please advise: how can I improve the performance of the code below, or is there an alternative way to achieve this within Python?

from pymongo import MongoClient, ReplaceOne, InsertOne, DeleteOne
import pandas as pd
import time
import uuid

lstValues = []
for i in range(100000):
    template = {'StudId': str(uuid.uuid1()), 'Name': 'xyz' + str(i), 'Grade': 'A',
                'Address': 'abc', 'Phone': '0123', 'Std': 'M1'}
    lstValues.append(template)

bulklist = []

db = MongoClient(['server1:27017', 'server2:27018'], replicaset='rs_development',
                 username='appadmin', password='abcxyz', authSource='admin',
                 authMechanism='SCRAM-SHA-1')['TestDB']

starttime = time.time()
for m in lstValues:
    bulklist.append(ReplaceOne(
        {"STUDENT.Grade": m['Grade'], "STUDENT.Name": m['Name']},
        {'STUDENT': m},
        upsert=True
    ))
    if len(bulklist) == 1000:
        db.AnalyticsTestBRS.bulk_write(bulklist, ordered=False)
        bulklist = []

# Flush any final partial batch, so records are not silently dropped
# when the record count is not a multiple of 1000.
if bulklist:
    db.AnalyticsTestBRS.bulk_write(bulklist, ordered=False)

print("Time taken mongo upsert : {0} seconds".format(time.time() - starttime))
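The fixed-size batching can also be made explicit with a small helper that always yields the final partial batch. This is a sketch; the `chunks` function is not part of the original code:

```python
def chunks(seq, size):
    """Yield successive slices of seq with at most `size` items each."""
    for i in range(0, len(seq), size):
        yield seq[i:i + size]

# Usage against the collection from the question would look like:
#   for batch in chunks(bulklist, 1000):
#       db.AnalyticsTestBRS.bulk_write(batch, ordered=False)

# Demonstration with plain values: 5 items in batches of 2.
batches = list(chunks(list(range(5)), 2))
print([len(b) for b in batches])  # → [2, 2, 1]
```

Separately, since each ReplaceOne filters on STUDENT.Grade and STUDENT.Name, it is worth verifying that a compound index exists on those two fields (e.g. `db.AnalyticsTestBRS.create_index([("STUDENT.Grade", 1), ("STUDENT.Name", 1)])`); without one, every upsert must scan the collection, which alone can account for minutes of runtime on a collection this size.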

Source: https://stackoverflow.com/questions/59931259/pymongo-bulk-write-perform-very-slow
