Question
PySpark version: 2.4.4
MongoDB version: 4.2.0
RAM: 64 GB
CPU cores: 32
Command used to run the script:
spark-submit --executor-memory 8G --driver-memory 8G --packages org.mongodb.spark:mongo-spark-connector_2.11:2.3.1 demographic.py
When I run the code, I get this error: "py4j.protocol.Py4JJavaError: An error occurred while calling o764.save. : com.mongodb.MongoTimeoutException: Timed out after 30000 ms while waiting for a server that matches WritableServerSelector. Client view of cluster state is {type=REPLICA_SET, servers=[{address=172.*.*.*:27017, type=REPLICA_SET_SECONDARY, roundTripTime=34.3 ms, state=CONNECTED}]"
I am reading a MongoDB collection from a replica-set server that requires authentication. The read works with:
df_ipapp = spark.read.format('com.mongodb.spark.sql.DefaultSource').option('uri', '{}/{}.IpAppointment?authSource={}'.format(mongo_url, mongo_db,auth_source)).load()
After processing this DataFrame, I write it to another MongoDB instance, which has no authentication and runs locally on the machine where I do the processing:
df.write.format('com.mongodb.spark.sql.DefaultSource').mode('overwrite').option('uri', '{}/{}.demographic'.format(mongo_final_url, mongo_final_db)).save()
The write fails every time with this error:
File "/home/svr_data_analytic/hmis-analytics-data-processing/src/main/python/sales/demographic.py", line 297, in save_n_rename
.option('uri', '{}/{}.demographic'.format(mongo_url, mongo_final_db)).save()
File "/home/svr_data_analytic/spark/spark-2.4.4-bin-hadoop2.7/python/lib/pyspark.zip/pyspark/sql/readwriter.py", line 736, in save
File "/home/svr_data_analytic/spark/spark-2.4.4-bin-hadoop2.7/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
File "/home/svr_data_analytic/spark/spark-2.4.4-bin-hadoop2.7/python/lib/pyspark.zip/pyspark/sql/utils.py", line 63, in deco
File "/home/svr_data_analytic/spark/spark-2.4.4-bin-hadoop2.7/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py", line 328, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o788.save.
: com.mongodb.MongoTimeoutException: Timed out after 30000 ms while waiting for a server that matches WritableServerSelector. Client view of cluster state is {type=REPLICA_SET, servers=[{address=172.*.*.*:27017, type=REPLICA_SET_SECONDARY, roundTripTime=0.8 ms, state=CONNECTED}]
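The cluster state at the end of the message can be read mechanically. Below is a minimal sketch, using only the Python standard library, that extracts the server types from a message shaped like the one above. The names `server_types`, `has_writable_server` and `WRITABLE_TYPES` are illustrative assumptions and are not part of the MongoDB driver or the Spark connector. If no listed server is a primary, a standalone, or a shard router, the client sees no member that accepts writes, which is what `WritableServerSelector` is waiting for.

```python
import re

# Server types that accept writes, as named in MongoDB Java driver messages.
WRITABLE_TYPES = {"REPLICA_SET_PRIMARY", "STANDALONE", "SHARD_ROUTER"}


def server_types(error_message):
    """Return the per-server `type=` values listed after `servers=[`."""
    # Start at `servers=[` so the cluster-level `type=REPLICA_SET` is skipped.
    _, _, servers_part = error_message.partition("servers=[")
    return re.findall(r"type=([A-Z_]+)", servers_part)


def has_writable_server(error_message):
    """True if any server the client could see is one that accepts writes."""
    return any(t in WRITABLE_TYPES for t in server_types(error_message))


# Message copied from the traceback above (address masked as in the original).
message = (
    "Timed out after 30000 ms while waiting for a server that matches "
    "WritableServerSelector. Client view of cluster state is "
    "{type=REPLICA_SET, servers=[{address=172.*.*.*:27017, "
    "type=REPLICA_SET_SECONDARY, roundTripTime=0.8 ms, state=CONNECTED}]"
)

print(server_types(message))         # ['REPLICA_SET_SECONDARY']
print(has_writable_server(message))  # False
```

For the message in this question the only server listed is a secondary, so the write is being attempted against a replica-set member rather than a server that can accept it.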
Reading from the replica server:
df_bills = spark.read.format('com.mongodb.spark.sql.DefaultSource').option('uri', '{}/{}.Bills?authSource={}'.format(mongo_url, mongo_db, auth_source)).load()
Writing to the local MongoDB:
df.write.format('com.mongodb.spark.sql.DefaultSource').mode('overwrite').option('uri', '{}/{}.demographic'.format(mongo_final_url, mongo_final_db)).save()
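The two snippets above can be sketched with both connection URIs built in one place, which makes it easier to see which server each call targets. This is only a sketch under stated assumptions: the helper names (`build_uri`, `read_collection`, `write_collection`), the host names, the database names and the `authSource` value are hypothetical, and only the `format`, `option('uri', ...)`, `mode('overwrite')`, `load()` and `save()` calls are taken from the question. Note that line 297 in the traceback formats the write URI with `mongo_url`, while the snippets use `mongo_final_url`, so it may be worth printing the URI that is actually passed to `save()`.

```python
MONGO_FORMAT = 'com.mongodb.spark.sql.DefaultSource'


def build_uri(base_url, database, collection, auth_source=None):
    """Build '<base_url>/<database>.<collection>[?authSource=...]'."""
    uri = '{}/{}.{}'.format(base_url.rstrip('/'), database, collection)
    if auth_source:
        uri += '?authSource={}'.format(auth_source)
    return uri


def read_collection(spark, uri):
    """Load a collection into a DataFrame through the Mongo Spark connector."""
    return spark.read.format(MONGO_FORMAT).option('uri', uri).load()


def write_collection(df, uri):
    """Overwrite the target collection with the contents of `df`."""
    df.write.format(MONGO_FORMAT).mode('overwrite').option('uri', uri).save()


# Hypothetical values; the real ones are not shown in the question.
mongo_url = 'mongodb://user:password@replica-host:27017'
mongo_final_url = 'mongodb://localhost:27017'

read_uri = build_uri(mongo_url, 'source_db', 'IpAppointment', auth_source='admin')
write_uri = build_uri(mongo_final_url, 'final_db', 'demographic')

# Print both URIs to confirm the write does not point at the replica server.
print(read_uri)
print(write_uri)

# With a SparkSession named `spark` (not created here):
# df_ipapp = read_collection(spark, read_uri)
# write_collection(df_ipapp, write_uri)
```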
I want to read from a replica-set MongoDB server that requires authentication, process the DataFrame, and write it to the local MongoDB. Thanks in advance.
Source: https://stackoverflow.com/questions/58624903/py4j-protocol-py4jjavaerror-an-error-occurred-while-calling-o788-save-com-mo