PySpark: ModuleNotFoundError: No module named 'app'

情书的邮戳 · 2021-01-18 20:04

I am saving a DataFrame to a CSV file in PySpark using the statement below:

df_all.repartition(1).write.csv("xyz.csv", header=True, mode='overwrite')
1 Answer

陌清茗 · 2021-01-18 20:53

    The error is clear: the module 'app' is not available on the executors. Your Python code runs on the driver, but your UDF runs in the executors' Python VM. When you call the UDF, Spark serializes create_emi_amount and sends it to the executors.
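
    For illustration, here is a minimal, hypothetical sketch of the failing pattern. The UDF name create_emi_amount comes from your code; the app.finance import, the column names, and the return type are assumptions:

    from pyspark.sql.functions import udf
    from pyspark.sql.types import DoubleType

    def create_emi_amount(principal, rate, months):
        # Hypothetical import: `app` is installed in the driver's environment only,
        # so the deserialized UDF raises ModuleNotFoundError on the executors.
        from app.finance import emi_formula
        return emi_formula(principal, rate, months)

    emi_udf = udf(create_emi_amount, DoubleType())
    df_all = df_all.withColumn("emi", emi_udf("principal", "rate", "months"))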

    So, somewhere in your method create_emi_amount you use or import the app module. A solution to your problem is to use the same Python environment on both the driver and the executors: in spark-env.sh, point PYSPARK_DRIVER_PYTHON=... and PYSPARK_PYTHON=... at the same Python virtualenv.
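
    As a sketch, assuming a virtualenv at /opt/venvs/etl (the path is an assumption; substitute your own), the spark-env.sh entries would look like:

    # Point both the driver and the executors at the same virtualenv,
    # so the `app` module is importable on every node.
    export PYSPARK_DRIVER_PYTHON=/opt/venvs/etl/bin/python
    export PYSPARK_PYTHON=/opt/venvs/etl/bin/python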
