ModuleNotFoundError: No module named 'pyspark.dbutils'

夙愿已清 submitted on 2020-06-17 09:59:11

Question


I am running PySpark from an Azure Machine Learning notebook and trying to move a file using the dbutils module.

    from pyspark.sql import SparkSession
    spark = SparkSession.builder.getOrCreate()
    def get_dbutils(spark):
        try:
            from pyspark.dbutils import DBUtils
            dbutils = DBUtils(spark)
        except ImportError:
            import IPython
            dbutils = IPython.get_ipython().user_ns["dbutils"]
        return dbutils

    dbutils = get_dbutils(spark)
    dbutils.fs.cp("file:source", "dbfs:destination")

I got this error: ModuleNotFoundError: No module named 'pyspark.dbutils'. Is there a workaround for this?

Here is the error in another Azure Machine Learning notebook:

---------------------------------------------------------------------------
ModuleNotFoundError                       Traceback (most recent call last)
<ipython-input-1-183f003402ff> in get_dbutils(spark)
      4         try:
----> 5             from pyspark.dbutils import DBUtils
      6             dbutils = DBUtils(spark)

ModuleNotFoundError: No module named 'pyspark.dbutils'

During handling of the above exception, another exception occurred:

KeyError                                  Traceback (most recent call last)
<ipython-input-1-183f003402ff> in <module>
     10         return dbutils
     11 
---> 12 dbutils = get_dbutils(spark)

<ipython-input-1-183f003402ff> in get_dbutils(spark)
      7         except ImportError:
      8             import IPython
----> 9             dbutils = IPython.get_ipython().user_ns["dbutils"]
     10         return dbutils
     11 

KeyError: 'dbutils'

Answer 1:


This is a known issue with Databricks Utilities (dbutils).

Most of dbutils isn't supported with Databricks Connect. The only parts that do work are fs and secrets.

Reference: Databricks Connect - Limitations and Known issues.

Note: Currently fs and secrets work (locally). Widgets (!!!), libraries, etc. do not work. This shouldn’t be a major issue. If you execute on Databricks using the Python Task, dbutils will fail with the error:

ImportError: No module named 'pyspark.dbutils'

I was able to run the code successfully by running it as a notebook.
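
In the traceback from the question, the fallback branch raises KeyError: 'dbutils', which hides the real cause: there is no dbutils object in that environment at all. Here is a minimal sketch of a more defensive helper, assuming you only want a clear failure message when dbutils is missing. The RuntimeError and its wording are my own choice, not something Databricks provides.

```python
def get_dbutils(spark=None):
    """Return a dbutils handle, or raise a clear error if none is available."""
    try:
        # pyspark.dbutils is not part of the open-source PySpark package,
        # so this import fails outside a Databricks-provided environment.
        from pyspark.dbutils import DBUtils
        return DBUtils(spark)
    except ImportError:
        pass

    try:
        # In a Databricks notebook, dbutils is injected into the user namespace.
        import IPython
        shell = IPython.get_ipython()  # None when not running under IPython
    except ImportError:
        shell = None

    user_ns = getattr(shell, "user_ns", None) or {}
    if "dbutils" in user_ns:
        return user_ns["dbutils"]

    # Hypothetical error message: neither lookup succeeded.
    raise RuntimeError(
        "dbutils is not available here: pyspark.dbutils could not be imported "
        "and no 'dbutils' object exists in the notebook namespace."
    )
```

This does not make dbutils work where it isn't installed; it only replaces the misleading KeyError with a message that says what is actually missing.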



Source: https://stackoverflow.com/questions/61546680/modulenotfounderror-no-module-named-pyspark-dbutils
