PySpark error: AttributeError: 'NoneType' object has no attribute '_jvm'

Asked by 既然无缘 on 2020-12-06 10:33

I have a timestamp dataset which is in the format of

And I have written a udf in PySpark to process this dataset and return it as a Map of key-value pairs, but I am getting the error below.

5 Answers
  • 2020-12-06 10:52

    The error message says that on line 27 of your udf you are calling some PySpark SQL function. It is the line with abs(), so I suppose that somewhere above it you call from pyspark.sql.functions import * and that import overrides Python's built-in abs() function.
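
    A minimal sketch of that failure mode (the udf name and return type here are hypothetical): after the wildcard import, abs is bound to the Spark SQL function, which needs a live JVM-backed SparkContext and therefore fails inside a udf running on an executor.

    from pyspark.sql.functions import *   # silently rebinds abs, sum, min, max, ...
    from pyspark.sql.types import IntegerType

    @udf(IntegerType())                   # udf itself also comes from the wildcard import
    def absolute(x):
        # abs here is pyspark.sql.functions.abs, not the builtin: on an
        # executor there is no active SparkContext, so this raises
        # AttributeError: 'NoneType' object has no attribute '_jvm'.
        return abs(x)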

  • 2020-12-06 10:52

    Just to be clear, the problem a lot of people are having stems from a single bad programming style: from blah import *

    When you write

    from pyspark.sql.functions import *
    

    you overwrite a lot of Python's built-in functions. I strongly recommend importing the functions under an alias instead:

    import pyspark.sql.functions as f
    # or 
    import pyspark.sql.functions as pyf
    
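    With the alias, the Spark function and the Python builtin stay distinct. A small runnable sketch (the app name and data are illustrative):

    import pyspark.sql.functions as f
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("aliasDemo").getOrCreate()
    df = spark.createDataFrame([(-1,), (2,)], ["value"])

    # f.abs is the Spark column function; the Python builtin abs is untouched.
    df.withColumn("abs_value", f.abs(f.col("value"))).show()
    print(abs(-5))  # still the builtin
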
  • 2020-12-06 10:53

    Make sure that you are initializing the Spark session (and with it the SparkContext). For example:

    from pyspark.sql import SparkSession

    spark = SparkSession \
        .builder \
        .appName("myApp") \
        .config("...") \
        .getOrCreate()
    # Recent PySpark reads through the session directly; SQLContext is deprecated.
    productData = spark.read.format("com.mongodb.spark.sql").load()
    

    Or as in

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName('company').getOrCreate()
    productData = spark.read.format("csv").option("delimiter", ",") \
        .option("quote", "\"").option("escape", "\"") \
        .option("header", "true").option("inferSchema", "true") \
        .load("/path/thecsv.csv")
    
  • 2020-12-06 10:59

    Mariusz's answer didn't really help me. So if, like me, you found this because it's the only result on Google and you're new to PySpark (and Spark in general), here's what worked for me.

    In my case, I was getting that error because I was trying to execute PySpark code before the PySpark environment had been set up.

    Making sure that PySpark was available and initialized before making any calls that depend on pyspark.sql.functions fixed the issue for me.
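
    A minimal sketch of that ordering (the app name and data are illustrative): create the session first, then call anything from pyspark.sql.functions.

    from pyspark.sql import SparkSession
    import pyspark.sql.functions as F

    # Create the session first: this starts the JVM-backed SparkContext that
    # pyspark.sql.functions needs when it builds column expressions.
    spark = SparkSession.builder.appName("setupFirst").getOrCreate()

    df = spark.createDataFrame([("a", 1)], ["key", "value"])
    df.select(F.upper(F.col("key"))).show()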

  • 2020-12-06 11:10

    This exception also arises when the udf cannot handle None values. For example, the following code results in the same exception:

    from pyspark.sql.functions import udf, to_timestamp
    from pyspark.sql.types import DateType
    get_datetime = udf(lambda ts: to_timestamp(ts), DateType())
    df = df.withColumn("datetime", get_datetime("ts"))
    

    However, this one does not:

    get_datetime = udf(lambda ts: to_timestamp(ts) if ts is not None else None, DateType())
    df = df.withColumn("datetime", get_datetime("ts"))
    
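    Note that to_timestamp is a Spark column function rather than a plain Python function, so a common alternative (a sketch, reusing the df and ts column from the snippets above) is to apply it directly and skip the udf entirely; it handles NULL values on its own.

    from pyspark.sql.functions import to_timestamp

    # The expression is built on the driver, where the SparkContext exists,
    # so no udf (and no '_jvm' lookup on an executor) is involved.
    df = df.withColumn("datetime", to_timestamp("ts"))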