I have a timestamp dataset which is in the format of
I have written a UDF in PySpark to process this dataset and return it as a map of key-value pairs, but I am getting the error below
The error message says that on line 27 of your UDF you are calling some pyspark.sql function; it is the line with abs(). So I suppose that somewhere above you call
from pyspark.sql.functions import *
and it overrides Python's builtin abs() function.
Just to be clear, the problem a lot of people are having stems from a single bad programming style:
from blah import *
When you do
from pyspark.sql.functions import *
you overwrite a lot of Python builtin functions. I strongly recommend importing the module under an alias instead, for example:
import pyspark.sql.functions as f
# or
import pyspark.sql.functions as pyf
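To see the shadowing concretely, here is a minimal sketch (the values are just illustrative):

import builtins
import pyspark.sql.functions as pyf

# After `from pyspark.sql.functions import *`, names like abs, min, max and sum
# become the Spark Column versions, so abs(-3) would raise a TypeError, because
# pyspark.sql.functions.abs expects a Column or a column name, not an int.
print(abs(-3))           # 3, still the Python builtin, since nothing shadowed it
print(builtins.abs(-3))  # 3, the builtin stays reachable even after a star import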
Make sure that you are initializing the SparkSession before running anything that depends on it. For example:
from pyspark.sql import SparkSession, SQLContext

spark = SparkSession \
    .builder \
    .appName("myApp") \
    .config("...") \
    .getOrCreate()
sqlContext = SQLContext(spark.sparkContext)  # SQLContext wraps the SparkContext
productData = sqlContext.read.format("com.mongodb.spark.sql").load()
Or, as in this CSV example:
from pyspark.sql import SparkSession, SQLContext

spark = SparkSession.builder.appName('company').getOrCreate()
sqlContext = SQLContext(spark.sparkContext)
productData = sqlContext.read.format("csv").option("delimiter", ",") \
    .option("quote", "\"").option("escape", "\"") \
    .option("header", "true").option("inferSchema", "true") \
    .load("/path/thecsv.csv")
Mariusz's answer didn't really help me. So if, like me, you found this because it's the only result on Google and you're new to PySpark (and Spark in general), here's what worked for me.
In my case I was getting that error because I was trying to execute PySpark code before the PySpark environment had been set up. Making sure that PySpark was available and set up before making calls that depend on pyspark.sql.functions fixed the issue for me.
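For instance, when running a plain Python script (rather than spark-submit or a preconfigured notebook), one common way to set the environment up first is findspark; this is my assumption about a typical setup, so adapt it to your deployment:

import findspark
findspark.init()  # locates SPARK_HOME and puts pyspark on sys.path

# only import and use pyspark after the environment is in place
from pyspark.sql import SparkSession
import pyspark.sql.functions as f

spark = SparkSession.builder.appName("myApp").getOrCreate()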
This exception also arises when the UDF cannot handle None values. For example, the following code results in the same exception (note that to_timestamp here has to be a plain Python helper; the pyspark.sql.functions version returns a Column and does not work inside a UDF):
from pyspark.sql.functions import udf
from pyspark.sql.types import DateType
get_datetime = udf(lambda ts: to_timestamp(ts), DateType())
df = df.withColumn("datetime", get_datetime("ts"))
However, this one does not:
get_datetime = udf(lambda ts: to_timestamp(ts) if ts is not None else None, DateType())
df = df.withColumn("datetime", get_datetime("ts"))
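Here is a self-contained sketch of the difference, assuming to_timestamp is a small Python helper and the timestamps are ISO date strings (both assumptions made just for this example):

from datetime import datetime
from pyspark.sql import SparkSession
from pyspark.sql.functions import udf
from pyspark.sql.types import DateType

spark = SparkSession.builder.appName("noneSafeUdf").getOrCreate()

def to_timestamp(ts):
    # hypothetical helper: parse an ISO date string into a date
    return datetime.strptime(ts, "%Y-%m-%d").date()

# the None-safe wrapper passes nulls through instead of crashing in strptime
get_datetime = udf(lambda ts: to_timestamp(ts) if ts is not None else None, DateType())

df = spark.createDataFrame([("2021-06-01",), (None,)], ["ts"])
df.withColumn("datetime", get_datetime("ts")).show()
# the None row comes through as null instead of raising an exception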