In PySpark, I want to calculate the correlation between two DataFrame vector columns, using code along the lines of the sketch below (I have no problem importing pyspark or calling createDataFrame):
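A minimal version of what I'm doing (the data and the "features" column name are simplified for illustration):

from pyspark.sql import SparkSession
from pyspark.ml.linalg import Vectors
from pyspark.ml.stat import Correlation

spark = SparkSession.builder.getOrCreate()

# Two series packed into a single vector column, as Correlation.corr expects
data = [(Vectors.dense([1.0, 2.0]),),
        (Vectors.dense([2.0, 4.0]),),
        (Vectors.dense([3.0, 6.0]),)]
df = spark.createDataFrame(data, ["features"])

# This raises: AttributeError: 'NoneType' object has no attribute 'setCallSite'
print(Correlation.corr(df, "features").head())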
There are several possible reasons for getting that AttributeError ('NoneType' object has no attribute 'setCallSite'):
You may have called sc.stop() before initializing one of the xContext classes (where x can be SQL or Hive). For example:
from pyspark import SparkConf, SparkContext
from pyspark.sql import SQLContext

sc = SparkContext.getOrCreate(conf=SparkConf())
sc.stop()               # the SparkContext is stopped here...
spark = SQLContext(sc)  # ...so this SQLContext wraps a dead context
Your Spark session has gotten out of sync with the cluster.
In either case, just restart your Jupyter notebook kernel or restart the application (not just the Spark context) and it will work.
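For reference, a minimal sketch of the intended order, which keeps the SparkContext alive until you are done (assuming you need an SQLContext at all):

from pyspark import SparkConf, SparkContext
from pyspark.sql import SQLContext

sc = SparkContext.getOrCreate(conf=SparkConf())
spark = SQLContext(sc)  # create dependent contexts while sc is alive

# ... run your queries here ...

sc.stop()  # stop the SparkContext only once you are finished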
I got the same error not only with the Correlation.corr(...) DataFrame, but with ldaModel.describeTopics() as well.
Most probably it is a Spark bug: they forget to initialise the DataFrame._sc._jsc member when creating the resulting DataFrame. Normally each DataFrame has this member initialised with a proper JavaObject.
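You can verify this on an affected DataFrame (a quick diagnostic; df stands for the DataFrame returned by the failing call):

# On a healthy DataFrame this prints a JavaObject; on an affected one, None
print(df._sc._jsc)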
There's a (now resolved) issue around this:
https://issues.apache.org/jira/browse/SPARK-27335?jql=text%20~%20%22setcallsite%22
[Note: as it's resolved, if you're using a version of Spark more recent than October 2019 and still hit this issue, please report it to the Apache Jira]
The poster suggests a workaround: force the DataFrame's backend back in sync with your live Spark session and context:
# Re-point the DataFrame at the live JVM session and SparkContext
df.sql_ctx.sparkSession._jsparkSession = spark._jsparkSession
df._sc = spark._sc
This worked for us; hopefully it works in other cases as well.
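In context, the workaround looks something like this (a sketch assuming spark is your live SparkSession and df has a vector column named "features"):

from pyspark.ml.stat import Correlation

result = Correlation.corr(df, "features")

# Work around the uninitialised backend before collecting
result.sql_ctx.sparkSession._jsparkSession = spark._jsparkSession
result._sc = spark._sc

print(result.head()[0])  # the Pearson correlation matrix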