Question
I am trying to pivot a column that has more than 10,000 distinct values. The default limit in Spark for the maximum number of distinct pivot values is 10,000, and I am receiving this error:
The pivot column COLUMN_NUM_2 has more than 10000 distinct values, this could indicate an error. If this was intended, set spark.sql.pivotMaxValues to at least the number of distinct values of the pivot column
How do I set this in PySpark?
Answer 1:
You have to add/set this parameter in the Spark interpreter configuration.
I am working with Zeppelin notebooks on an EMR (AWS) cluster. I had the same error message as you, and it worked after I added the parameter in the interpreter settings.
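If you are not using Zeppelin, a minimal sketch of setting the same property directly in PySpark code (the session name, app name, limit value, and the commented pivot call are illustrative assumptions, not from the original post):

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Option 1: raise the limit when building the SparkSession
spark = (SparkSession.builder
         .appName("pivot-example")                        # illustrative app name
         .config("spark.sql.pivotMaxValues", "20000")     # raise above the default of 10000
         .getOrCreate())

# Option 2: set it on an already-running session
spark.conf.set("spark.sql.pivotMaxValues", "20000")

# Hypothetical usage once the limit is raised (df and column names are placeholders):
# result = df.groupBy("some_key").pivot("COLUMN_NUM_2").agg(F.count("*"))

Either way, the value just needs to be at least the number of distinct values in the pivot column.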
Hope this helps...
Source: https://stackoverflow.com/questions/42944612/how-to-set-pivotmaxvalues-in-pyspark