Question
I am trying to pivot a column that has more than 10,000 distinct values. The default limit in Spark for the maximum number of distinct pivot values is 10,000, and I am receiving this error:
The pivot column COLUMN_NUM_2 has more than 10000 distinct values, this could indicate an error. If this was intended, set spark.sql.pivotMaxValues to at least the number of distinct values of the pivot column
How do I set this in PySpark?
Answer 1:
You have to add/set this parameter in the Spark interpreter configuration.
I am working with Zeppelin notebooks on an EMR (AWS) cluster. I had the same error message as you, and it worked after I added the parameter in the interpreter settings.
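If you are not using Zeppelin, a minimal sketch of setting the same property directly in PySpark code (the session name, app name, limit value, and the commented pivot call are illustrative assumptions, not from the original post):

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Option 1: raise the limit when building the SparkSession
spark = (SparkSession.builder
         .appName("pivot-example")                        # illustrative app name
         .config("spark.sql.pivotMaxValues", "20000")     # raise above the default of 10000
         .getOrCreate())

# Option 2: set it on an already-running session
spark.conf.set("spark.sql.pivotMaxValues", "20000")

# Hypothetical usage once the limit is raised (df and column names are placeholders):
# result = df.groupBy("some_key").pivot("COLUMN_NUM_2").agg(F.count("*"))

Either way, the value just needs to be at least the number of distinct values in the pivot column.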
Hope this helps...
Source: https://stackoverflow.com/questions/42944612/how-to-set-pivotmaxvalues-in-pyspark