I was trying to print the total number of elements in each partition of a DataFrame using Spark 2.2
from pyspark.sql.function
This is a great example of why you shouldn't use import *.
The line
from pyspark.sql.functions import *
brings every function in the pyspark.sql.functions module into your namespace, including some that shadow Python builtins such as sum, min, and max.
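The same shadowing can be reproduced without Spark. The sketch below is only an illustration, not part of the original question: it uses the standard-library os module, whose star import replaces the builtin open with os.open. The scratch namespace and the variable names are assumptions made for the demo.

```python
import builtins
import os

# Run the star import in a scratch namespace so this script's own names stay clean.
scratch = {}
exec("from os import *", scratch)

# After the star import, the bare name "open" refers to os.open, not the builtin.
shadowed_open = scratch["open"]
is_os_open = shadowed_open is os.open
is_builtin_open = shadowed_open is builtins.open
```

Here is_os_open ends up true and is_builtin_open false, which is the same thing that happens to sum after the pyspark.sql.functions star import.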
The specific issue is in the count_elements function, on the line:
n = sum(1 for _ in iterator)
# ^^^ - this is now pyspark.sql.functions.sum
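You intended to call the builtin sum (__builtin__.sum in Python 2, builtins.sum in Python 3), but the import * shadowed it with pyspark.sql.functions.sum, which expects a column rather than a generator.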
Instead, do one of the following:
import pyspark.sql.functions as f
Or
from pyspark.sql.functions import sum as sum_
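If the star import has to stay for some reason, a further option is to call the builtin explicitly. This is a minimal sketch assuming Python 3, where the module is named builtins. Only the sum(...) line comes from the question, so the rest of count_elements and the stand-in partitions are hypothetical.

```python
import builtins  # Python 3 name; Python 2 calls this module __builtin__


def count_elements(iterator):
    # Hypothetical reconstruction: only the sum(...) line appears in the question.
    # builtins.sum always refers to Python's own sum, whatever a star import did.
    n = builtins.sum(1 for _ in iterator)
    yield n


# Stand-in for two partitions; in Spark this function would instead be passed
# to something like df.rdd.mapPartitions(count_elements).
partitions = [iter(["a", "b", "c"]), iter(["d"])]
counts = [next(count_elements(p)) for p in partitions]
```

With the stand-in data, counts holds one element count per partition. Either of the two imports above is still the cleaner fix, since it removes the shadowing instead of working around it.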