pyspark udf print row being analyzed
Question: I have a problem inside a PySpark UDF and I want to print the number of the row that causes the problem. I tried counting rows with the equivalent of a "static variable" in Python, so that each time the UDF is called on a new row a counter is incremented. However, it is not working:

```python
import pyspark.sql.functions as F
from pyspark.sql.types import StringType

def myF(input):
    myF.lineNumber += 1
    if somethingBad:
        print(myF.lineNumber)
    return res

myF.lineNumber = 0
myF_udf = F.udf(myF, StringType())
```

How can I count the number of the row being analyzed inside the UDF?
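A counter stored as a function attribute will not behave as expected here, because the UDF is serialized and shipped to each executor, so every task increments its own private copy of `myF.lineNumber`. One common workaround is to attach a row identifier column with `F.monotonically_increasing_id()` and pass it into the UDF, so the UDF can report which row it was processing. Below is a minimal, illustrative sketch (not the original code): the DataFrame `df`, the column name `value`, the "bad" condition, and the helper name `process` are all assumptions.

```python
import pyspark.sql.functions as F
from pyspark.sql.types import StringType
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Hypothetical input data; "value" is an illustrative column name.
df = spark.createDataFrame([("a",), ("b",), (None,)], ["value"])

# Attach a unique per-row identifier so the UDF can report which row it saw.
df = df.withColumn("row_id", F.monotonically_increasing_id())

def process(value, row_id):
    # Illustrative "bad" condition; replace with the real check.
    if value is None:
        # Note: this print goes to the executor's stdout log, not necessarily the driver console.
        print(f"problem at row_id={row_id}")
        return None
    return value.upper()

process_udf = F.udf(process, StringType())

df.withColumn("result", process_udf(F.col("value"), F.col("row_id"))).show()
```

The ids produced by `monotonically_increasing_id()` are unique and increasing but not consecutive, so they identify rows rather than give an exact line count; if consecutive numbering matters, a window-based `row_number()` could be used instead, at the cost of a shuffle.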