Apply function to each row of Spark DataFrame

孤城傲影 2021-01-18 01:20

I'm on Spark 1.3.

I would like to apply a function to each row of a dataframe. This function hashes each column of the row and returns a list of the hashes.

1 Answer
  •  迷失自我
    2021-01-18 02:22

    This isn't an instance of SPARK-5063 because you're not nesting RDD transformations; the inner .map() is being applied to a Scala Seq, not an RDD.

    My hunch is that some rows in your data set contain null column values, so some of the col.hashCode calls are throwing NullPointerExceptions when you try to evaluate null.hashCode. In order to work around this, you need to take nulls into account when computing hashcodes.
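To illustrate the suspected failure mode without Spark, here is a minimal sketch in plain Scala. The `NullHashDemo` object and its sample values are hypothetical; `columns` merely stands in for what `row.toSeq` might return when one column is null.

```scala
import scala.util.Try

object NullHashDemo {
  // Hypothetical stand-in for row.toSeq: the second column value is null.
  val columns: Seq[Any] = Seq("a", null, 42)

  // Naive per-column hashing: calling hashCode on the null element
  // throws a NullPointerException.
  def naiveHashes(cols: Seq[Any]): Seq[Int] = cols.map(col => col.hashCode)

  def main(args: Array[String]): Unit = {
    // Wrap the call in Try so the failure can be inspected instead of crashing.
    val result = Try(naiveHashes(columns))
    println(result.isFailure)
  }
}
```

If your data never contains nulls, the naive version works fine, which would explain why the error only shows up on some data sets.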

    If you're running on a Java 7 JVM or higher, you can use java.util.Objects.hashCode, which returns 0 for a null argument:

        import java.util.Objects
        dataframe.map(row => row.toSeq.map(col => Objects.hashCode(col)))
    

    Alternatively, on earlier versions of Java you can do

        dataframe.map(row => row.toSeq.map(col => if (col == null) 0 else col.hashCode))
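The two workarounds should produce the same hashes. The following sketch checks that on a plain Scala Seq rather than a real DataFrame; the `NullSafeHash` object, its method names, and the sample row are all hypothetical and only serve to compare the two approaches.

```scala
import java.util.Objects

object NullSafeHash {
  // Java 7+ variant: Objects.hashCode returns 0 for null.
  def hashWithObjects(cols: Seq[Any]): Seq[Int] =
    cols.map(col => Objects.hashCode(col))

  // Explicit null check: works on any Java version.
  def hashWithCheck(cols: Seq[Any]): Seq[Int] =
    cols.map(col => if (col == null) 0 else col.hashCode)

  def main(args: Array[String]): Unit = {
    // Hypothetical sample values standing in for row.toSeq.
    val row: Seq[Any] = Seq("a", null, 42)
    println(hashWithObjects(row))
    // Both variants map null to 0 and otherwise defer to hashCode.
    println(hashWithCheck(row) == hashWithObjects(row))
  }
}
```

Either function body can be dropped into the dataframe.map call in place of the inline lambda; which one you pick mostly depends on the Java version you have to support.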
    
