How to return a case class when using Spark higher-order functions?

不知归路 2021-01-21 14:37

I am trying to use Spark's transform higher-order function to convert the items of an array column from type ClassA into ClassB, as shown below:
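(The original code sample did not survive extraction; the following is a minimal sketch of the kind of setup being described, assuming ClassA and ClassB are simple two-field case classes with String fields `a` and `b` — the field names and data are illustrative.)

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.expr

// Illustrative case classes -- field names `a` and `b` are assumptions.
case class ClassA(a: String, b: String)
case class ClassB(a: String, b: String)

val spark = SparkSession.builder().master("local[*]").appName("hof").getOrCreate()
import spark.implicits._

// A DataFrame with one column, "ClassA", holding an array of ClassA structs.
val df = Seq(Tuple1(Seq(ClassA("1", "2"), ClassA("3", "4")))).toDF("ClassA")

// The intent: map each array element to a ClassB. There is no way to name
// the case class inside the SQL lambda, so something like this fails:
// df.withColumn("ClassB", expr("transform(ClassA, c -> ClassB(c.a, c.b))"))
```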



        
1 Answer
  •  栀梦 2021-01-21 15:18

    The transform expression is relational and knows nothing about the case classes ClassA and ClassB. The only way I know of is to register a UDF so you can use your structure (or inject functions), but you then have to deal with a Row-encoded value instead of a ClassA (Spark SQL is all about encoding :) ), like so:

    import org.apache.spark.sql.Row
    import org.apache.spark.sql.functions.expr

    // Each array element arrives in the UDF as a Row, not as a ClassA.
    sparkSession.udf.register("toB", (a: Row) => ClassB(a.getAs[String]("a"), a.getAs[String]("b")))

    df.withColumn("ClassB", expr("transform(ClassA, c -> toB(c))")).show(false)
    

    Side note: naming your column "ClassA" might be confusing, since transform reads the column, not the type.
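    Putting the pieces together, here is a self-contained sketch of the UDF approach, assuming ClassA and ClassB each have String fields `a` and `b` (the field names, sample data, and app name are illustrative):

```scala
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.functions.expr

case class ClassA(a: String, b: String)
case class ClassB(a: String, b: String)

val spark = SparkSession.builder().master("local[*]").appName("udf-transform").getOrCreate()
import spark.implicits._

// Inside the SQL engine the struct is Row-encoded, so the UDF takes a Row
// and rebuilds a ClassB from its fields by name.
spark.udf.register("toB", (r: Row) => ClassB(r.getAs[String]("a"), r.getAs[String]("b")))

val df = Seq(Tuple1(Seq(ClassA("1", "2"), ClassA("3", "4")))).toDF("ClassA")

// transform applies the registered UDF to every element of the array column.
val result = df.withColumn("ClassB", expr("transform(ClassA, c -> toB(c))"))
result.show(false)
```

    Note that the resulting "ClassB" column is still a Spark struct array; to get actual ClassB instances on the driver you would go through a typed Dataset (e.g. `as[Seq[ClassB]]` on the selected column).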
