Unpivot in spark-sql/pyspark

Asked by 星月不相逢 on 2020-11-22 16:21

I have a problem statement at hand wherein I want to unpivot a table in spark-sql/pyspark. I have gone through the documentation and I could see there is support only for pivot.

1 Answer
  • 2020-11-22 17:03

    You can use the built-in stack function: stack(n, expr1, ..., exprk) separates the k expressions into n rows, so stack(3, 'X', X, 'Y', Y, 'Z', Z) emits three (column-name, value) rows per input row, and the where clause then drops the rows whose value was null. For example in Scala:

    scala> val df = Seq(("G",Some(4),2,None),("H",None,4,Some(5))).toDF("A","X","Y", "Z")
    df: org.apache.spark.sql.DataFrame = [A: string, X: int ... 2 more fields]
    
    scala> df.show
    +---+----+---+----+
    |  A|   X|  Y|   Z|
    +---+----+---+----+
    |  G|   4|  2|null|
    |  H|null|  4|   5|
    +---+----+---+----+
    
    
    scala> df.select($"A", expr("stack(3, 'X', X, 'Y', Y, 'Z', Z) as (B, C)")).where("C is not null").show
    +---+---+---+
    |  A|  B|  C|
    +---+---+---+
    |  G|  X|  4|
    |  G|  Y|  2|
    |  H|  Y|  4|
    |  H|  Z|  5|
    +---+---+---+
    

    Or in pyspark:

    In [1]: df = spark.createDataFrame([("G",4,2,None),("H",None,4,5)],list("AXYZ"))
    
    In [2]: df.show()
    +---+----+---+----+
    |  A|   X|  Y|   Z|
    +---+----+---+----+
    |  G|   4|  2|null|
    |  H|null|  4|   5|
    +---+----+---+----+
    
    In [3]: df.selectExpr("A", "stack(3, 'X', X, 'Y', Y, 'Z', Z) as (B, C)").where("C is not null").show()
    +---+---+---+
    |  A|  B|  C|
    +---+---+---+
    |  G|  X|  4|
    |  G|  Y|  2|
    |  H|  Y|  4|
    |  H|  Z|  5|
    +---+---+---+
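
    The same stack expression also works in plain Spark SQL, which covers the spark-sql part of the question. A minimal sketch, assuming the pyspark DataFrame above is registered under the illustrative view name "df":

    # register the DataFrame so it can be queried with spark.sql
    df.createOrReplaceTempView("df")

    # stack generates the (B, C) columns inside the subquery, so the
    # null filter has to sit outside, where B and C already exist
    spark.sql("""
        SELECT A, B, C
        FROM (SELECT A, stack(3, 'X', X, 'Y', Y, 'Z', Z) AS (B, C) FROM df) t
        WHERE C IS NOT NULL
    """).show()

    This should produce the same three-column result as the pyspark example above.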
    