Using a column value as a parameter to a Spark DataFrame function


Consider the following DataFrame:

#+------+---+
#|letter|rpt|
#+------+---+
#|     X|  3|
#|     Y|  1|
#|     Z|  2|
#+------+---+

How can I use the value in the rpt column as a parameter to a Spark DataFrame function, for example to produce one row per index from 1 to rpt for each letter?
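
For reference, a DataFrame like the one above could be built with a small sketch such as this (assuming an active SparkSession named spark; the schema names are taken from the table):

from pyspark.sql import SparkSession

# Build the example DataFrame (assumed types: letter string, rpt int)
spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame(
    [("X", 3), ("Y", 1), ("Z", 2)],
    ["letter", "rpt"]
)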


        
1 Answer

    One option is to use pyspark.sql.functions.expr, which allows you to use column values as inputs to Spark SQL functions.

    Based on @user8371915's comment I have found that the following works:

    from pyspark.sql.functions import expr

    # repeat(",", rpt) builds a string of rpt commas, split(..., ",") turns it into
    # rpt + 1 empty strings, and posexplode emits one row per element along with its
    # position; filtering out position 0 leaves exactly rpt index rows per letter.
    df.select(
        '*',
        expr('posexplode(split(repeat(",", rpt), ","))').alias("index", "col")
    ).where('index > 0').drop("col").sort('letter', 'index').show()
    #+------+---+-----+
    #|letter|rpt|index|
    #+------+---+-----+
    #|     X|  3|    1|
    #|     X|  3|    2|
    #|     X|  3|    3|
    #|     Y|  1|    1|
    #|     Z|  2|    1|
    #|     Z|  2|    2|
    #+------+---+-----+
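
    As a side note (not part of the original answer): on Spark 2.4+ the same result can be obtained more directly by building the index range with the sequence SQL function and exploding it, for example:

    from pyspark.sql.functions import expr

    # sequence(1, rpt) builds the array [1, ..., rpt]; explode emits one row per element
    df.select(
        '*',
        expr('explode(sequence(1, rpt))').alias("index")
    ).sort('letter', 'index').show()

    This avoids the throwaway "col" column and the index > 0 filter.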
    