Question
I have a dataframe "x" in which there are two columns, "x1" and "x2":
x1(status)  x2
kv,true     45
bm,true     65
mp,true     75
kv,null     450
bm,null     550
mp,null     650
I want to convert this dataframe into a format in which each status becomes its own column holding the corresponding value:
x1  true  null
kv  45    450
bm  65    550
mp  75    650
Is there a way to do this? I am using a PySpark DataFrame.
Answer 1:
Yes, there is a way. First split the first column on the comma (",") using the split function, then split the resulting dataframe into two dataframes (using where twice, once per status), and finally join these two new dataframes on the first column.
In the Spark API for Scala it would be as follows:
// Sample data: x1 holds "key,status", x2 holds the value
val x1status = Seq(
  ("kv,true", 45),
  ("bm,true", 65),
  ("mp,true", 75),
  ("kv,null", 450),
  ("bm,null", 550),
  ("mp,null", 650)).toDF("x1", "x2")

// Split x1 on "," into the key (kept as x1) and a separate status column
val x1 = x1status
  .withColumn("split", split('x1, ","))
  .withColumn("x1", 'split getItem 0)
  .withColumn("status", 'split getItem 1)
  .drop("split")
scala> x1.show
+---+---+------+
| x1| x2|status|
+---+---+------+
| kv| 45|  true|
| bm| 65|  true|
| mp| 75|  true|
| kv|450|  null|
| bm|550|  null|
| mp|650|  null|
+---+---+------+
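// Filter once per status and rename x2 to the name of that status
val trueDF = x1.where('status === "true").withColumnRenamed("x2", "true")
val nullDF = x1.where('status === "null").withColumnRenamed("x2", "null")
// Join the two halves on the key and drop the now-redundant status columns
val result = trueDF.join(nullDF, "x1").drop("status")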
scala> result.show
+---+----+----+
| x1|true|null|
+---+----+----+
| kv|  45| 450|
| bm|  65| 550|
| mp|  75| 650|
+---+----+----+
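Since the question is about PySpark while the code above is Scala, it may help to see what the three steps (split, filter twice, join) compute independently of any Spark API. The following is only a minimal plain-Python sketch of that logic, not PySpark code; the function name pivot_by_status is made up for illustration. In PySpark the corresponding building blocks would be pyspark.sql.functions.split, DataFrame.where and DataFrame.join.

```python
# Plain-Python sketch of the split / filter / join steps (no Spark required).
rows = [
    ("kv,true", 45),
    ("bm,true", 65),
    ("mp,true", 75),
    ("kv,null", 450),
    ("bm,null", 550),
    ("mp,null", 650),
]


def pivot_by_status(rows):
    """Return {key: {"true": value, "null": value}} for keys that have both statuses.

    Hypothetical helper, written only to illustrate the steps of the answer.
    """
    # Step 1: split "key,status" into its two parts.
    split_rows = [(*x1.split(",", 1), x2) for x1, x2 in rows]

    # Step 2: filter twice, once per status (mirrors the two where calls).
    true_part = {key: value for key, status, value in split_rows if status == "true"}
    null_part = {key: value for key, status, value in split_rows if status == "null"}

    # Step 3: inner join on the key; keys missing from either side are dropped.
    return {
        key: {"true": true_part[key], "null": null_part[key]}
        for key in true_part
        if key in null_part
    }


result = pivot_by_status(rows)
for key, values in result.items():
    print(key, values["true"], values["null"])
```

As with the join in the Scala code, this is an inner join: a key that appears with only one of the two statuses would not show up in the result.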
Source: https://stackoverflow.com/questions/40671827/how-to-transform-dataframe-per-one-column-to-create-two-new-columns-in-pyspark