Question
I need help converting the code below to PySpark (DataFrame API) or PySpark SQL.
df["full_name"] = df.apply(lambda x: "_".join(sorted((x["first"], x["last"]))), axis=1)
It basically adds a new column named full_name, which concatenates the values of the first and last columns in sorted order.
I have written the code below, but I don't know how to sort the columns' text values before concatenating them.
from pyspark.sql import functions as f

df = df.withColumn('full_name', f.concat(f.col('first'), f.lit('_'), f.col('last')))
Answer 1:
From Spark 2.4+ we can use the array_join and array_sort functions for this case.
Example:
df.show()
#+-----+----+
#|first|last|
#+-----+----+
#| a| b|
#| e| c|
#| d| a|
#+-----+----+
from pyspark.sql.functions import *
# first build an array from the first/last columns, then sort it and join its elements with "_"
df.withColumn("full_name", array_join(array_sort(array(col("first"), col("last"))), "_")).show()
#+-----+----+---------+
#|first|last|full_name|
#+-----+----+---------+
#| a| b| a_b|
#| e| c| c_e|
#| d| a| a_d|
#+-----+----+---------+
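Since the question also mentions PySpark SQL, here is a minimal sketch of the same logic as a SQL query. It assumes the DataFrame is registered as a temporary view; the view name people is illustrative.
# Spark SQL equivalent (Spark 2.4+); "people" is a hypothetical view name
df.createOrReplaceTempView("people")
spark.sql("""
    SELECT first,
           last,
           array_join(array_sort(array(first, last)), '_') AS full_name
    FROM people
""").show()
This produces the same full_name column as the DataFrame API version above.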
Source: https://stackoverflow.com/questions/60970602/how-to-sort-value-before-concatenate-text-columns-in-pyspark