How to remove backslash from all columns in a Spark dataframe?

感情败类 2021-01-20 05:01

How can I remove all \\ characters that appear in string values across multiple columns of a Spark DataFrame?

Sample row:

11~ADX\\|0.00\\|ZZ\\|BP\\         


        
1 Answer
  • 2021-01-20 05:10

    Use foldLeft over the DataFrame's columns. This applies regexp_replace to each column in turn and returns the final DataFrame. Using the example DataFrame from the question (called df below), the following removes all backslashes:

    import org.apache.spark.sql.functions.{col, regexp_replace}
    val df2 = df.columns.foldLeft(df)((acc, c) => acc.withColumn(c, regexp_replace(col(c), "\\\\", "")))
    

    You could also escape all backslashes with the following:

    // Replace every backslash with two backslashes
    val df2 = df.columns.foldLeft(df)((acc, c) => acc.withColumn(c, regexp_replace(col(c), "\\\\", "\\\\\\\\")))
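
    The number of backslashes in these patterns is easy to get wrong, so here is a small Spark-free sketch of the same two replacements using plain String.replaceAll, which follows the same Java regex rules. The object name BackslashDemo and the sample value (assumed to hold a single backslash before each pipe) are illustrative only and not taken from the question's actual data.

    ```scala
    object BackslashDemo {
      // In a Scala string literal, "\\" is ONE backslash character.
      // Assumed sample value: 11~ADX\|0.00\|ZZ\|BP\
      val raw: String = "11~ADX\\|0.00\\|ZZ\\|BP\\"

      // "\\\\" is a two-character string (\\), which the regex engine
      // reads as "match one literal backslash".
      val stripped: String = raw.replaceAll("\\\\", "")

      // In the replacement, a backslash is also an escape character, so
      // "\\\\\\\\" (four characters: \\\\) inserts two literal backslashes.
      val doubled: String = raw.replaceAll("\\\\", "\\\\\\\\")

      def main(args: Array[String]): Unit = {
        println(stripped)
        println(doubled)
      }
    }
    ```

    The same pattern and replacement strings can be passed to regexp_replace unchanged, as in the two snippets above.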
    

    If only some columns should be processed, put their names in a separate variable and fold over that instead. To process all columns except one (named col below), use:

    val cols = df.columns diff List("col")
    cols.foldLeft ...
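
    For reference, the subset approach could look roughly like the sketch below. It assumes a local SparkSession and a made-up three-column DataFrame; the names StripBackslashes, stripBackslashes, a and b are hypothetical, and only the column name col comes from the snippet above.

    ```scala
    import org.apache.spark.sql.{DataFrame, SparkSession}
    import org.apache.spark.sql.functions.{col, regexp_replace}

    object StripBackslashes {
      // Remove every backslash from the listed columns.
      // Columns not listed are passed through unchanged.
      def stripBackslashes(df: DataFrame, cols: Seq[String]): DataFrame =
        cols.foldLeft(df) { (acc, c) =>
          acc.withColumn(c, regexp_replace(col(c), "\\\\", ""))
        }

      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .master("local[1]")
          .appName("strip-backslashes")
          .getOrCreate()
        import spark.implicits._

        // Hypothetical sample data: each value contains one backslash.
        val df = Seq(("ADX\\", "ZZ\\", "keep\\me")).toDF("a", "b", "col")

        // Every column except "col".
        val cols = df.columns.diff(List("col")).toSeq

        stripBackslashes(df, cols).show(false)
        spark.stop()
      }
    }
    ```

    Note that regexp_replace works on string values, so non-string columns are probably best left out of cols.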
    