How can I remove all \\ characters that are part of strings from multiple columns in a Spark DataFrame?
Sample row:
11~ADX\\|0.00\\|ZZ\\|BP\\
Use foldLeft over all columns in the dataframe. This lets you apply regexp_replace to each column in turn and return the final dataframe. Using the example dataframe from the question (called df below), to remove all backslashes:
import org.apache.spark.sql.functions.{col, regexp_replace}
val df2 = df.columns.foldLeft(df)((df, c) => df.withColumn(c, regexp_replace(col(c), "\\\\", "")))
You could also escape all backslashes with the following:
val df2 = df.columns.foldLeft(df)((df, c) => df.withColumn(c, regexp_replace(col(c), "\\\\", "\\\\\\\\")))
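The number of backslashes in these calls comes from two layers of escaping: the Scala string literal and the regular expression itself. The sketch below uses plain String.replaceAll rather than Spark, on the assumption that regexp_replace follows the same Java regex conventions for both the pattern and the replacement. The object and method names are made up for illustration.

```scala
// Plain-Scala sketch (no Spark required) of the two escaping layers.
// Assumption: Spark's regexp_replace treats pattern and replacement
// the same way java.util.regex does.
object BackslashEscaping {
  // The Scala literal "\\\\" is the two-character string \\ ,
  // which the regex engine reads as "one literal backslash".
  val pattern: String = "\\\\"

  // Remove every backslash from the input.
  def removeBackslashes(s: String): String =
    s.replaceAll(pattern, "")

  // In a replacement string a backslash must be escaped as well,
  // so the literal "\\\\\\\\" (four characters) produces two backslashes.
  def doubleBackslashes(s: String): String =
    s.replaceAll(pattern, "\\\\\\\\")

  def main(args: Array[String]): Unit = {
    val sample = "ADX\\|0.00" // the string ADX\|0.00 (one backslash)
    println(removeBackslashes(sample))
    println(doubleBackslashes(sample))
  }
}
```

If the pattern were written as "\\" instead, the regex engine would receive a single backslash, which is an incomplete escape and fails to compile as a pattern.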
If only some columns should be processed, create a separate variable containing the columns to use. To use all columns except one (column col below), use:
val cols = df.columns diff List("col")
cols.foldLeft ...
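Putting the pieces together, a complete version of the column-subset variant might look like the sketch below. The local SparkSession, the sample data, the column names (id, a, b) and the helper stripBackslashes are all assumptions made for illustration; they are not taken from the question.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, regexp_replace}

object ColumnSubsetCleanup {
  // Hypothetical helper: remove backslashes from the given columns only.
  // foldLeft threads the dataframe through one withColumn call per column.
  def stripBackslashes(df: DataFrame, cols: Seq[String]): DataFrame =
    cols.foldLeft(df)((acc, c) => acc.withColumn(c, regexp_replace(col(c), "\\\\", "")))

  def main(args: Array[String]): Unit = {
    // Local session for illustration only.
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("remove-backslashes")
      .getOrCreate()
    import spark.implicits._

    // Made-up sample data; the "id" column is left untouched.
    val df = Seq(("1\\", "ADX\\|0.00", "ZZ\\|BP\\")).toDF("id", "a", "b")

    // All columns except "id".
    val cols = df.columns diff List("id")
    val df2 = stripBackslashes(df, cols)

    df2.show(false)
    spark.stop()
  }
}
```

Each withColumn call adds a projection to the query plan, so for dataframes with a very large number of columns a single select over the mapped columns may be worth considering instead.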