How to add a suffix and prefix to all columns in a Python/PySpark DataFrame

Submitted by 匆匆过客 on 2020-01-04 05:14:49

Question


I have a DataFrame in PySpark with more than 100 columns. For every column name, I would like to add backticks (`) at the start and end of the name.

For example:

The column name is testing user; I want `testing user`.

Is there a way to do this in PySpark/Python? Applying the code should return a DataFrame.


Answer 1:


You can use the withColumnRenamed method of the DataFrame, which returns a new DataFrame with the column renamed:

df_new = df.withColumnRenamed('testing user', '`testing user`')

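Edit: if you have a list of column names, you can do:

old = "First Last Age"  # space-separated names; breaks if a name itself contains a space
new = ["`" + field + "`" for field in old.split()]
df.rdd.toDF(new)  # rebuild the DataFrame from its RDD using the new names

Output: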

DataFrame[`First`: string, `Last`: string, `Age`: string]
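Splitting a hard-coded string on whitespace fails for names that contain spaces, such as testing user from the question. A minimal sketch, assuming you take the names from df.columns instead; the helper name backtick_names is hypothetical:

```python
def backtick_names(columns):
    """Wrap every column name in backticks, e.g. 'testing user' -> '`testing user`'."""
    return ["`" + c + "`" for c in columns]


# Plain Python list standing in for df.columns
print(backtick_names(["testing user", "Age"]))  # ['`testing user`', '`Age`']
```

With a real DataFrame, df.toDF(*backtick_names(df.columns)) should apply the new names positionally in one step, without the round trip through the RDD.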



Answer 2:


Use a list comprehension in Python:

from pyspark.sql import functions as F

df = ...

df_new = df.select([F.col(c).alias("`"+c+"`") for c in df.columns])

This method also lets you put custom Python logic inside the alias() call, for example: "prefix_" + c + "_suffix" if c in list_of_cols_to_change else c
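The conditional expression can be moved into a small function so the same rule is applied to every column. A sketch, assuming hypothetical names new_name and cols_to_change; columns outside the set keep their original name:

```python
def new_name(c, prefix="prefix_", suffix="_suffix", cols_to_change=None):
    """Return prefix + c + suffix if c should change, otherwise c unchanged.

    cols_to_change=None means every column is renamed.
    """
    if cols_to_change is None or c in cols_to_change:
        return prefix + c + suffix
    return c


columns = ["id", "name", "age"]  # stands in for df.columns
print([new_name(c, cols_to_change={"name", "age"}) for c in columns])
# ['id', 'prefix_name_suffix', 'prefix_age_suffix']
```

In PySpark this would plug into the select shown above as df.select([F.col(c).alias(new_name(c)) for c in df.columns]); passing "`" as both prefix and suffix reproduces the backtick renaming from the question.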




Answer 3:


If you would like to add a prefix or suffix to multiple columns in a PySpark DataFrame, you can use a for loop together with .withColumnRenamed().

For example:

def add_prefix(sdf, prefix):
    # Rename every column to prefix + original name, one column at a time
    for c in sdf.columns:
        sdf = sdf.withColumnRenamed(c, '{}{}'.format(prefix, c))
    return sdf

To rename only some columns, loop over a subset of sdf.columns instead of the full list.
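To rename only selected columns, one option is to build an explicit old-to-new mapping first, which also lets you check that no new name collides with a column you are leaving alone. A sketch with a hypothetical helper prefix_mapping; only the name logic runs here, not Spark:

```python
def prefix_mapping(columns, prefix, subset=None):
    """Map each column in subset (default: all columns) to prefix + name."""
    targets = columns if subset is None else [c for c in columns if c in subset]
    mapping = {c: prefix + c for c in targets}
    # Names present after renaming: untouched columns plus the new names
    kept = [c for c in columns if c not in mapping]
    final = kept + list(mapping.values())
    if len(final) != len(set(final)):
        raise ValueError("renaming would produce duplicate column names")
    return mapping


columns = ["id", "name", "age"]  # stands in for sdf.columns
print(prefix_mapping(columns, "p_", subset={"name", "age"}))
# {'name': 'p_name', 'age': 'p_age'}
```

The mapping can then be applied positionally in one step with sdf.toDF(*[mapping.get(c, c) for c in sdf.columns]), or with a withColumnRenamed loop over mapping.items(). If your Spark version has DataFrame.withColumnsRenamed (added in 3.4), it accepts such a dict directly.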




Answer 4:


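I had a DataFrame that I duplicated and then joined with itself. Since both copies had the same column names, I used:

from functools import reduce  # reduce is not a builtin in Python 3

# Each step renames column idx to its current name + '_prec'
df = reduce(lambda df, idx: df.withColumnRenamed(list(df.schema.names)[idx],
                                                 list(df.schema.names)[idx] + '_prec'),
            range(len(list(df.schema.names))),
            df)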

Every column in my DataFrame then had the '_prec' suffix, so the columns of the two copies could be told apart after the join.
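The reduce over column indices renames one column per step; the same end result can be had by computing every new name up front. A sketch, assuming one copy is renamed before the join; suffixed is a hypothetical helper and the column names are illustrative:

```python
def suffixed(names, suffix="_prec"):
    """Append suffix to every column name, keeping the original order."""
    return [n + suffix for n in names]


left = ["id", "value"]  # stands in for df.columns
right = suffixed(left)
print(right)  # ['id_prec', 'value_prec']
# After suffixing, the two sides share no column names
print(set(left) & set(right))  # set()
```

With a real DataFrame, df.toDF(*suffixed(df.columns)) would produce the renamed copy in a single call instead of one withColumnRenamed per column.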



Source: https://stackoverflow.com/questions/43160103/how-to-add-suffix-and-prefix-to-all-columns-in-python-pyspark-dataframe
