How to drop multiple column names given in a list from Spark DataFrame?

耗尽温柔 submitted on 2020-05-25 12:15:50

Question


I have a dynamic list that is built from the value of n.

n = 3
drop_lst = ['a' + str(i) for i in range(n)]
df.drop(drop_lst)

But the above does not work.

Note:

My use case requires a dynamic list.

If I pass the names directly instead of as a list, it works:

df.drop('a0','a1','a2')

How do I make the drop function work with a list?

Spark 2.2 doesn't seem to have this capability. Is there a way to make it work without using select()?


Answer 1:


You can use the * operator to pass the contents of your list as arguments to drop():

df.drop(*drop_lst)
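
To see why the `*` matters, here is a minimal plain-Python sketch. The `drop` function below is a hypothetical stand-in, not Spark's implementation: it only mimics the variadic `drop(*cols)` signature and returns whatever arguments it received, so it runs without a Spark session.

```python
def drop(*cols):
    """Hypothetical stand-in for DataFrame.drop: returns the arguments it received."""
    return cols


n = 3
drop_lst = ['a' + str(i) for i in range(n)]

as_list = drop(drop_lst)     # one argument: the list object itself
unpacked = drop(*drop_lst)   # three arguments: 'a0', 'a1', 'a2'

print(as_list)    # (['a0', 'a1', 'a2'],)
print(unpacked)   # ('a0', 'a1', 'a2')
```

With the unpacked call, the function sees exactly what it would see from drop('a0', 'a1', 'a2'), which is the form the question reports as working.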



Answer 2:


You can pass the column names as comma-separated arguments, e.g.

df.drop("col1","col11","col21")



Answer 3:


This is how to drop a range of consecutive columns in Scala:

val ll = dfwide.schema.names.slice(1,5)
dfwide.drop(ll:_*).show

slice takes two parameters: a start index (inclusive) and an end index (exclusive).
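
For PySpark users, a rough equivalent uses ordinary list slicing on the column names. The sketch below is plain Python: the `names` list is a hypothetical stand-in for what `dfwide.columns` would return, and the list comprehension only simulates the result of the drop.

```python
# Hypothetical column names; with a real DataFrame this would be dfwide.columns.
names = ['id', 'c1', 'c2', 'c3', 'c4', 'c5']

# Like Scala's slice(1, 5): start index inclusive, end index exclusive.
to_drop = names[1:5]

# Simulates which columns would remain after dropping that range.
remaining = [c for c in names if c not in to_drop]

print(to_drop)     # ['c1', 'c2', 'c3', 'c4']
print(remaining)   # ['id', 'c5']
```

With a real DataFrame, the corresponding call would be dfwide.drop(*dfwide.columns[1:5]).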




Answer 4:


You can call drop(*cols) in two ways, passing either a column name or a Column object:

  1. df.drop('age').collect()
  2. df.drop(df.age).collect()

See the official documentation for DataFrame.drop.



Source: https://stackoverflow.com/questions/47830915/how-to-drop-multiple-column-names-given-in-a-list-from-spark-dataframe
