Dropping multiple columns from a Spark DataFrame by iterating through a Scala List of column names


I have a DataFrame with around 400 columns, and I want to drop 100 of them. So I have created a Scala List of the 100 column names. And then I want to ite

4 Answers

不知归路

2020-12-29 01:21

Answer:

import org.apache.spark.sql.Column

val colsToRemove = Seq("colA", "colB", "colC") // etc.

// Keep only the columns that are not listed in colsToRemove
val filteredDF = df.select(df.columns.filter(colName => !colsToRemove.contains(colName)).map(colName => new Column(colName)): _*)


有刺的猬

2020-12-29 01:25
If you only need to drop several named columns, rather than select them by some condition, you can simply do the following:
```
df.drop("colA", "colB", "colC")
```
甜味超标

2020-12-29 01:26
This should work fine:
```
import org.apache.spark.sql.DataFrame

// Assumes dropList: List[String] and df: DataFrame are already defined
val test_df = df.drop(dropList: _*)
```
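The `: _*` suffix is what lets a `List[String]` be passed where a method expects varargs (`String*`). Here is a minimal plain-Scala sketch of that mechanism. `dropNames` is a hypothetical stand-in for a varargs method such as `drop`, and the column names are made up for illustration:

```scala
// Hypothetical stand-in for a varargs method; it works on plain
// sequences of names instead of a real DataFrame.
def dropNames(all: Seq[String], names: String*): Seq[String] =
  all.filterNot(n => names.contains(n))

val columns = Seq("colA", "colB", "colC", "colD")
val dropList = List("colA", "colC")

// `dropList: _*` expands the list into individual varargs arguments
val remaining = dropNames(columns, dropList: _*)
println(remaining)
```

Without `: _*`, the compiler would treat `dropList` as a single argument of type `List[String]` and reject the call.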

Happy的楠姐

2020-12-29 01:27

You can just do:

def dropColumns(inputDF: DataFrame, dropList: List[String]): DataFrame = 
    dropList.foldLeft(inputDF)((df, col) => df.drop(col))

It returns the DataFrame without the columns passed in dropList.

As an example of what's happening behind the scenes, consider the same fold on a plain list:

scala> val list = List(0, 1, 2, 3, 4, 5, 6, 7)
list: List[Int] = List(0, 1, 2, 3, 4, 5, 6, 7)

scala> val removeThese = List(0, 2, 3)
removeThese: List[Int] = List(0, 2, 3)

scala> removeThese.foldLeft(list)((l, r) => l.filterNot(_ == r))
res2: List[Int] = List(1, 4, 5, 6, 7)

The returned list (in your case, think of it as your DataFrame) is the most recently filtered result. After each step of the fold, that latest result is passed as the accumulator to the next call of the function (_, _) => _.
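To see each intermediate result of the fold, one option is `scanLeft`, which applies the same function as `foldLeft` but keeps every accumulator value. The sketch below uses a plain list of hypothetical column names as a stand-in for the DataFrame, so it runs without Spark:

```scala
// Hypothetical column names standing in for a DataFrame's columns
val columns = List("colA", "colB", "colC", "colD", "colE")
val dropList = List("colB", "colD")

// scanLeft keeps the initial value plus the result after each step
val steps = dropList.scanLeft(columns)((cols, name) => cols.filterNot(_ == name))
steps.foreach(println)

// foldLeft returns only the final accumulator, i.e. the last step
val remaining = dropList.foldLeft(columns)((cols, name) => cols.filterNot(_ == name))
```

The last element of `steps` equals `remaining`, which mirrors how `dropColumns` hands the DataFrame produced by one `drop` call to the next.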
