How to select all columns that start with a common label


Question


I have a dataframe in Spark 1.6 and want to select just some columns out of it. The column names are like:

colA, colB, colC, colD, colE, colF-0, colF-1, colF-2

I know I can select specific columns like this:

df.select("colA", "colB", "colE")

But how do I select, say, "colA", "colB", and all the colF-* columns at once? Is there a way to do this like in Pandas?


Answer 1:


First grab the column names with df.columns, then filter down to just the names you want with .filter(_.startsWith("colF")). This gives you an Array of Strings. The String-based overload of select has the signature select(String, String*), which doesn't accept an array, but luckily there is also a Column-based overload, select(Column*). So convert the Strings into Columns with .map(df(_)), then expand the resulting Array of Columns into varargs with : _*.

df.select(df.columns.filter(_.startsWith("colF")).map(df(_)) : _*).show
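For clarity, the same one-liner can be unpacked into intermediate steps (a sketch; the val names are illustrative, not part of the original answer):

import org.apache.spark.sql.Column

val allNames: Array[String] = df.columns                             // every column name
val fNames: Array[String] = allNames.filter(_.startsWith("colF"))    // just the colF-* names
val fCols: Array[Column] = fNames.map(df(_))                         // Strings -> Columns
df.select(fCols : _*).show()                                         // Array expanded to varargs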

The filter can be made as complex as you like (just as in Pandas), though the result is a rather ugly solution (IMO):

df.select(df.columns.filter(x => (x.equals("colA") || x.startsWith("colF"))).map(df(_)) : _*).show 
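If the condition is more involved than a simple prefix check, the same approach works with a regular expression on the names; a minimal sketch (the colF-<digits> pattern is an assumption about the naming scheme, not from the original answer):

// Select colA plus any column matching colF-<digits>, e.g. colF-0, colF-1, colF-2
df.select(df.columns.filter(x => x == "colA" || x.matches("colF-\\d+")).map(df(_)) : _*).show()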

If the list of other columns is fixed, you can also merge a fixed array of column names with the filtered array.

df.select((Array("colA", "colB") ++ df.columns.filter(_.startsWith("colF"))).map(df(_)) : _*).show
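Putting it all together, a self-contained sketch with toy data (written against the SparkSession entry point of Spark 2+; under Spark 1.6 you would build the DataFrame through sqlContext instead, but the select itself is unchanged):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").appName("select-by-prefix").getOrCreate()
import spark.implicits._

// Toy data with the column layout from the question
val df = Seq((1, 2, 3, 4, 5, 6, 7, 8))
  .toDF("colA", "colB", "colC", "colD", "colE", "colF-0", "colF-1", "colF-2")

// Fixed columns plus every colF-* column
val cols = (Array("colA", "colB") ++ df.columns.filter(_.startsWith("colF"))).map(df(_))
df.select(cols : _*).show()
// +----+----+------+------+------+
// |colA|colB|colF-0|colF-1|colF-2|
// +----+----+------+------+------+
// |   1|   2|     6|     7|     8|
// +----+----+------+------+------+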


Source: https://stackoverflow.com/questions/35340390/how-to-select-all-columns-that-starts-with-a-common-label
