I am looking for a way to select columns of my dataframe in PySpark. For the first row, I know I can use df.first(), but I'm not sure about columns, given that they do not have names.
df.first()
Use df.schema.names:
df.schema.names
spark.version
# u'2.2.0'

df = spark.createDataFrame([("foo", 1), ("bar", 2)])
df.show()
# +---+---+
# | _1| _2|
# +---+---+
# |foo|  1|
# |bar|  2|
# +---+---+

df.schema.names
# ['_1', '_2']

for i in df.schema.names:
    # df_new = df.withColumn(i, [do-something])
    print i
# _1
# _2