I have a DataFrame with many columns of str type, and I want to apply a function to all of those columns without changing their names or adding new columns.
Try something like this:
from pyspark.sql.functions import col, lower, trim

# Lower-case and trim every string column; pass all other columns through unchanged.
# alias(c) keeps each column's original name, so the schema's names don't change.
exprs = [
    lower(trim(col(c))).alias(c) if t == "string" else col(c)
    for (c, t) in df.dtypes
]

df.select(*exprs)
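For example, applied to a small frame (the data and column names below are made up for illustration):

from pyspark.sql import SparkSession
from pyspark.sql.functions import col, lower, trim

spark = SparkSession.builder.getOrCreate()

# Hypothetical sample: two string columns and one numeric column.
df = spark.createDataFrame(
    [("  Foo ", "BAR", 1), ("baz  ", " Qux", 2)],
    ["a", "b", "n"],
)

exprs = [
    lower(trim(col(c))).alias(c) if t == "string" else col(c)
    for (c, t) in df.dtypes
]

df.select(*exprs).show()
# a and b come back trimmed and lower-cased ("foo", "bar", ...),
# n is untouched, and all three columns keep their original names.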
This approach has two main advantages over your current solution:

- Native Column expressions are evaluated entirely on the JVM, so no data has to be serialized and shipped to Python worker processes (the BatchPythonProcessing step that evaluating a Python UDF requires).
- They are transparent to the optimizer: Catalyst can analyze and rearrange them, whereas a Python UDF is a black box it cannot look into.
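For contrast, here is a sketch of the kind of UDF-based solution those points argue against (an assumed stand-in, not code from the question); every string value has to cross the JVM/Python boundary to be processed:

from pyspark.sql.functions import col, udf
from pyspark.sql.types import StringType

# Same transformation as a Python UDF: each value is serialized to a
# Python worker, cleaned there, and serialized back to the JVM.
clean = udf(lambda s: s.strip().lower() if s is not None else None, StringType())

exprs = [
    clean(col(c)).alias(c) if t == "string" else col(c)
    for (c, t) in df.dtypes
]

df.select(*exprs)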