I have a dataframe df:
val1 val2 val3
271 70 151
213 1 379
213 3 90
213 6 288
20 55 165
I wan
For numeric types you can use format_string:
from pyspark.sql.functions import format_string

# sc is the SparkContext (predefined in the pyspark shell)
(sc.parallelize([(271, ), (20, ), (3, )])
    .toDF(["val"])
    .select(format_string("%03d", "val"))
    .show())
+------------------------+
|format_string(%03d, val)|
+------------------------+
|                     271|
|                     020|
|                     003|
+------------------------+
For strings, use lpad:
from pyspark.sql.functions import lpad

# sc is the SparkContext (predefined in the pyspark shell)
(sc.parallelize([("271", ), ("20", ), ("3", )])
    .toDF(["val"])
    .select(lpad("val", 3, "0"))
    .show())
+---------------+
|lpad(val, 3, 0)|
+---------------+
|        271|
|        020|
|        003|
+---------------+
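One difference between the two approaches is worth keeping in mind: format_string with "%03d" only sets a minimum width, while lpad also cuts the result down to the requested length when the input is longer. The sketch below mimics both rules in plain Python so the behaviour can be checked without a Spark session. The helper names format_string_03d and lpad_like are hypothetical and not part of PySpark, and the sketch only covers the simple cases shown here.

```python
# Pure-Python sketch of the two padding rules (helper names are hypothetical).

def format_string_03d(n: int) -> str:
    """Mimic format_string("%03d", col): zero-pad to at least 3 digits, never truncate."""
    return "%03d" % n


def lpad_like(s: str, length: int, pad: str) -> str:
    """Mimic lpad(col, length, pad): left-pad, then limit the result to `length` characters."""
    length = max(length, 0)
    if len(s) >= length:
        # Input already long enough: lpad shortens it to `length` characters.
        return s[:length]
    fill = pad * length  # more pad characters than could ever be needed
    return fill[: length - len(s)] + s


print([format_string_03d(n) for n in (271, 20, 3, 1234)])
# ['271', '020', '003', '1234']
print([lpad_like(s, 3, "0") for s in ("271", "20", "3", "1234")])
# ['271', '020', '003', '123']
```

So if a column can hold values wider than the target length, lpad will silently drop the extra characters, whereas format_string keeps them.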