I use Spark to perform data transformations whose results I load into Redshift. Redshift does not support NaN values, so I need to replace every occurrence of NaN with NULL.
Here is a minimal example that reproduces the situation, followed by the fix:
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

df = spark.createDataFrame([(1, float('nan')), (None, 1.0)], ("a", "b"))
df.show()
+----+---+
| a| b|
+----+---+
| 1|NaN|
|null|1.0|
+----+---+
# replace every NaN with null; only floating-point columns can contain NaN
df = df.replace(float('nan'), None)
df.show()
+----+----+
| a| b|
+----+----+
| 1|null|
|null| 1.0|
+----+----+
You can use the .replace function to change NaN values to null in a single line of code.
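If you only want to touch certain columns, replace also accepts a subset argument. Below is a sketch of that, plus an equivalent column-by-column version built with when and isnan, assuming (as in the example above) that only float/double columns can hold NaN; the variable names are just illustrative:

from pyspark.sql import functions as F

# restrict the replacement to column "b" only
df_b_only = df.replace(float('nan'), None, subset=['b'])

# equivalent, more explicit version: rewrite each float/double column,
# turning NaN into null and leaving every other column untouched
float_cols = [c for c, t in df.dtypes if t in ('float', 'double')]
df_clean = df.select([
    F.when(F.isnan(F.col(c)), None).otherwise(F.col(c)).alias(c)
    if c in float_cols else F.col(c)
    for c in df.columns
])
df_clean.show()

The subset route is the simplest when you already know which columns can contain NaN; the when/isnan version gives you explicit control over which columns get rewritten.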