I have a DataFrame and I want to use one of the replace() functions of org.apache.spark.sql.DataFrameNaFunctions on that DataFrame.
This can look a bit confusing, but it is quite straightforward. Here is a small example:
scala> val df = sqlContext.read.format("com.databricks.spark.csv").option("header","true").option("inferSchema","true").load("na_test.csv")
// df: org.apache.spark.sql.DataFrame = [name: string, age: int]
scala> df.show()
// +-----+----+
// | name| age|
// +-----+----+
// |alice|  35|
// |  bob|null|
// |     |  24|
// +-----+----+
scala> df.na.fill(10.0,Seq("age"))
// res4: org.apache.spark.sql.DataFrame = [name: string, age: int]
scala> df.na.fill(10.0,Seq("age")).show
// +-----+---+
// | name|age|
// +-----+---+
// |alice| 35|
// |  bob| 10|
// |     | 24|
// +-----+---+
scala> df.na.replace("age", Map(35 -> 61,24 -> 12)).show()
// +-----+----+
// | name| age|
// +-----+----+
// |alice|  61|
// |  bob|null|
// |     |  12|
// +-----+----+
To access org.apache.spark.sql.DataFrameNaFunctions, call .na on the DataFrame; fill() and replace() are then available on the object it returns.
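If you want to try this outside the shell, here is a minimal standalone sketch of the same idea. It makes a few assumptions that are not part of the example above: it uses SparkSession (Spark 2.x or later) rather than sqlContext, it runs with a local master, and it builds the rows inline instead of reading na_test.csv. The object name NaReplaceSketch and the "unknown" placeholder are made up for illustration.

```scala
import org.apache.spark.sql.SparkSession

object NaReplaceSketch {
  def main(args: Array[String]): Unit = {
    // Assumption: Spark 2.x+ running locally; the shell example above used sqlContext instead.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("na-replace-sketch")
      .getOrCreate()
    import spark.implicits._

    // Inline stand-in for na_test.csv: Option[Int] yields a nullable int column.
    val df = Seq[(String, Option[Int])](
      ("alice", Some(35)),
      ("bob", None),
      ("", Some(24))
    ).toDF("name", "age")

    // df.na returns a DataFrameNaFunctions instance.
    val naFunctions = df.na

    // Replace values in a string column; keys and values must share one type.
    val named = naFunctions.replace("name", Map("" -> "unknown"))

    // The Seq[String] overload takes several column names at once.
    val recoded = named.na.replace(Seq("age"), Map(35 -> 61, 24 -> 12))

    recoded.show()
    spark.stop()
  }
}
```

Note that replace() only swaps values that match a key in the map, so the null age for bob is left alone, just as in the shell output above. Use fill() if you want to replace the nulls.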