How to use functions provided by the DataFrameNaFunctions class in Spark on a DataFrame?

时光取名叫无心 2021-02-08 19:08

I have a DataFrame and I want to use one of the replace() functions of org.apache.spark.sql.DataFrameNaFunctions on it.

1 answer
  • 2021-02-08 20:02

    This can look confusing at first, but it is quite straightforward. Here is a small example:

    scala> val df = sqlContext.read.format("com.databricks.spark.csv").option("header","true").option("inferSchema","true").load("na_test.csv")
    // df: org.apache.spark.sql.DataFrame = [name: string, age: int]
    
    scala> df.show()
    // +-----+----+
    // | name| age|
    // +-----+----+
    // |alice|  35|
    // |  bob|null|
    // |     |  24|
    // +-----+----+
    
    scala> df.na.fill(10.0,Seq("age"))
    // res4: org.apache.spark.sql.DataFrame = [name: string, age: int]
    
    scala> df.na.fill(10.0, Seq("age")).show()
    // +-----+---+
    // | name|age|
    // +-----+---+
    // |alice| 35|
    // |  bob| 10|
    // |     | 24|
    // +-----+---+
    
    scala> df.na.replace("age", Map(35 -> 61, 24 -> 12)).show()
    // +-----+----+
    // | name| age|
    // +-----+----+
    // |alice|  61|
    // |  bob|null|
    // |     |  12|
    // +-----+----+
    

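    To access org.apache.spark.sql.DataFrameNaFunctions, call .na on the DataFrame. Each method, such as fill or replace, returns a new DataFrame and leaves the original unchanged, which is why bob's age is still null in the replace output.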
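
    For readers on a newer Spark version, the same calls can be tried without the CSV file. The following is only a sketch under a few assumptions: it uses a SparkSession (Spark 2.x or later) instead of sqlContext, builds the three rows inline as a stand-in for na_test.csv, and the "unknown" placeholder and the value names (filled, replaced, renamed, dropped) are made up for illustration. It also shows drop and replace on a string column, which the example above does not cover.

    ```scala
    import org.apache.spark.sql.SparkSession

    // Local session for experimentation; inside spark-shell, getOrCreate() reuses the existing one.
    val spark = SparkSession.builder().master("local[*]").appName("na-functions-sketch").getOrCreate()
    import spark.implicits._

    // Inline stand-in for na_test.csv: bob has a null age, and one name is an empty string.
    val df = Seq[(String, Option[Int])](
      ("alice", Some(35)),
      ("bob", None),
      ("", Some(24))
    ).toDF("name", "age")

    // fill: replace nulls in "age" with 10 (the value is cast to the column's type).
    val filled = df.na.fill(10.0, Seq("age"))

    // replace: swap specific non-null values; nulls are left untouched.
    val replaced = df.na.replace("age", Map(35 -> 61, 24 -> 12))

    // replace also works on string columns, here turning "" into a placeholder.
    val renamed = df.na.replace("name", Map("" -> "unknown"))

    // drop: remove every row that contains a null in any column.
    val dropped = df.na.drop()

    filled.show()
    replaced.show()
    renamed.show()
    dropped.show()
    ```

    Note that fill only touches nulls, so the empty name is not affected by it; replace is the method that matches concrete values such as "".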
