Spark dataframe filter

Asked by 再見小時候 on 2021-02-07 06:54
val df = sc.parallelize(Seq((1, "Emailab"), (2, "Phoneab"), (3, "Faxab"), (4, "Mail"), (5, "Other"), (6, "MSL12"), (7, "MSL"), (8, "HCP"), (9, "HCP12"))).toDF("c1", "c2")

How can I filter out the rows whose c2 value starts with "MSL" or "HCP"?
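For reference, the data above is fully given in the Seq, so the expected output of df.show() is:

+---+-------+
| c1|     c2|
+---+-------+
|  1|Emailab|
|  2|Phoneab|
|  3|  Faxab|
|  4|   Mail|
|  5|  Other|
|  6|  MSL12|
|  7|    MSL|
|  8|    HCP|
|  9|  HCP12|
+---+-------+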
4 answers
  • 2021-02-07 07:08

    I used the following to filter rows from a DataFrame, and it worked for me on Spark 2.2:

    val spark = new org.apache.spark.sql.SQLContext(sc)
    val data = spark.read.format("csv").
      option("header", "true").
      option("delimiter", "|").
      option("inferSchema", "true").
      load("D:\\test.csv")

    import spark.implicits._
    // keep only the rows where dept equals "IT"
    val filter = data.filter($"dept" === "IT")
    

    or, to keep everything except the "IT" rows:

    val filter = data.filter($"dept" =!= "IT")
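    As a side note, these column operators compose with && and ||, so the question's filter can be written directly against df. The following is a sketch, not part of this answer; it assumes the df with columns c1 and c2 from the question:

    // drop rows whose c2 starts with "MSL" or "HCP" (sketch)
    val kept = df.filter(!($"c2".startsWith("MSL") || $"c2".startsWith("HCP")))
    kept.show()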
    
  • 2021-02-07 07:21

    This works too; it's concise and very similar to SQL:

    df.filter("c2 not like 'MSL%' and c2 not like 'HCP%'").show
    +---+-------+
    | c1|     c2|
    +---+-------+
    |  1|Emailab|
    |  2|Phoneab|
    |  3|  Faxab|
    |  4|   Mail|
    |  5|  Other|
    +---+-------+
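
    If you prefer full SQL, the same predicate can run through a temp view. This is a sketch, not part of this answer; it assumes a SparkSession named spark:

    // register the DataFrame as a view and filter with plain SQL (sketch)
    df.createOrReplaceTempView("t")
    spark.sql("SELECT * FROM t WHERE c2 NOT LIKE 'MSL%' AND c2 NOT LIKE 'HCP%'").show()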
    
  • 2021-02-07 07:26

    import org.apache.spark.sql.functions.not

    val df1 = df.filter(not(df("c2").rlike("MSL")) && not(df("c2").rlike("HCP")))

    This worked.
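
    Note that rlike is an unanchored regex match, so this also drops rows where c2 merely contains "MSL" or "HCP" in the middle. To match only the prefix, the two patterns can be anchored and combined; a sketch, not from the original answer:

    // one anchored regex: keep rows whose c2 does not begin with MSL or HCP (sketch)
    val df2 = df.filter(!df("c2").rlike("^(MSL|HCP)"))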

  • 2021-02-07 07:34
    import org.apache.spark.sql.functions.{col, not, substring}

    // substring is 1-based in Spark SQL (position 0 behaves like 1), so this
    // tests whether the first three characters of c2 are "MSL" or "HCP"
    df.filter(not(substring(col("c2"), 0, 3).isin("MSL", "HCP")))
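
    A caveat: this relies on both prefixes being exactly three characters long; for mixed-length prefixes, the startsWith or NOT LIKE approaches above generalize better.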
    