Filter spark DataFrame on string contains

你的背包 2020-11-29 01:35

I am using Spark 1.3.0 and Spark Avro 1.0.0. I am working from the example on the repository page. The following code works well:

val df = sqlContext.read

2 Answers
  • 2020-11-29 02:07

    In PySpark, the Spark SQL syntax

    where column_n like 'xyz%'
    

    might not work.

    Use:

    where column_n RLIKE '^xyz' 
    

    This works perfectly fine.
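
    As a minimal, self-contained sketch (in Scala, to match the question; the SparkSession setup, the column name column_n, and the sample values are illustrative assumptions, not from the answer):

    import org.apache.spark.sql.SparkSession

    // Sketch only: local session and toy data for illustration.
    val spark = SparkSession.builder().master("local[*]").appName("rlike-demo").getOrCreate()
    import spark.implicits._

    val df = Seq("xyz123", "abcxyz", "xy").toDF("column_n")

    // The expression string can be passed straight to where/filter:
    df.where("column_n RLIKE '^xyz'").show()   // keeps only "xyz123"

    // The same predicate in a SQL query against a temp view:
    df.createOrReplaceTempView("t")
    spark.sql("SELECT * FROM t WHERE column_n RLIKE '^xyz'").show()

    Since RLIKE takes a Java regular expression, the anchor '^xyz' matches values that start with "xyz", i.e. the same rows as the LIKE 'xyz%' prefix pattern.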

  • 2020-11-29 02:17

    You can use contains (this matches an arbitrary substring):

    df.filter($"foo".contains("bar"))
    

    like (SQL LIKE with SQL simple regular expressions, where _ matches an arbitrary character and % matches an arbitrary sequence):

    df.filter($"foo".like("bar"))
    

    or rlike (like with Java regular expressions):

    df.filter($"foo".rlike("bar"))
    

    depending on your requirements. LIKE and RLIKE should work with SQL expressions as well.
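
    A small, self-contained sketch tying the three variants together (the SparkSession setup, the column name foo, and the sample values are assumptions for illustration, not from the answer):

    import org.apache.spark.sql.SparkSession

    // Sketch only: local session and toy data for illustration.
    val spark = SparkSession.builder().master("local[*]").appName("filter-demo").getOrCreate()
    import spark.implicits._

    val df = Seq("foobar", "barista", "baz").toDF("foo")

    df.filter($"foo".contains("bar")).show()   // substring anywhere: foobar, barista
    df.filter($"foo".like("bar%")).show()      // SQL LIKE pattern: barista
    df.filter($"foo".rlike("bar$")).show()     // Java regex: foobar

    // The equivalent SQL-expression strings:
    df.filter("foo LIKE 'bar%'").show()
    df.filter("foo RLIKE 'bar$'").show()

    Note that rlike is unanchored, so rlike("bar") alone would match any value containing "bar"; the ^ and $ anchors restrict it to a prefix or suffix match.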
