Apply filter condition on dataframe created from JSON

鱼传尺愫 2021-01-23 14:02

I am working with a DataFrame created from JSON, and I want to apply a filter condition to it.

val jsonStr = """{ "metadata": [{ "key": 8


        
2 Answers
  • 2021-01-23 14:28

    First, use explode to get a DataFrame that is easy to work with. Then you can select both the key and the value from your input:

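    import org.apache.spark.sql.functions.explode
    import spark.implicits._ // enables the $"..." column syntax; already in scope in spark-shell
    val explodedDF = df.withColumn("metadata", explode($"metadata"))
      .select("metadata.key", "metadata.value")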
    

    Output:

    +-----+-----+
    |  key|value|
    +-----+-----+
    |84896|   54|
    | 1234|   12|
    +-----+-----+
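
    For reference, here is a minimal, self-contained sketch of the step above. The question's JSON is truncated, so sampleJson is a hypothetical input chosen only to match the output shown; the object name, the helper explodeMetadata, and the local SparkSession settings are assumptions as well.

    ```scala
    import org.apache.spark.sql.{DataFrame, SparkSession}
    import org.apache.spark.sql.functions.explode

    object ExplodeMetadata {
      // Hypothetical sample: the original jsonStr is cut off in the question.
      val sampleJson: String =
        """{ "metadata": [ { "key": 84896, "value": 54 }, { "key": 1234, "value": 12 } ] }"""

      // Reads one JSON document and flattens the metadata array into (key, value) rows.
      def explodeMetadata(spark: SparkSession, json: String): DataFrame = {
        import spark.implicits._
        // spark.read.json accepts a Dataset[String] (Spark 2.2 and later).
        val df = spark.read.json(Seq(json).toDS())
        df.withColumn("metadata", explode($"metadata")) // one row per array element
          .select("metadata.key", "metadata.value")     // pull the struct fields out as columns
      }

      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .master("local[*]")
          .appName("explode-metadata")
          .getOrCreate()
        val explodedDF = explodeMetadata(spark, sampleJson)
        explodedDF.printSchema()
        explodedDF.show()
        spark.stop()
      }
    }
    ```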
    

    This way you'll be able to perform your filtering logic as usual:

    scala> explodedDF.where("key == 84896").show
    +-----+-----+
    |  key|value|
    +-----+-----+
    |84896|   54|
    +-----+-----+
    

    You can combine filter conditions with AND and OR, for example:

    explodedDF.where("key == 84896 AND value == 54")
    explodedDF.where("(key == 84896 AND value == 54) OR key == 1234")
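
    The same conditions can also be written with the Column API instead of SQL strings, which lets the compiler catch some mistakes. This is only a sketch: the object name and the sample helper are hypothetical, and the small DataFrame built with toDF stands in for explodedDF.

    ```scala
    import org.apache.spark.sql.{DataFrame, SparkSession}

    object CombinedFilters {
      // Hypothetical stand-in for explodedDF, using the rows shown in the output above.
      def sample(spark: SparkSession): DataFrame = {
        import spark.implicits._
        Seq((84896L, 54L), (1234L, 12L)).toDF("key", "value")
      }

      // Equivalent of "key == 84896 AND value == 54".
      def both(df: DataFrame): DataFrame =
        df.where(df("key") === 84896 && df("value") === 54)

      // Equivalent of "(key == 84896 AND value == 54) OR key == 1234".
      def either(df: DataFrame): DataFrame =
        df.where((df("key") === 84896 && df("value") === 54) || df("key") === 1234)

      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .master("local[*]")
          .appName("combined-filters")
          .getOrCreate()
        val explodedDF = sample(spark)
        both(explodedDF).show()
        either(explodedDF).show()
        spark.stop()
      }
    }
    ```

    Note that the Column API uses === for equality and && / || for combining conditions, whereas the SQL strings use ==, AND, and OR.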
    
  • 2021-01-23 14:32

    From your question and comment, I understand that you are trying to apply the expression ( (key == 999, value == 55) || (key == 1234, value == 12) ) to filter the DataFrame rows.

    First, that expression is not valid Spark SQL, so it cannot be applied to the DataFrame as written. Rewrite it by replacing "," with "and" and "||" with "or":

    val expression = """( (key == 999, value == 55) || (key == 1234, value == 12) )"""
    val actualExpression = expression.replace(",", " and").replace("||", "or")
    

    which should give you a new, valid expression:

    ( (key == 999 and value == 55) or (key == 1234 and value == 12) )
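
    The rewrite above needs no Spark at all, so it can be checked in isolation. A minimal sketch follows; the object name and the helper toSparkSql are hypothetical names introduced here for illustration.

    ```scala
    object ExpressionRewrite {
      // Turns the question's shorthand into a Spark SQL condition:
      // "," becomes " and" and "||" becomes "or".
      def toSparkSql(expression: String): String =
        expression.replace(",", " and").replace("||", "or")

      def main(args: Array[String]): Unit = {
        val expression = """( (key == 999, value == 55) || (key == 1234, value == 12) )"""
        println(toSparkSql(expression))
        // prints: ( (key == 999 and value == 55) or (key == 1234 and value == 12) )
      }
    }
    ```

    Because this is plain text replacement, it assumes a comma only ever separates two conditions. An expression containing something like key in (1, 2) or a string literal with a comma would be rewritten incorrectly.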
    

    Now that you have a valid expression, the DataFrame needs to change too, because you cannot run such an expression against a column whose schema is an array of structs.

    So use the explode function to turn the array elements into separate rows, then use the .* notation to select every field of the struct as its own column.

    val df1 = df.withColumn("metadata", explode($"metadata"))
      .select($"metadata.*")
    

    which should give you this DataFrame:

    +-----+-----+
    |key  |value|
    +-----+-----+
    |84896|54   |
    |1234 |12   |
    +-----+-----+
    

    Finally, apply the valid expression to the resulting DataFrame:

    df1.where(actualExpression)
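
    Putting the pieces of this answer together, a self-contained sketch might look like the following. The question's JSON is truncated, so sampleJson is a hypothetical input matching the table above; the object name, the helper filterMetadata, and the local SparkSession settings are assumptions too.

    ```scala
    import org.apache.spark.sql.{DataFrame, SparkSession}
    import org.apache.spark.sql.functions.{col, explode}

    object FilterByExpression {
      // Hypothetical sample: the original jsonStr is cut off in the question.
      val sampleJson: String =
        """{ "metadata": [ { "key": 84896, "value": 54 }, { "key": 1234, "value": 12 } ] }"""

      // Rewrites the shorthand expression, flattens metadata, then filters.
      def filterMetadata(df: DataFrame, expression: String): DataFrame = {
        val actualExpression = expression.replace(",", " and").replace("||", "or")
        df.withColumn("metadata", explode(col("metadata"))) // one row per array element
          .select(col("metadata.*"))                        // struct fields become columns
          .where(actualExpression)                          // SQL-string condition
      }

      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .master("local[*]")
          .appName("filter-by-expression")
          .getOrCreate()
        import spark.implicits._
        val df = spark.read.json(Seq(sampleJson).toDS())
        val expression = """( (key == 999, value == 55) || (key == 1234, value == 12) )"""
        filterMetadata(df, expression).show()
        spark.stop()
      }
    }
    ```

    With this sample data, only the row with key 1234 and value 12 satisfies the expression, since no element has key 999.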
    

    I hope the answer is helpful
