Query A Nested Array in Parquet Records

北荒 2021-01-15 16:54

I am trying different ways to query a record within an array of records and display the complete row as output.

I don't know which nested object contains the string "pg".

2 Answers
  •  借酒劲吻你
    2021-01-15 17:20

    It seems that you can use

    org.apache.spark.sql.functions.explode(e: Column): Column
    

    For example, in my project (in Java), I have nested JSON like this:

    {
        "error": [],
        "trajet": [
            {
                "something": "value"
            }
        ],
        "infos": [
            {
                "something": "value"
            }
        ],
        "timeseries": [
            {
                "something_0": "value_0",
                "something_1": "value_1",
                ...
                "something_n": "value_n"
            }
        ]
    }
    

    I wanted to analyse the data in "timeseries", so I did:

    DataFrame ts = jsonDF.select(org.apache.spark.sql.functions.explode(jsonDF.col("timeseries")).as("t"))
                         .select("t.something_0",
                                 "t.something_1",
                                 ...
                                 "t.something_n");
    

    I'm new to Spark too. I hope this gives you a hint.
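
The explode call above yields one row per array element. If the goal is instead to keep the complete record whenever any nested object holds "pg", the logic can be modelled as an any-match filter over the array. The following is a minimal plain-Java sketch of that logic, not Spark code: the `Row` class, the `rowsContaining` and `explode` helpers, and the sample data are all hypothetical and only illustrate the two behaviours side by side.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Plain-Java model of the data, not Spark code. All names here are hypothetical.
final class NestedArrayFilter {

    // One record: an id plus an array of nested objects (like "timeseries" above).
    static final class Row {
        final String id;
        final List<Map<String, String>> timeseries;

        Row(String id, List<Map<String, String>> timeseries) {
            this.id = id;
            this.timeseries = timeseries;
        }
    }

    // Keep the complete row when any nested object holds `target` as a value,
    // without needing to know which nested object it is.
    static List<Row> rowsContaining(List<Row> rows, String target) {
        return rows.stream()
                .filter(row -> row.timeseries.stream()
                        .anyMatch(obj -> obj.containsValue(target)))
                .collect(Collectors.toList());
    }

    // Explode-style flattening: one output element per nested object.
    static List<Map<String, String>> explode(List<Row> rows) {
        return rows.stream()
                .flatMap(row -> row.timeseries.stream())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Hypothetical sample data: row "a" has a nested object containing "pg".
        List<Row> rows = List.of(
                new Row("a", List.of(Map.of("k", "x"), Map.of("k", "pg"))),
                new Row("b", List.of(Map.of("k", "y"))));

        // Print every complete row that matched.
        for (Row row : rowsContaining(rows, "pg")) {
            System.out.println(row.id + " " + row.timeseries);
        }
        // Print how many elements the explode-style flattening produces.
        System.out.println(explode(rows).size());
    }
}
```

The filter keeps the row intact, while the flattening drops the link to the parent row unless the parent's fields are carried along with each element. The same trade-off applies when exploding in Spark.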
