How to access values in array column?

眼角桃花 2021-02-04 03:18

I have a DataFrame with one column. Each row of that column holds an array of String values:

Values in my Spark 2.2 DataFrame:

["123", "abc", "2017" ...


        
4 Answers
  • 2021-02-04 04:10
     df.where($"col".getItem(2) === lit("2017")).select($"col".getItem(3))
    

    See getItem in the Column API docs: https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.Column
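To make the one-liner above easier to try out, here is a minimal sketch in script / spark-shell style. The sample rows are borrowed from the other answers in this thread, and the local SparkSession, the app name, and the names `df` and `result` are assumptions made for illustration only. Note that getItem is 0-based, so index 2 is the third element and index 3 the fourth.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.lit

// Assumption: a local session for illustration; in spark-shell `spark` already exists.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("array-getItem-sketch")
  .getOrCreate()
import spark.implicits._

// Sample rows taken from the other answers in this thread.
val df = Seq(
  Array("123", "abc", "2017", "ABC"),
  Array("456", "def", "2001", "ABC"),
  Array("789", "ghi", "2017", "DEF")).toDF("col")

// Keep rows whose third element (0-based index 2) is "2017",
// then return the distinct values of the fourth element (index 3).
val result = df
  .where($"col".getItem(2) === lit("2017"))
  .select($"col".getItem(3).as("value"))
  .distinct()

result.show()
```

With this sample data, two rows match the filter and the distinct values are ABC and DEF (row order after distinct is not guaranteed).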

  • 2021-02-04 04:16

    Since Spark 2.4.0, there is a new function element_at($array_column, $index). Unlike getItem, its index is 1-based, and a negative index counts from the end of the array.

    See Spark docs
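As a rough illustration of how element_at relates to getItem, the sketch below selects the same element both ways. It assumes Spark 2.4 or later; the local SparkSession, the single sample row, and the column aliases are made up for the example.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.element_at

// Assumption: a local session for illustration; in spark-shell `spark` already exists.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("element-at-sketch")
  .getOrCreate()
import spark.implicits._

// One sample row, in the shape used elsewhere in this thread.
val df = Seq(Array("123", "abc", "2017", "ABC")).toDF("col")

val picked = df.select(
  $"col".getItem(2).as("get_item_2"),        // 0-based: third element
  element_at($"col", 3).as("element_at_3"),  // 1-based: also the third element
  element_at($"col", -1).as("last_element")  // negative index counts from the end
)

picked.show()
```

Both of the first two columns hold 2017, and the last column holds ABC.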

  • 2021-02-04 04:22

    What is the best way to access elements in the array?

    Elements in an array column are accessed with the getItem operator.

    getItem(key: Any): Column
    "An expression that gets an item at position ordinal out of an array, or gets a value by key key in a MapType."

    You can also use apply syntax, $"col"(ordinal), to access the element at position ordinal (0-based).

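    // Entered in spark-shell, where spark.implicits._ is already imported
    // (it provides toDF and the $"..." column syntax).
    val ds = Seq(
      Array("123", "abc", "2017", "ABC"),
      Array("456", "def", "2001", "ABC"),
      Array("789", "ghi", "2017", "DEF")).toDF("col")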
    scala> ds.printSchema
    root
     |-- col: array (nullable = true)
     |    |-- element: string (containsNull = true)
    scala> ds.select($"col"(2)).show
    +------+
    |col[2]|
    +------+
    |  2017|
    |  2001|
    |  2017|
    +------+
    

    It's just a matter of personal choice and taste which approach suits you better, i.e. getItem or simply (ordinal).

    In your case, where / filter followed by select with distinct gives the proper answer (as @Will did).

  • 2021-02-04 04:25

    You can do something like the following (requires Spark 2.4+ for element_at):

    import org.apache.spark.sql.functions._
    
    val ds = Seq(
     Array("123", "abc", "2017", "ABC"),
     Array("456", "def", "2001", "ABC"),
     Array("789", "ghi", "2017", "DEF")).toDF("col")
    
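    // element_at is 1-based, so indices 1..4 cover all four elements.
    ds.withColumn("col1", element_at($"col", 1))
      .withColumn("col2", element_at($"col", 2))
      .withColumn("col3", element_at($"col", 3))
      .withColumn("col4", element_at($"col", 4))
      .drop("col") // remove the original array column
      .show()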
    
    +----+----+----+----+
    |col1|col2|col3|col4|
    +----+----+----+----+
    | 123| abc|2017| ABC|
    | 456| def|2001| ABC|
    | 789| ghi|2017| DEF|
    +----+----+----+----+
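When the arrays have more than a handful of elements, the repeated withColumn calls can be generated instead of written by hand. The sketch below is one possible way to do that and is not part of the original answer; it assumes every array has exactly `width` elements (a hypothetical name introduced here) and uses the 0-based getItem so that it does not depend on Spark 2.4.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

// Assumption: a local session for illustration; in spark-shell `spark` already exists.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("array-to-columns-sketch")
  .getOrCreate()
import spark.implicits._

val ds = Seq(
  Array("123", "abc", "2017", "ABC"),
  Array("456", "def", "2001", "ABC"),
  Array("789", "ghi", "2017", "DEF")).toDF("col")

// Assumption: all arrays share this fixed length.
val width = 4

// Build one output column per array position: col1 .. col4.
// getItem is 0-based, hence i for the index and i + 1 for the name.
val columns = (0 until width).map(i => col("col").getItem(i).as(s"col${i + 1}"))

val wide = ds.select(columns: _*)
wide.show()
```

For the sample data this yields the same four columns, col1 through col4, as the table above.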
    