I have a DataFrame with one column. Each row of that column holds an Array of String values, e.g. in my Spark 2.2 DataFrame:
["123", "abc", "2017", ...]
df.where($"col".getItem(2) === lit("2017")).select($"col".getItem(3))
See getItem in the Column scaladoc: https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.Column
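Put together, a minimal runnable sketch (the sample rows here are assumptions shaped like the question's data, run in a spark-shell session where $ and toDF are available via spark.implicits._):

```scala
import org.apache.spark.sql.functions.lit

// Sample data shaped like the question's column of string arrays (assumed values)
val df = Seq(
  Array("123", "abc", "2017", "ABC"),
  Array("456", "def", "2001", "ABC")
).toDF("col")

// getItem uses 0-based indexing: index 2 is the third element
df.where($"col".getItem(2) === lit("2017"))
  .select($"col".getItem(3))
  .show()
```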
Since Spark 2.4.0 there is a new function element_at(array, index). Note that element_at uses 1-based indexing, unlike getItem's 0-based ordinal, and negative indices count from the end of the array. See the Spark docs.
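For example (a sketch with assumed sample data, run in a spark-shell session):

```scala
import org.apache.spark.sql.functions.element_at

// One sample row with a four-element string array (assumed values)
val df = Seq(Array("123", "abc", "2017", "ABC")).toDF("col")

df.select(
  element_at($"col", 3),  // 1-based: the third element, "2017"
  element_at($"col", -1)  // negative index: the last element, "ABC"
).show()
```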
What is the best way to access elements in the array?
You access elements in an array column with the getItem operator:

getItem(key: Any): Column — An expression that gets an item at position ordinal out of an array, or gets a value by key key in a MapType.

You could also use apply syntax, i.e. (ordinal), to access an element at that ordinal position.
val ds = Seq(
Array("123", "abc", "2017", "ABC"),
Array("456", "def", "2001", "ABC"),
Array("789", "ghi", "2017", "DEF")).toDF("col")
scala> ds.printSchema
root
|-- col: array (nullable = true)
| |-- element: string (containsNull = true)
scala> ds.select($"col"(2)).show
+------+
|col[2]|
+------+
| 2017|
| 2001|
| 2017|
+------+
It's just a matter of personal choice and taste which approach suits you better, i.e. getItem or simply (ordinal).
And in your case, where/filter followed by select with distinct gives the proper answer (as @Will did).
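Spelled out against the sample ds above, that pipeline could look like this (the "2017" filter and the fourth-element projection mirror the question; the column alias "value" is my own):

```scala
ds.where($"col"(2) === "2017")      // keep rows whose third element is "2017"
  .select($"col"(3) as "value")     // project the fourth element
  .distinct()                        // drop duplicate values
  .show()
```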
You can do something like this:
import org.apache.spark.sql.functions._
val ds = Seq(
Array("123", "abc", "2017", "ABC"),
Array("456", "def", "2001", "ABC"),
Array("789", "ghi", "2017", "DEF")).toDF("col")
ds.withColumn("col1", element_at('col, 1))
  .withColumn("col2", element_at('col, 2))
  .withColumn("col3", element_at('col, 3))
  .withColumn("col4", element_at('col, 4))
  .drop('col)
  .show()
+----+----+----+----+
|col1|col2|col3|col4|
+----+----+----+----+
| 123| abc|2017| ABC|
| 456| def|2001| ABC|
| 789| ghi|2017| DEF|
+----+----+----+----+
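As a side note, the chain of withColumn calls can be collapsed into a single select, which avoids building an intermediate plan node per column (a stylistic alternative with the same result, assuming a spark-shell session):

```scala
// Generate col1..col4 in one select using element_at's 1-based indices
ds.select(
  (1 to 4).map(i => element_at('col, i).as(s"col$i")): _*
).show()
```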