Question:
I have a Spark DataFrame with the following schema:
|-- eid: long (nullable = true)
|-- age: long (nullable = true)
|-- sex: long (nullable = true)
|-- father: array (nullable = true)
| |-- element: array (containsNull = true)
| | |-- element: long (containsNull = true)
and a sample of rows:
df.select(df['father']).show()
+--------------------+
| father|
+--------------------+
|[WrappedArray(-17...|
|[WrappedArray(-11...|
|[WrappedArray(13,...|
+--------------------+
and the type is
DataFrame[father: array<array<bigint>>]
How can I access each element of the inner array, for example -17 in the first row?
I tried different things like df.select(df['father'])(0)(0).show(), but with no luck.
Answer 1:
If I'm not mistaken, the syntax in Python is
df.select(df['father'][0][0]).show()
or
df.select(df['father'].getItem(0).getItem(0)).show()
Note that the indexing applies to the Column, so it goes inside select(), not on the DataFrame that select() returns.
See some examples here: http://spark.apache.org/docs/latest/api/python/pyspark.sql.html?highlight=column#pyspark.sql.Column
Answer 2:
The solution in Scala would be:
import org.apache.spark.sql.functions._
val data = sparkContext.parallelize("""{"eid":1,"age":30,"sex":1,"father":[[1,2]]}""" :: Nil)
val dataframe = sqlContext.read.json(data).toDF()
The DataFrame looks like:
+---+---+---+--------------------+
|eid|age|sex|father |
+---+---+---+--------------------+
|1 |30 |1 |[WrappedArray(1, 2)]|
+---+---+---+--------------------+
The selection is then:
dataframe.select(col("father")(0)(0) as("first"), col("father")(0)(1) as("second")).show(false)
The output is:
+-----+------+
|first|second|
+-----+------+
|1 |2 |
+-----+------+
Answer 3:
Another Scala answer would look like this:
df.select(col("father").getItem(0) as "father_0", col("father").getItem(1) as "father_1")
Note that a single getItem(0) here returns the inner array itself; chain another getItem (e.g. col("father").getItem(0).getItem(0)) to reach an individual element.
Source: https://stackoverflow.com/questions/44468311/access-to-wrappedarray-elements