Creating a Pyspark Schema involving an ArrayType

Submitted by 两盒软妹~ on 2020-02-19 10:33:12

Question


I'm trying to create a schema for my new DataFrame and have tried various combinations of brackets and keywords but have been unable to figure out how to make this work. My current attempt:

from pyspark.sql.types import *

schema = StructType([
  StructField("User", IntegerType()),
  ArrayType(StructType([
    StructField("user", StringType()),
    StructField("product", StringType()),
    StructField("rating", DoubleType())]))
  ])

This comes back with the error:

Traceback (most recent call last):
  File "/usr/hdp/current/spark2-client/python/pyspark/sql/types.py", line 290, in __init__
    assert isinstance(elementType, DataType), "elementType should be DataType"
AssertionError: elementType should be DataType

I have googled, but so far have found no good examples of an array of objects.


Answer 1:


You need to wrap the ArrayType in an additional StructField, which gives the array column a name (every top-level entry in a StructType must be a StructField). This one should work:

from pyspark.sql.types import *

schema = StructType([
  StructField("User", IntegerType()),
  StructField("My_array", ArrayType(
      StructType([
          StructField("user", StringType()),
          StructField("product", StringType()),
          StructField("rating", DoubleType())
      ])
   )
])

For more information check this link: http://nadbordrozd.github.io/blog/2016/05/22/one-weird-trick-that-will-fix-your-pyspark-schemas/



Source: https://stackoverflow.com/questions/48394717/creating-a-pyspark-schema-involving-an-arraytype
