Building a row from a dict in pySpark

小蘑菇 2021-02-01 03:17

I'm trying to dynamically build a row in pySpark 1.6.1, then build it into a DataFrame. The general idea is to extend the results of describe to include, for exam

2 Answers
  • 2021-02-01 04:07

    If the dict is not flat (it contains nested dicts or lists), you can convert it to a Row recursively.

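    from pyspark.sql import Row

    def as_row(obj):
        # Recursively convert nested dicts to Row objects.
        if isinstance(obj, dict):
            dictionary = {k: as_row(v) for k, v in obj.items()}
            return Row(**dictionary)
        elif isinstance(obj, list):
            # Convert each list element, which may itself be a dict.
            return [as_row(v) for v in obj]
        else:
            # Scalars are returned unchanged.
            return obj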
    
  • 2021-02-01 04:09

    You can use keyword argument unpacking as follows:

    Row(**row_dict)
    
    ## Row(C0=-1.1990072635132698, C3=0.12605772684660232, C4=0.5760856026559944, 
    ##     C5=0.1951877800894315, C6=24.72378589441825, summary='kurtosis')
    

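    Note that Row sorts the fields by name internally when it is built from keyword arguments (hence the alphabetical order in the output above). This is a workaround for older Python versions, where the order of keyword arguments was not preserved.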

    This behavior is likely to be removed in upcoming releases; see SPARK-29748 (Remove sorting of fields in PySpark SQL Row creation). Once it is removed, you'll have to ensure that the order of values in the dict is consistent across records.
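
One way to keep the value order consistent across records is to stop relying on dict order altogether and pull the values out in an explicit, fixed column order. The snippet below is a minimal plain-Python sketch of that idea, not part of the PySpark API: `FIELDS`, `to_ordered_tuple` and the sample records are hypothetical names and data made up for illustration.

```python
# Hypothetical column order, fixed once for the whole dataset (an assumption).
FIELDS = ["summary", "C0", "C3"]


def to_ordered_tuple(record, fields=FIELDS):
    """Return the values of `record` in the order given by `fields`.

    Keys missing from the record become None, so every tuple has the same length.
    """
    return tuple(record.get(name) for name in fields)


# Two sample records with the same keys but different insertion order.
records = [
    {"summary": "kurtosis", "C0": -1.2, "C3": 0.13},
    {"C3": 0.5, "summary": "skewness", "C0": 0.1},
]

rows = [to_ordered_tuple(r) for r in records]
print(rows)
```

Because each tuple follows `FIELDS`, the position of a value no longer depends on how the dict was built. The resulting tuples, together with `FIELDS` as the list of column names, should then be usable as input to `createDataFrame`, assuming a `SparkSession` (or, on 1.6, a `SQLContext`) is available.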
