Building a row from a dict in pySpark

前端未结

I'm trying to dynamically build a row in PySpark 1.6.1, then build it into a DataFrame. The general idea is to extend the results of describe to include, for exam

2 Answers

庸人自扰

2021-02-01 04:07

If the dict is not flat (that is, it contains nested dicts or lists), you can convert it to a Row recursively:

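from pyspark.sql import Row

def as_row(obj):
    # Nested dicts become nested Rows
    if isinstance(obj, dict):
        dictionary = {k: as_row(v) for k, v in obj.items()}
        return Row(**dictionary)
    # Lists are converted element by element
    elif isinstance(obj, list):
        return [as_row(v) for v in obj]
    # Scalars are returned unchanged
    else:
        return obj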
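
As a rough usage sketch of the recursive conversion, the snippet below applies it to a small nested record. The `record` contents are made up for illustration, and the `namedtuple` fallback is only a stand-in (not part of PySpark) so that the sketch also runs where Spark is not installed.

```python
try:
    from pyspark.sql import Row
except ImportError:
    # Hypothetical stand-in so the sketch runs without Spark installed.
    from collections import namedtuple

    def Row(**fields):
        return namedtuple("Row", list(fields))(**fields)


def as_row(obj):
    # Same recursive idea as above: dict -> Row, list -> list, else unchanged
    if isinstance(obj, dict):
        return Row(**{k: as_row(v) for k, v in obj.items()})
    if isinstance(obj, list):
        return [as_row(v) for v in obj]
    return obj


# Hypothetical nested record (values invented for illustration)
record = {
    "summary": "kurtosis",
    "stats": {"C0": -1.2, "C3": 0.13},
    "tags": [{"name": "a"}, {"name": "b"}],
}

row = as_row(record)
print(row.summary)                  # top-level field
print(row.stats.C0)                 # field of a nested Row
print([t.name for t in row.tags])   # Rows inside a list
```

Nested values are reachable by attribute access, so a dict nested inside a list still ends up as a Row.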


野性不改

2021-02-01 04:09
You can use keyword arguments unpacking as follows:
```
Row(**row_dict)

## Row(C0=-1.1990072635132698, C3=0.12605772684660232, C4=0.5760856026559944, 
##     C5=0.1951877800894315, C6=24.72378589441825, summary='kurtosis')
```
Note that Row internally sorts the fields by key when it is built from keyword arguments. This works around older Python versions, which did not preserve keyword-argument order.

This behavior is likely to be removed in upcoming releases; see SPARK-29748 (Remove sorting of fields in PySpark SQL Row creation). Once it is removed, you'll have to ensure that the order of values in the dict is consistent across records.
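
One possible way to keep the order consistent, sketched under assumptions: pick a single field order up front and emit every record's values in that order. The `records` data and the choice of sorted keys are illustrative only, and the final `createDataFrame` call is left as a comment because it needs a running `SparkSession` (assumed to be named `spark`).

```python
# Hypothetical records whose keys arrive in different orders
records = [
    {"summary": "kurtosis", "C0": -1.2, "C3": 0.13},
    {"C3": 0.5, "summary": "skewness", "C0": 0.7},
]

# Fix one column order up front; sorting is an arbitrary but stable choice
fields = sorted(records[0])

# Emit every record's values in that same order
rows = [tuple(rec[name] for name in fields) for rec in records]

print(fields)
print(rows)

# With a SparkSession named `spark` (assumed, not created here), the tuples
# and the field list could then be passed on as:
# df = spark.createDataFrame(rows, schema=fields)
```

Because the order is decided once and applied to every record, the result no longer depends on how each dict happened to be built.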