How to reformat the Spark Python Output

天命终不由人 2021-01-19 04:22
(u'142578', (u'The-North-side-9890', (u'   12457896', 45.0)))
(u'124578', (u'The-West-side-9091', (u'   14578217', 0.0)))

This is what I got from my Spark job. How can I reformat this output into plain CSV?

2 Answers
  •  抹茶落季
    2021-01-19 04:44

    Try this:

    import collections.abc

    def rdd2string(t):
        def rdd2StringHelper(x):
            # Strings are iterable too, so treat them as plain values;
            # otherwise the recursion below would never terminate.
            if isinstance(x, str):
                return x + ','
            if isinstance(x, collections.abc.Iterable):
                s = ''
                for elem in x:
                    s += rdd2StringHelper(elem)
                return s
            return str(x) + ','

        # Drop the trailing comma.
        return rdd2StringHelper(t)[:-1]

    yourRDD.map(lambda x: rdd2string(x)).saveAsTextFile(...)
    

    This function works for any nesting of tuples (tuple2, tuple3, tuple21, etc.) and lists (lists of lists, lists of tuples, lists of ints, etc.) and outputs a flat representation as a string in CSV format.
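    For example, you can run the helper locally, outside Spark, on the first record from the question to see the flattening (a self-contained sketch, with the string check described above folded in):

```python
import collections.abc

def rdd2string(t):
    def rdd2StringHelper(x):
        if isinstance(x, str):  # strings are iterable; treat them as scalars
            return x + ','
        if isinstance(x, collections.abc.Iterable):
            # Recursively flatten nested tuples/lists into one comma string.
            return ''.join(rdd2StringHelper(elem) for elem in x)
        return str(x) + ','

    return rdd2StringHelper(t)[:-1]  # drop the trailing comma

record = ('142578', ('The-North-side-9890', ('   12457896', 45.0)))
print(rdd2string(record))
# 142578,The-North-side-9890,   12457896,45.0
```

    Note the leading spaces inside '   12457896' are preserved; add .strip() in the string branch if you want them removed.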

    It also answers your question from "How to remove unwanted stuff like (), [], single quotes from PySpark output [duplicate]".

    EDIT

    Do not forget to add import collections.abc at the top (in Python 3.10+, the old collections.Iterable alias was removed).
