How to reformat the Spark Python Output

天命终不由人 2021-01-19 04:22
(u'142578', (u'The-North-side-9890', (u'   12457896', 45.0)))
(u'124578', (u'The-West-side-9091', (u'   14578217', 0.0)))

This is the output I get from my Spark job. How can I reformat it into a flat, comma-separated string?

2 Answers
  • 2021-01-19 04:44

    Try this

    import collections.abc

    def rdd2string(t):
        def rdd2StringHelper(x):
            # Strings are iterable too, so treat them as scalars;
            # otherwise the recursion would descend into single characters.
            if isinstance(x, collections.abc.Iterable) and not isinstance(x, (str, bytes)):
                s = ''
                for elem in x:
                    s = s + rdd2StringHelper(elem)
                return s
            else:
                return str(x) + ','

        # Every scalar appends a trailing comma; drop the last one.
        return rdd2StringHelper(t)[:-1]

    yourRDD.map(lambda x: rdd2string(x)).saveAsTextFile(...)
    

    This function works for any tuple formed by any combination of tuples (tuple2, tuple3, tuple21, etc.) and lists (lists of lists, lists of tuples, lists of ints, etc.) and outputs a flat representation as a string in CSV format.

    It also answers your question from "How to remove unwanted stuff like (), [], single quotes from PySpark output [duplicate]".

    EDIT

    Do not forget the import collections.abc at the top (it is included in the snippet above); the old collections.Iterable alias was removed in Python 3.10.
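
    For a quick sanity check outside Spark, here is a minimal sketch that applies rdd2string to one of the sample records from the question; the expected result is shown in the comments:

    # One of the records from the question, flattened locally
    record = (u'142578', (u'The-North-side-9890', (u'   12457896', 45.0)))
    print(rdd2string(record))
    # prints: 142578,The-North-side-9890,   12457896,45.0
    # (the leading spaces inside '   12457896' are preserved; strip or
    #  cast that field yourself if you do not want them)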

  • 2021-01-19 04:52

    To get from this:

    (u'142578', (u'The-North-side-9890', (u'   12457896', 45.0)))

    to this:

    The-North-side-9890,12457896,45.0

    you need to use:

    # Note: tuple unpacking in a lambda parameter list is Python 2 only syntax
    result = result.map(lambda (k, (s, (n1, n2))): ','.join([s, str(int(n1)), str(float(n2))]))
    
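    Since tuple unpacking in a lambda was removed in Python 3, here is a minimal Python 3 compatible sketch of the same mapping (assuming the same (key, (name, (n1, n2))) record shape; the to_csv name is just illustrative):

    def to_csv(record):
        # Unpack the nested (key, (name, (n1, n2))) structure explicitly
        k, (s, (n1, n2)) = record
        # int() strips the leading spaces in n1; float() keeps 45.0 as 45.0
        return ','.join([s, str(int(n1)), str(float(n2))])

    result = result.map(to_csv)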