How to reformat the Spark Python Output

天命终不由人 2021-01-19 04:22
(u'142578', (u'The-North-side-9890', (u'   12457896', 45.0)))
(u'124578', (u'The-West-side-9091', (u'   14578217', 0.0)))

This is the output I get from my Spark job. How can I reformat it into a flat, comma-separated string?

2 Answers
  • 2021-01-19 04:44

    Try this

    import collections.abc

    def rdd2string(t):
        def rdd2StringHelper(x):
            # Strings are iterable too, so treat them as scalars;
            # otherwise the recursion would descend into single characters.
            if isinstance(x, collections.abc.Iterable) and not isinstance(x, (str, bytes)):
                s = ''
                for elem in x:
                    s = s + rdd2StringHelper(elem)
                return s
            else:
                return str(x) + ','

        # Every scalar appends a trailing comma; drop the last one.
        return rdd2StringHelper(t)[:-1]

    yourRDD.map(lambda x: rdd2string(x)).saveAsTextFile(...)
    

    This function works for any tuple formed by any combination of tuples (tuple2, tuple3, tuple21, etc.) and lists (lists of lists, lists of tuples, lists of ints, etc.) and outputs a flat representation as a string in CSV format.

    It also answers your question from "How to remove unwanted stuff like (), [], single quotes from PySpark output [duplicate]".

    EDIT

    Do not forget the import collections.abc at the top (it is included in the snippet above); the old collections.Iterable alias was removed in Python 3.10.
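
    For a quick sanity check outside Spark, here is a minimal sketch that applies rdd2string to one of the sample records from the question; the expected result is shown in the comments:

    # One of the records from the question, flattened locally
    record = (u'142578', (u'The-North-side-9890', (u'   12457896', 45.0)))
    print(rdd2string(record))
    # prints: 142578,The-North-side-9890,   12457896,45.0
    # (the leading spaces inside '   12457896' are preserved; strip or
    #  cast that field yourself if you do not want them)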

  • 2021-01-19 04:52

    To get from this:

    (u'142578', (u'The-North-side-9890', (u'   12457896', 45.0)))

    to this:

    The-North-side-9890,12457896,45.0

    you need to use:

    # Note: tuple unpacking in a lambda parameter list is Python 2 only syntax
    result = result.map(lambda (k, (s, (n1, n2))): ','.join([s, str(int(n1)), str(float(n2))]))
    
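    Since tuple unpacking in a lambda was removed in Python 3, here is a minimal Python 3 compatible sketch of the same mapping (assuming the same (key, (name, (n1, n2))) record shape; the to_csv name is just illustrative):

    def to_csv(record):
        # Unpack the nested (key, (name, (n1, n2))) structure explicitly
        k, (s, (n1, n2)) = record
        # int() strips the leading spaces in n1; float() keeps 45.0 as 45.0
        return ','.join([s, str(int(n1)), str(float(n2))])

    result = result.map(to_csv)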