How to reformat the Spark Python Output

天命终不由人 2021-01-19 04:22
(u'142578', (u'The-North-side-9890', (u'   12457896', 45.0)))
(u'124578', (u'The-West-side-9091', (u'   14578217', 0.0)))

This is what I got from my Spark job. How can I reformat this output into plain CSV?

2 Answers
  •  抹茶落季
    2021-01-19 04:44

    Try this:

    import collections.abc

    def rdd2string(t):
        def rdd2StringHelper(x):
            # Strings are iterable too, so treat them as plain values;
            # otherwise the recursion below would never terminate.
            if isinstance(x, str):
                return x + ','
            if isinstance(x, collections.abc.Iterable):
                s = ''
                for elem in x:
                    s += rdd2StringHelper(elem)
                return s
            return str(x) + ','

        # Drop the trailing comma.
        return rdd2StringHelper(t)[:-1]

    yourRDD.map(lambda x: rdd2string(x)).saveAsTextFile(...)
    

    This function works for any nesting of tuples (tuple2, tuple3, tuple21, etc.) and lists (lists of lists, lists of tuples, lists of ints, etc.) and outputs a flat representation as a string in CSV format.
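    For example, you can run the helper locally, outside Spark, on the first record from the question to see the flattening (a self-contained sketch, with the string check described above folded in):

```python
import collections.abc

def rdd2string(t):
    def rdd2StringHelper(x):
        if isinstance(x, str):  # strings are iterable; treat them as scalars
            return x + ','
        if isinstance(x, collections.abc.Iterable):
            # Recursively flatten nested tuples/lists into one comma string.
            return ''.join(rdd2StringHelper(elem) for elem in x)
        return str(x) + ','

    return rdd2StringHelper(t)[:-1]  # drop the trailing comma

record = ('142578', ('The-North-side-9890', ('   12457896', 45.0)))
print(rdd2string(record))
# 142578,The-North-side-9890,   12457896,45.0
```

    Note the leading spaces inside '   12457896' are preserved; add .strip() in the string branch if you want them removed.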

    It also answers your question from "How to remove unwanted stuff like (), [], single quotes from PySpark output [duplicate]".

    EDIT

    Do not forget to add import collections.abc at the top (in Python 3.10+, the old collections.Iterable alias was removed).
