(u'142578', (u'The-North-side-9890', (u' 12457896', 45.0)))
(u'124578', (u'The-West-side-9091', (u' 14578217', 0.0)))
This i got fr
Try this:
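import collections

def rdd2string(t):
    # Recursively flatten nested tuples/lists into "a,b,c"
    def rdd2StringHelper(x):
        s = ''
        # On Python 2, where this was written, strings are not instances
        # of Iterable, so they fall through to the else branch
        if isinstance(x, collections.Iterable):
            for elem in x:
                s = s + str(rdd2StringHelper(elem))
            return s
        else:
            # Leaf value: emit it followed by a separator
            return str(x) + ','
    # Drop the trailing comma
    return rdd2StringHelper(t)[:-1]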
yourRDD.map(lambda x: rdd2string(x)).saveAsTextFile(...)
This function works for any nesting of tuples (tuple2, tuple3, tuple21, etc.) and lists (lists of lists, lists of tuples, lists of ints, etc.) and outputs a flat, comma-separated string in CSV format.
It also answers your other question, "How to remove unwanted stuff like (),[], single quotes from PyPpark output [duplicate]".
EDIT
Do not forget to add this import: import collections
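The function above targets Python 2. On Python 3, strings are themselves iterable and `collections.Iterable` has moved to `collections.abc`, so the same idea needs an explicit string check. The following is a minimal sketch under those assumptions; the name `flatten_to_csv` is made up for illustration and is not part of PySpark.

```python
from collections.abc import Iterable

def flatten_to_csv(record):
    """Flatten arbitrarily nested tuples/lists into one comma-separated line."""
    def leaves(x):
        # str and bytes are iterable too, but must be treated as single values
        if isinstance(x, Iterable) and not isinstance(x, (str, bytes)):
            for elem in x:
                yield from leaves(elem)
        else:
            yield str(x)
    return ','.join(leaves(record))

row = ('142578', ('The-North-side-9890', (' 12457896', 45.0)))
print(flatten_to_csv(row))  # 142578,The-North-side-9890, 12457896,45.0
```

It should plug into the same pipeline as before, e.g. yourRDD.map(flatten_to_csv).saveAsTextFile(...). Like the original, it keeps every field (including the key) and does not strip the leading space in ' 12457896'.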
To get from this:
(u'142578', (u'The-North-side-9890', (u' 12457896', 45.0)))
to this:
The-North-side-9890,12457896,45.0
you can use (Python 2 only, since the lambda relies on tuple parameter unpacking, which Python 3 removed):
result = result.map(lambda (k, (s, (n1, n2))): ','.join([s, str(int(n1)), str(float(n2))]))
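On Python 3 the lambda above is a syntax error, so the unpacking has to happen inside a function body instead. A rough equivalent, assuming the same record shape as in the question (`to_csv_line` is a hypothetical helper name):

```python
def to_csv_line(record):
    # Unpack inside the body; Python 3 no longer allows tuple parameters
    _key, (name, (number, amount)) = record
    # int() tolerates the leading space in ' 12457896'
    return ','.join([name, str(int(number)), str(float(amount))])

row = ('142578', ('The-North-side-9890', (' 12457896', 45.0)))
print(to_csv_line(row))  # The-North-side-9890,12457896,45.0
```

With an RDD this would be used as result = result.map(to_csv_line). Unlike the generic flattening approach, it drops the key and fails loudly if a record does not have the expected shape.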