I have a JavaPairRDD, say data, of type
<Integer, List<Integer>>.
When I call data.saveAsTextFile("output"), the output contains the data in the following format:
(1,[1,2,3,4])
etc...
I want something like this in the output file:
1 1,2,3,4
i.e. 1\t1,2,3,4
Any help would be appreciated.
You need to understand what's happening here. You have an RDD[(T, U)], where T and U are some object types; read it as an RDD of tuples of T and U. When you call saveAsTextFile() on this RDD, it converts each element of the RDD to a string and writes one element per line, which is how the text file is produced.
Now, how is an object of some type T converted to a string? By calling toString() on it. This is why you see [] around the List and () around the tuple as a whole.
Solution: map each element in your RDD to a string in the format you want before saving. I'm not that familiar with the Java syntax, but in Scala I would do something like:
rdd.map(e => s"${e._1}\t" + e._2.mkString(","))
Here mkString joins the elements of a collection into a single string, separated by the given delimiter.
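For the Java API, a minimal sketch of the same idea might look like the following. The class name PairFormatter and the helper formatPair are made up for illustration, and the Spark call in the comment assumes data is the JavaPairRDD<Integer, List<Integer>> from the question; only the plain-Java formatting part is exercised here.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class PairFormatter {
    // Hypothetical helper: builds "key<TAB>v1,v2,..." for one pair.
    static String formatPair(Integer key, List<Integer> values) {
        String joined = values.stream()
                .map(String::valueOf)
                .collect(Collectors.joining(","));
        return key + "\t" + joined;
    }

    public static void main(String[] args) {
        // Plain-Java check of the formatting, no Spark needed.
        System.out.println(formatPair(1, Arrays.asList(1, 2, 3, 4)));

        // With Spark (assumption: `data` is the JavaPairRDD from the question):
        // data.map(t -> formatPair(t._1(), t._2())).saveAsTextFile("output");
    }
}
```

Keeping the formatting in a small static method means it can be checked without a Spark context, and the map call only has to pass the tuple's key and value through to it.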
Let me know if this helped. Cheers.
Source: https://stackoverflow.com/questions/45398795/saving-the-rdd-pair-in-particular-format-in-the-output-file