Saving a pair RDD in a particular format in the output file

Submitted by 大城市里の小女人 on 2019-12-10 11:33:10

Question


I have a JavaPairRDD, call it data, of type

<Integer,List<Integer>>

When I call data.saveAsTextFile("output"), the output contains the data in the following format:

(1,[1,2,3,4])

etc...

I want something like this in the output file:

1 1,2,3,4

That is, 1\t1,2,3,4 (the key, a tab, then the comma-separated values).

Any help would be appreciated.


Answer 1:


You need to understand what is happening here. You have an RDD[(T, U)], where T and U are some object types; read it as an RDD of tuples of T and U. When you call saveAsTextFile() on this RDD, it converts each element of the RDD to a string and writes one element per line, which is how the text file is generated.

Now, how is an object of some type T converted to a string? By calling toString() on it. This is why you see [] representing the List and () representing the tuple as a whole.
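To make the default behaviour concrete, here is a small plain-Java sketch with no Spark involved. The class DefaultToStringDemo and the helper tupleLikeString are made up for illustration; the helper only imitates the "(first,second)" shape that scala.Tuple2.toString() produces. Note that java.util.List's own toString() puts a space after each comma.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical demo class, not part of Spark or Scala.
final class DefaultToStringDemo {

    // Imitates the "(first,second)" layout of scala.Tuple2.toString().
    // String concatenation calls toString() on each argument.
    static String tupleLikeString(Object first, Object second) {
        return "(" + first + "," + second + ")";
    }

    public static void main(String[] args) {
        List<Integer> values = Arrays.asList(1, 2, 3, 4);

        // The list's toString() supplies the square brackets.
        System.out.println(values);                     // [1, 2, 3, 4]

        // The tuple-style wrapper supplies the parentheses.
        System.out.println(tupleLikeString(1, values)); // (1,[1, 2, 3, 4])
    }
}
```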

The solution is to map each element in your RDD to a string in the format you want. I'm not that familiar with the Java syntax, but in Scala I would do something like:

rdd.map(e=>s"${e._1}\t${e._2.mkString(",")}")

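Here mkString concatenates the elements of a collection using the given delimiter. Calling saveAsTextFile() on the mapped RDD then writes each line in that format.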
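Since the question uses the Java API, the same idea might look roughly like this in Java. This is only a sketch: PairLineFormatter and formatPair are hypothetical names, and the Spark call is left in a comment so that the formatting logic can be run without Spark on the classpath. It assumes data is the JavaPairRDD<Integer, List<Integer>> from the question.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper (not part of Spark): builds one output line per pair.
final class PairLineFormatter {

    // Formats a key and its values as "key<TAB>v1,v2,...".
    static String formatPair(Integer key, List<Integer> values) {
        String joined = values.stream()
                .map(String::valueOf)
                .collect(Collectors.joining(","));
        return key + "\t" + joined;
    }

    public static void main(String[] args) {
        // Prints the key 1, a tab, then 1,2,3,4
        System.out.println(formatPair(1, Arrays.asList(1, 2, 3, 4)));

        // With Spark available, and assuming `data` is the pair RDD
        // from the question, the mapping step would be along these lines:
        //   data.map(t -> PairLineFormatter.formatPair(t._1(), t._2()))
        //       .saveAsTextFile("output");
    }
}
```

Keeping the formatting in a plain static method means it can be checked on its own before it is plugged into the RDD transformation.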

Let me know if this helped. Cheers.



Source: https://stackoverflow.com/questions/45398795/saving-the-rdd-pair-in-particular-format-in-the-output-file
