When saving as a text file in Spark version 1.5.1 I use: rdd.saveAsTextFile('
.
But if I want to find the file in that directory, how d
As I said in my comment above, the documentation with examples can be found here. Quoting the description of the saveAsTextFile method:
Save this RDD as a text file, using string representations of elements.
In the following example I save a simple RDD into a file, then I load it and print its content.
samples = sc.parallelize([
("abonsanto@fakemail.com", "Alberto", "Bonsanto"),
("mbonsanto@fakemail.com", "Miguel", "Bonsanto"),
("stranger@fakemail.com", "Stranger", "Weirdo"),
("dbonsanto@fakemail.com", "Dakota", "Bonsanto")
])
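# Print each element on its own line (Python 2 print statement)
for sample in samples.collect():
    print sample
# saveAsTextFile creates a directory at this path and writes part files into it
samples.saveAsTextFile("folder/here.txt")
# textFile reads every part file in that directory back as lines of text
read_rdd = sc.textFile("folder/here.txt")
read_rdd.collect()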
The output will be as follows. Note that textFile returns each line as a string, not as a tuple:
('abonsanto@fakemail.com', 'Alberto', 'Bonsanto')
('mbonsanto@fakemail.com', 'Miguel', 'Bonsanto')
('stranger@fakemail.com', 'Stranger', 'Weirdo')
('dbonsanto@fakemail.com', 'Dakota', 'Bonsanto')
[u"('abonsanto@fakemail.com', 'Alberto', 'Bonsanto')",
u"('mbonsanto@fakemail.com', 'Miguel', 'Bonsanto')",
u"('stranger@fakemail.com', 'Stranger', 'Weirdo')",
u"('dbonsanto@fakemail.com', 'Dakota', 'Bonsanto')"]
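The u"..." entries show that sc.textFile gives back each line as a plain string, because saveAsTextFile only writes the string representation of each element. If the original tuples are needed again, one possible approach is to parse each line with ast.literal_eval from the Python standard library. This is a minimal sketch and not part of the original answer: parse_line is a hypothetical helper name, and it assumes every saved element is a Python literal (tuples of strings here).

```python
import ast


def parse_line(line):
    """Turn one saved line back into the Python literal it was written from.

    parse_line is a hypothetical helper, not part of the Spark API.
    """
    # literal_eval only accepts Python literals (strings, numbers, tuples, ...),
    # so it is safer than eval for data read from disk.
    return ast.literal_eval(line)


# One line as it appears in the saved output above.
line = "('abonsanto@fakemail.com', 'Alberto', 'Bonsanto')"
record = parse_line(line)
print(record[1])
```

With the RDD from the example, this could be applied as read_rdd.map(parse_line) before calling collect().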
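Let's take a look using a Unix-based terminal. Note that folder/here.txt is a directory, not a single file: Spark writes the data into part files inside it, so cat * prints the contents of all of them.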
usr@host:~/folder/here.txt$ cat *
('abonsanto@fakemail.com', 'Alberto', 'Bonsanto')
('mbonsanto@fakemail.com', 'Miguel', 'Bonsanto')
('stranger@fakemail.com', 'Stranger', 'Weirdo')
('dbonsanto@fakemail.com', 'Dakota', 'Bonsanto')
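The cat * works because saveAsTextFile creates a directory at the given path and writes one part-XXXXX file per partition into it (plus a _SUCCESS marker), rather than producing a single file. Below is a rough sketch of reading those pieces back in plain Python. It assumes the output went to the local filesystem rather than HDFS, and read_part_files is a hypothetical helper name.

```python
import glob
import os


def read_part_files(output_dir):
    """Return all lines from the part-* files in output_dir, in file order.

    read_part_files is a hypothetical helper, not part of the Spark API.
    It assumes output_dir is on the local filesystem.
    """
    lines = []
    # sorted() keeps part-00000, part-00001, ... in partition order;
    # _SUCCESS and hidden .crc files do not match the pattern.
    for path in sorted(glob.glob(os.path.join(output_dir, "part-*"))):
        with open(path) as handle:
            lines.extend(handle.read().splitlines())
    return lines
```

For the example above, read_part_files("folder/here.txt") should return the same four lines that cat * prints.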