How do I name the file when using saveAsTextFile in Spark?

别跟我提以往 2021-02-20 17:45

When saving as a text file in Spark version 1.5.1 I use: rdd.saveAsTextFile('').

But if I want to find the file in that directory, how d

3 answers

时光说笑 (original poster)

2021-02-20 18:47

As I said in my comment above, the documentation with examples can be found here. And quoting the description of the method saveAsTextFile:

Save this RDD as a text file, using string representations of elements.

In the following example I save a simple RDD into a file, then I load it and print its content.

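# Assumes the pyspark shell, where the SparkContext `sc` already exists.
samples = sc.parallelize([
    ("abonsanto@fakemail.com", "Alberto", "Bonsanto"),
    ("mbonsanto@fakemail.com", "Miguel", "Bonsanto"),
    ("stranger@fakemail.com", "Stranger", "Weirdo"),
    ("dbonsanto@fakemail.com", "Dakota", "Bonsanto")
])

# Print each element on its own line (works in Python 2 and 3).
for sample in samples.collect():
    print(sample)

# The path is treated as a directory; Spark writes part-* files into it.
samples.saveAsTextFile("folder/here.txt")
read_rdd = sc.textFile("folder/here.txt")

# Each saved line comes back as a string, not as a tuple.
read_rdd.collect()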

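The output will be the following: first the printed tuples, then the list of strings returned by read_rdd.collect() (the u prefix marks Python 2 unicode strings):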

('abonsanto@fakemail.com', 'Alberto', 'Bonsanto')
('mbonsanto@fakemail.com', 'Miguel', 'Bonsanto')
('stranger@fakemail.com', 'Stranger', 'Weirdo')
('dbonsanto@fakemail.com', 'Dakota', 'Bonsanto')

[u"('abonsanto@fakemail.com', 'Alberto', 'Bonsanto')",
 u"('mbonsanto@fakemail.com', 'Miguel', 'Bonsanto')",
 u"('stranger@fakemail.com', 'Stranger', 'Weirdo')",
 u"('dbonsanto@fakemail.com', 'Dakota', 'Bonsanto')"]

Let's take a look using a Unix-based terminal.

usr@host:~/folder/here.txt$ cat *
('abonsanto@fakemail.com', 'Alberto', 'Bonsanto')
('mbonsanto@fakemail.com', 'Miguel', 'Bonsanto')
('stranger@fakemail.com', 'Stranger', 'Weirdo')
('dbonsanto@fakemail.com', 'Dakota', 'Bonsanto')
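
The `cat *` above works because saveAsTextFile treats its argument as a directory: Spark writes one part-NNNNN file per partition into it, plus a _SUCCESS marker, so the caller does not choose the individual file names. One common workaround is to merge or rename the part files after the job finishes. The following is only a minimal sketch of that idea, assuming the output sits on a local filesystem. The helper `merge_part_files` and the target name `samples.txt` are hypothetical and not part of the Spark API, and the output directory is simulated so the snippet runs without Spark.

```python
import glob
import os
import shutil
import tempfile


def merge_part_files(output_dir, target_path):
    """Concatenate the part-* files in output_dir into target_path.

    Hypothetical helper for illustration; not part of the Spark API.
    """
    # Zero-padded names sort correctly: part-00000, part-00001, ...
    part_files = sorted(glob.glob(os.path.join(output_dir, "part-*")))
    if not part_files:
        raise ValueError("no part-* files found in %s" % output_dir)
    with open(target_path, "wb") as target:
        for part in part_files:
            with open(part, "rb") as source:
                shutil.copyfileobj(source, target)
    return part_files


# Simulate what saveAsTextFile("folder/here.txt") leaves on a local disk
# (assumed layout: one file per partition plus an empty _SUCCESS marker).
work_dir = tempfile.mkdtemp()
output_dir = os.path.join(work_dir, "here.txt")
os.mkdir(output_dir)
for name, text in [("part-00000", "first\n"),
                   ("part-00001", "second\n"),
                   ("_SUCCESS", "")]:
    with open(os.path.join(output_dir, name), "w") as handle:
        handle.write(text)

# Merge the parts into a single file with a name of our choosing.
target_path = os.path.join(work_dir, "samples.txt")
merged_parts = merge_part_files(output_dir, target_path)
with open(target_path) as handle:
    merged_text = handle.read()
print(merged_text)
```

With a real job, output_dir would be the path passed to saveAsTextFile. For output on HDFS or another remote store, the same idea would need that store's own tooling rather than the local file calls used here.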
