How to name the file when using saveAsTextFile in Spark?

别跟我提以往 2021-02-20 17:45

When saving as a text file in Spark version 1.5.1 I use: rdd.saveAsTextFile('').

But if I want to find the file in that directory, how d

3 Answers
  •  时光说笑
    2021-02-20 18:47

    As I said in my comment above, the documentation with examples can be found here. Quoting the description of the saveAsTextFile method:

    Save this RDD as a text file, using string representations of elements.

    In the following example I save a simple RDD into a file, then I load it and print its content.

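    # `sc` is the SparkContext that the pyspark shell creates for you.
    samples = sc.parallelize([
        ("abonsanto@fakemail.com", "Alberto", "Bonsanto"),
        ("mbonsanto@fakemail.com", "Miguel", "Bonsanto"),
        ("stranger@fakemail.com", "Stranger", "Weirdo"),
        ("dbonsanto@fakemail.com", "Dakota", "Bonsanto")
    ])

    print(samples.collect())

    # The path names an output directory, not a single file.
    samples.saveAsTextFile("folder/here.txt")
    read_rdd = sc.textFile("folder/here.txt")

    # Each element comes back as the string representation of the original tuple.
    read_rdd.collect()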
    

    The output will be

    ('abonsanto@fakemail.com', 'Alberto', 'Bonsanto')
    ('mbonsanto@fakemail.com', 'Miguel', 'Bonsanto')
    ('stranger@fakemail.com', 'Stranger', 'Weirdo')
    ('dbonsanto@fakemail.com', 'Dakota', 'Bonsanto')
    
    [u"('abonsanto@fakemail.com', 'Alberto', 'Bonsanto')",
     u"('mbonsanto@fakemail.com', 'Miguel', 'Bonsanto')",
     u"('stranger@fakemail.com', 'Stranger', 'Weirdo')",
     u"('dbonsanto@fakemail.com', 'Dakota', 'Bonsanto')"]
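
    The second block of output shows that the reloaded RDD holds plain strings, not tuples, because saveAsTextFile stores the string representation of each element. A minimal sketch of turning one saved line back into a tuple, assuming every element was a Python literal (parse_saved_line is a hypothetical helper name, not part of Spark):

```python
import ast


def parse_saved_line(line):
    """Turn one saved line back into the Python value it was written from.

    Hypothetical helper. ast.literal_eval only accepts Python literals
    (tuples, strings, numbers, ...), so this assumes the RDD held such values.
    """
    return ast.literal_eval(line)


line = "('abonsanto@fakemail.com', 'Alberto', 'Bonsanto')"
record = parse_saved_line(line)
print(record[1])  # Alberto
```

    In Spark this would presumably be applied as read_rdd.map(parse_saved_line). For anything beyond quick experiments, a structured format such as JSON or CSV is likely a safer choice than parsing repr strings.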
    

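    Let's take a look using a Unix-based terminal. Note that folder/here.txt is a directory, not a single file: Spark writes one part-NNNNN file per partition into it (part-00000, part-00001, and so on), which is why the prompt below sits inside here.txt and cat * prints the contents of every part.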

    usr@host:~/folder/here.txt$ cat *
    ('abonsanto@fakemail.com', 'Alberto', 'Bonsanto')
    ('mbonsanto@fakemail.com', 'Miguel', 'Bonsanto')
    ('stranger@fakemail.com', 'Stranger', 'Weirdo')
    ('dbonsanto@fakemail.com', 'Dakota', 'Bonsanto')
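
    saveAsTextFile takes only the output path (plus an optional compression codec), so the part-NNNNN names cannot be chosen directly. One common workaround is to merge the parts into a file with the name you want after the job finishes. The following is a rough sketch under some assumptions: the output directory is on the local filesystem (not HDFS), the parts are uncompressed text, and merge_part_files is a hypothetical helper name, not a Spark API.

```python
import glob
import os
import shutil


def merge_part_files(output_dir, target_path):
    """Concatenate Spark's part-* files from output_dir into target_path.

    Hypothetical helper, not part of the Spark API. Assumes output_dir is on
    the local filesystem and the parts are uncompressed text.
    """
    # Part names are zero-padded (part-00000, part-00001, ...), so a plain
    # lexicographic sort keeps them in partition order. The pattern skips
    # the _SUCCESS marker and hidden checksum files.
    part_paths = sorted(glob.glob(os.path.join(output_dir, "part-*")))
    if not part_paths:
        raise FileNotFoundError("no part-* files found in " + output_dir)

    with open(target_path, "wb") as target:
        for part_path in part_paths:
            with open(part_path, "rb") as part:
                shutil.copyfileobj(part, target)
    return target_path
```

    For the example above, merge_part_files("folder/here.txt", "folder/samples.txt") would produce a single file named samples.txt. Calling samples.coalesce(1) before saveAsTextFile is another option: it yields a single part-00000 file that can simply be renamed, at the cost of funnelling all data through one partition.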
    
