How do `map` and `reduce` methods work in Spark RDDs?

面向向阳花 2021-01-31 05:01

The following code is from the quick start guide of Apache Spark. Can somebody explain to me what the "line" variable is and where it comes from?

textFile.map(line => line.split(" ").size).reduce((a, b) => if (a > b) a else b)
3 Answers
  •  借酒劲吻你
    2021-01-31 05:50

    What the map function does is take a list of values and map each one through a function. It is similar to the map function in Python, if you are familiar with that.

    Also, a file is like a list of Strings. (Not exactly, but that's how it gets iterated.)

    Let's say this is your file:

    val list_a: List[String] = List("first line", "second line", "last line")
    

    Now let's see how the map function works.

    We need two things: the list of values, which we already have, and the function through which we want to map these values. Let's consider a really simple function for the sake of understanding:

    val myprint = (arg:String)=>println(arg)
    

    This function simply takes a single String argument and prints it to the console.

    myprint("hello world")
    hello world
    

    If we map this function over your list, it will print all the lines:

    list_a.map(myprint)
    

    We can also write an anonymous function, as shown below, which does the same thing:

    list_a.map(arg=>println(arg))
    

    In your case, line refers to each line of the file in turn, one at a time. You can name the argument whatever you like; for example, in the example above, if I change arg to line it works without any issue:

    list_a.map(line=>println(line))
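
    The question's title also asks about reduce, which the Spark quick start chains after map. As a sketch using the same plain Scala List from above (not an actual RDD, but the map/reduce pattern reads the same): map transforms each element into a value, and reduce repeatedly combines pairs of those values until a single result remains.

    ```scala
    // Sketch on a plain Scala List standing in for the RDD, mirroring the
    // map/reduce pattern from the Spark quick start.
    val list_a: List[String] = List("first line", "second line", "last line")

    // map: transform each line into its word count
    val counts: List[Int] = list_a.map(line => line.split(" ").size)
    // counts == List(2, 2, 2)

    // reduce: combine the values pairwise until one remains (here, the maximum)
    val longest: Int = counts.reduce((a, b) => if (a > b) a else b)
    // longest == 2
    ```

    On a real RDD the calls look identical; the difference is that Spark evaluates them lazily and in parallel across partitions, which is also why the function passed to reduce should be associative.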
    
