How to solve SPARK-5063 in nested map functions

自闭症患者 2021-01-12 12:10

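RDD transformations and actions can only be invoked by the driver, not inside of other transformations; for example, rdd1.map(x => rdd2.values.count() * x) is invalid because the values transformation and count action cannot be performed inside of the rdd1.map transformation. For more information, see SPARK-5063.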

3 answers
  •  小鲜肉 (original poster)
    2021-01-12 13:00

    Just as nested operations on RDDs are not supported, nested RDD types are not possible in Spark. RDDs are only defined on the driver, where, in combination with their SparkContext, they can schedule operations on the data they represent.
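
    For the example in the error message, the usual way out is to run the inner action on the driver first and let the closure capture only the plain result. The following is a minimal sketch, not code from the question: the class name, the sample data, and the local[1] master are made up for illustration.

```java
import java.util.Arrays;
import java.util.List;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import scala.Tuple2;

public class HoistNestedAction {
    // Multiplies every element of rdd1 by the number of values in rdd2.
    // Both RDDs hold hypothetical sample data.
    static List<Long> scaleByCount(JavaSparkContext sc) {
        JavaRDD<Long> rdd1 = sc.parallelize(Arrays.asList(1L, 2L, 3L));
        List<Tuple2<String, Integer>> pairs = Arrays.asList(
                new Tuple2<>("a", 10), new Tuple2<>("b", 20));
        JavaPairRDD<String, Integer> rdd2 = sc.parallelizePairs(pairs);

        // Run the action on the driver, outside of any transformation.
        final long count = rdd2.values().count();

        // The closure now captures a plain long instead of an RDD.
        return rdd1.map(x -> x * count).collect();
    }

    public static void main(String[] args) {
        // Local master chosen only so the sketch runs without a cluster.
        try (JavaSparkContext sc = new JavaSparkContext("local[1]", "hoist-nested-action")) {
            System.out.println(scaleByCount(sc)); // prints [2, 4, 6]
        }
    }
}
```

    If the driver-side value is large (a lookup table rather than a single number), broadcasting it with sc.broadcast(...) is the common alternative to capturing it directly.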

    So, the root cause we need to address in this case is the datatype:

    JavaPairRDD<String, JavaRDD<String>> filesWithWords
    

    This type has no valid use in Spark. Depending on the use case, which the question does not explain further, it should become one of the following:

    A collection of RDDs, with the text file they refer to:

    Map<String, JavaRDD<String>>
    

    Or a collection of (textFile, word) pairs, keyed by text file:

    JavaPairRDD<String, String>
    

    Or a collection of words with their corresponding TextFile:

    JavaPairRDD<String, List<String>>
    

    Once the type is corrected, the problems with nested RDD operations resolve themselves.
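
    As a rough illustration of the two flat pair types, the sketch below builds both from in-memory (fileName, content) pairs. The file names, contents, and class name are hypothetical, and it assumes Spark 2.0 or later, where flatMapToPair expects an Iterator. In a real job the input would more likely come from something like sc.wholeTextFiles(...).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;
import scala.Tuple2;

public class FlatFileWords {
    // Hypothetical stand-in for (fileName, fileContent) pairs.
    static JavaPairRDD<String, String> sampleFiles(JavaSparkContext sc) {
        List<Tuple2<String, String>> files = Arrays.asList(
                new Tuple2<>("a.txt", "spark rdd"),
                new Tuple2<>("b.txt", "nested rdd types"));
        return sc.parallelizePairs(files);
    }

    // One record per file: (fileName, all words in that file).
    static JavaPairRDD<String, List<String>> wordsByFile(JavaPairRDD<String, String> files) {
        return files.mapValues(text -> Arrays.asList(text.split("\\s+")));
    }

    // One record per word: (fileName, word). No RDD is nested in another.
    static JavaPairRDD<String, String> fileWordPairs(JavaPairRDD<String, String> files) {
        return files.flatMapToPair(fileAndText -> {
            List<Tuple2<String, String>> pairs = new ArrayList<>();
            for (String word : fileAndText._2().split("\\s+")) {
                pairs.add(new Tuple2<>(fileAndText._1(), word));
            }
            return pairs.iterator();
        });
    }

    public static void main(String[] args) {
        try (JavaSparkContext sc = new JavaSparkContext("local[1]", "flat-file-words")) {
            JavaPairRDD<String, String> files = sampleFiles(sc);
            // countByKey is an action, so the result map lives on the driver.
            Map<String, Long> wordCounts = fileWordPairs(files).countByKey();
            System.out.println(wordCounts);
            System.out.println(wordsByFile(files).collectAsMap());
        }
    }
}
```

    Either shape keeps every transformation at the top level, so per-file work such as counting becomes an ordinary keyed operation instead of an RDD call inside another RDD's closure.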
