Access dependencies available in Scala but not in PySpark

再見小時候 2021-01-28 07:10

I am trying to access the dependencies of an RDD. In Scala it is pretty simple:

scala> val myRdd = sc.parallelize(0 to 9).groupBy(_ % 2)
myRdd: org.apache.spark.rdd.RDD[(Int, Iterable[Int])] = ShuffledRDD[...] at groupBy at <console>:...

scala> myRdd.dependencies

Is there a way to do the same in PySpark?
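The attribute does not appear to exist on PySpark's RDD wrapper at all; a quick check of what happens (a minimal sketch, assuming a live pyspark shell where `sc` is defined):

rdd = sc.parallelize(range(10)).groupBy(lambda x: x % 2)

# The Python-side RDD wrapper has no `dependencies` attribute, so this raises:
try:
    rdd.dependencies
except AttributeError as err:
    print(err)  # e.g. "... object has no attribute 'dependencies'"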
1 Answer
  • 2021-01-28 07:31

    There is no supported way to do it, because it is not that meaningful. You can reach into the underlying JVM objects through the py4j gateway:

    rdd = sc.parallelize([1, 2, 3]).map(lambda x: x)

    # Unwrap the Python RDD to its JVM counterpart, convert the JavaRDD back to
    # a Scala RDD, and ask that for its dependencies (a Scala Seq[Dependency[_]])
    deps = sc._jvm.org.apache.spark.api.java.JavaRDD.toRDD(rdd._jrdd).dependencies()
    print(deps)
    ## List(org.apache.spark.OneToOneDependency@63b86b0d)

    # deps is a Scala Seq, so iterate with its size() / apply(i) methods
    for i in range(deps.size()):
        print(deps.apply(i))
    ## org.apache.spark.OneToOneDependency@63b86b0d
    

    but I don't think it will get you far.
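
    If you do want to poke a bit further, the same py4j objects can be walked recursively, since each Dependency exposes its parent RDD. A minimal sketch of that (still unsupported, and assuming the same pyspark shell with `sc` available):

    # Walk the JVM-side lineage: each Dependency.rdd() returns the parent Scala RDD,
    # which can in turn be asked for its own dependencies.
    def print_deps(jvm_rdd, indent=0):
        deps = jvm_rdd.dependencies()          # Scala Seq[Dependency[_]]
        for i in range(deps.size()):
            dep = deps.apply(i)
            print(" " * indent + dep.toString())
            print_deps(dep.rdd(), indent + 2)

    rdd = sc.parallelize([1, 2, 3]).map(lambda x: x)
    print_deps(sc._jvm.org.apache.spark.api.java.JavaRDD.toRDD(rdd._jrdd))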
