I am trying to access the dependencies of an RDD. In Scala it is pretty simple:
scala> val myRdd = sc.parallelize(0 to 9).groupBy(_ % 2)
myRdd: org.apa
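There is no supported way to do this in PySpark, because the result is not that meaningful: the JVM-side lineage of a Python RDD does not map directly onto the Python transformations you wrote. You can, however, reach the underlying JVM RDD through Py4J: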
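# rdd._jrdd is the (private) JavaRDD backing the Python RDD;
# JavaRDD.toRDD unwraps it into a Scala RDD whose dependencies can be read
rdd = sc.parallelize([1, 2, 3]).map(lambda x: x)
deps = sc._jvm.org.apache.spark.api.java.JavaRDD.toRDD(rdd._jrdd).dependencies()
print(deps)
## List(org.apache.spark.OneToOneDependency@63b86b0d)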
# deps is a Py4J proxy for a Scala List, so index it with size() / apply()
for i in range(deps.size()):
    print(deps.apply(i))
## org.apache.spark.OneToOneDependency@63b86b0d
But I don't think it will get you far.
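If you still want to poke around, the index loop can be wrapped in a small helper and extended to walk the whole JVM lineage. This is a minimal sketch, not an official API: `scala_seq_to_list` and `lineage` are hypothetical helpers, and they assume only the Py4J-exposed Scala methods already used above (`size()`, `apply(i)`, `dependencies()`) plus `Dependency.rdd()`, which returns the parent RDD of a dependency.

```python
# Hypothetical helpers (not part of PySpark). They only call JVM methods
# through Py4J: Seq.size(), Seq.apply(i), RDD.dependencies(), Dependency.rdd().


def scala_seq_to_list(seq):
    """Copy a Py4J proxy of a Scala Seq into a plain Python list."""
    return [seq.apply(i) for i in range(seq.size())]


def lineage(jvm_rdd, depth=0):
    """Yield (depth, rdd) pairs, walking the JVM lineage depth-first.

    A parent shared by several children is yielded once per path;
    no de-duplication is attempted.
    """
    yield depth, jvm_rdd
    for dep in scala_seq_to_list(jvm_rdd.dependencies()):
        # Dependency.rdd() gives the parent RDD this dependency points to
        yield from lineage(dep.rdd(), depth + 1)
```

With a live SparkContext you would start from `rdd._jrdd.rdd()` and print `"  " * depth + str(r)` for each pair; the objects you get back are JVM RDDs, so they describe Spark's internal plan rather than your individual Python `map` calls.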