How to access Spark RDD Array of elements based on index


I have an RDD with an Array of elements like below, and each element can be treated as a tuple. Now the question is: I want to access only the 4th element from the first two tuples... and loop thr…


春和景丽

2020-12-07 04:46

The short answer is: don't try. RDDs are not indexed, and depending on the context, the order of values can be nondeterministic.

As far as I understand, what you want is simply a map followed by a sliding window:
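```
// Brings the (developer API) sliding() method into scope for RDDs
import org.apache.spark.mllib.rdd.RDDFunctions._

// A dummy function: returns the smaller of the two values in a window
def doSomething(xs: Array[Int]) = xs match {
  case Array(x1, x2) => if (x1 <= x2) x1 else x2
}

// sc is the SparkContext (predefined in spark-shell)
val rdd = sc.parallelize(Array(
    (1, "Tom", "AAA", 2000),
    (2, "Tim", "AAA", 3000),
    (3, "Mark", "BBB", 6000),
    (4, "Jim", "BBB", 6000),
    (5, "James", "CCC", 4000)))

// Keep only the 4th field of each tuple, group adjacent values
// into windows of two, and apply doSomething to each window
val result = rdd.map(_._4).sliding(2).map(doSomething)

// Transformations are lazy; collect() triggers the job
result.collect()
```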
The above, of course, assumes that the order of values is well defined; in other words, that the ancestor lineage doesn't include shuffled RDDs.
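To illustrate why the order matters, here is a minimal local sketch of the same map-plus-sliding-window idea using plain Scala collections, so it runs without Spark. It deliberately starts from scrambled rows and sorts by the id column first, assuming (hypothetically) that the id is what defines the intended order; on an RDD the analogous step would be calling `rdd.sortBy(_._1)` before `map(_._4).sliding(2)`. The object name `SlidingSketch` is made up for the example. One behavioral difference to keep in mind: on a collection shorter than the window, Scala's `sliding` returns a single partial window, while the RDD version returns an empty RDD.

```scala
object SlidingSketch {
  // Same dummy function as the Spark snippet: the smaller of two adjacent values.
  def doSomething(xs: Seq[Int]): Int = xs match {
    case Seq(x1, x2) => if (x1 <= x2) x1 else x2
    case _           => sys.error(s"expected a window of size 2, got $xs")
  }

  // The sample rows from the snippet above, deliberately out of order.
  val scrambled: Seq[(Int, String, String, Int)] = Seq(
    (3, "Mark", "BBB", 6000),
    (1, "Tom", "AAA", 2000),
    (5, "James", "CCC", 4000),
    (2, "Tim", "AAA", 3000),
    (4, "Jim", "BBB", 6000))

  // Sort by id to fix the order, keep the 4th field, then slide a window of 2.
  val result: List[Int] =
    scrambled.sortBy(_._1).map(_._4).sliding(2).map(doSomething).toList

  def main(args: Array[String]): Unit = println(result)
}
```

Without the `sortBy`, the windows would pair up whichever rows happen to be adjacent, which is exactly the nondeterminism that shuffles can introduce on an RDD.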