What are the differences between slices and partitions of RDDs?

星月不相逢 2021-02-08 00:21

I am using Spark's Python API and running Spark 0.8.

I am storing a large RDD of floating point vectors and I need to perform calculations of one vector against the ent

2 answers

深忆病人 (original poster)

2021-02-08 01:01

You can partition the RDD yourself with a custom Partitioner (Scala API), as follows:

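import org.apache.spark.Partitioner

// Custom partitioner that routes each record by its Int key.
val p = new Partitioner {
  override def numPartitions: Int = 2
  // The result must lie in [0, numPartitions), so reduce the key modulo numPartitions.
  override def getPartition(key: Any): Int = {
    val k = key.asInstanceOf[Int] % numPartitions
    if (k < 0) k + numPartitions else k
  }
}
// recordRDD must be an RDD of (key, value) pairs; partitionBy returns a new RDD.
val partitioned = recordRDD.partitionBy(p)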
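
Since the question is about the Python API, here is a minimal plain-Python sketch of what a key-based partitioner does. It does not use Spark at all: `get_partition` and `partition_by` are hypothetical helper names made up for illustration, and the sample records are invented. In PySpark, `RDD.partitionBy` plays the corresponding role on an RDD of (key, value) pairs; check the documentation for your Spark version for its exact parameters.

```python
def get_partition(key, num_partitions):
    """Map an integer key to a partition index in [0, num_partitions)."""
    # Python's % returns a non-negative result for a positive divisor,
    # so negative keys also land in a valid partition.
    return key % num_partitions


def partition_by(records, num_partitions):
    """Group (key, value) pairs into num_partitions lists by key."""
    partitions = [[] for _ in range(num_partitions)]
    for key, value in records:
        partitions[get_partition(key, num_partitions)].append((key, value))
    return partitions


# Illustrative data only: five keyed records split into two partitions.
records = [(0, "a"), (1, "b"), (2, "c"), (3, "d"), (-1, "e")]
parts = partition_by(records, 2)
for index, part in enumerate(parts):
    print(index, part)
```

Each inner list stands in for one partition: records with the same key always end up together, which is the property a custom partitioner is meant to guarantee.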
