What are the differences between slices and partitions of RDDs?

星月不相逢 2021-02-08 00:21

I am using Spark's Python API and running Spark 0.8.

I am storing a large RDD of floating point vectors and I need to perform calculations of one vector against the ent

2 Answers
  •  深忆病人
    2021-02-08 01:01

    You can repartition a key-value RDD with a custom Partitioner as follows:

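    import org.apache.spark.Partitioner

    // Custom partitioner; assumes recordRDD is a pair RDD with Int keys.
    val p = new Partitioner {
      override def numPartitions: Int = 2
      // The returned index must lie in [0, numPartitions), even for negative keys.
      override def getPartition(key: Any): Int =
        ((key.asInstanceOf[Int] % numPartitions) + numPartitions) % numPartitions
    }
    // partitionBy returns a new RDD, so keep the result.
    val partitioned = recordRDD.partitionBy(p)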
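
    Since the question uses the Python API, here is a minimal pure-Python sketch of what a partitioner does: it maps each key to a partition index and groups records by that index. It does not use Spark at all; `partition_by`, `records`, and the identity key function are hypothetical names chosen for illustration, and integer keys are assumed.

```python
from collections import defaultdict


def partition_by(records, num_partitions, partition_func=lambda key: key):
    """Group (key, value) pairs into num_partitions buckets.

    A record goes to bucket partition_func(key) % num_partitions.
    Python's % never returns a negative result for a positive divisor,
    so the index always lies in [0, num_partitions).
    """
    buckets = defaultdict(list)
    for key, value in records:
        index = partition_func(key) % num_partitions
        buckets[index].append((key, value))
    # Return one list per partition, including empty partitions.
    return [buckets[i] for i in range(num_partitions)]


# Illustrative data only: integer keys, including a negative one.
records = [(0, "a"), (1, "b"), (2, "c"), (3, "d"), (-1, "e")]
partitions = partition_by(records, num_partitions=2)
for index, bucket in enumerate(partitions):
    print(index, bucket)
```

    PySpark's `partitionBy` on a key-value RDD likewise takes a partition count and an optional key function, but whether it behaves exactly like this sketch on Spark 0.8 is not verified here.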
    
