Spark HashPartitioner Unexpected Partitioning

遇见更好的自我 2021-01-22 02:15

I am using HashPartitioner but getting an unexpected result. I use 3 different Strings as keys and pass 3 as the number of partitions, so I expect 3 partitions, one per key.

1 Answer
  • 2021-01-22 02:33

    There is nothing strange going on here. Utils.nonNegativeMod, which HashPartitioner uses to map a key's hashCode to a partition index, is implemented as follows:

    def nonNegativeMod(x: Int, mod: Int): Int = {
      val rawMod = x % mod
      rawMod + (if (rawMod < 0) mod else 0)
    }
    

    With 3 partitions, the keys are assigned as follows:

    for { car <- Seq("Honda", "Toyota", "Kia") } 
      yield (car -> nonNegativeMod(car.hashCode, 3))
    
    Seq[(String, Int)] = List((Honda,1), (Toyota,0), (Kia,1))
    

    which is exactly what you get in your case: Honda and Kia share partition 1, Toyota goes to partition 0, and partition 2 stays empty. In other words, distinct hash codes don't guarantee distinct remainders modulo an arbitrary number.
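
    To make the point concrete, here is a minimal plain-Scala sketch (no Spark required) that checks the three hash codes are all different and yet collide modulo 3. The helper smallestCollisionFree is hypothetical and not part of Spark; it simply searches for a partition count at which these particular keys happen to separate.

    ```scala
    // Same logic as the Utils.nonNegativeMod shown above.
    def nonNegativeMod(x: Int, mod: Int): Int = {
      val rawMod = x % mod
      rawMod + (if (rawMod < 0) mod else 0)
    }

    val cars = Seq("Honda", "Toyota", "Kia")

    // The three hash codes are pairwise different ...
    val hashes = cars.map(_.hashCode)

    // ... yet two of them leave the same remainder modulo 3.
    val partitionsFor3 = cars.map(c => nonNegativeMod(c.hashCode, 3))

    // Hypothetical helper (not a Spark API): the smallest partition count
    // in 1..maxN for which every key gets its own partition index, if any.
    def smallestCollisionFree(keys: Seq[String], maxN: Int): Option[Int] =
      (1 to maxN).find { n =>
        keys.map(k => nonNegativeMod(k.hashCode, n)).distinct.size == keys.size
      }

    println(hashes.distinct.size == cars.size)  // true: no direct hash collision
    println(partitionsFor3)                     // List(1, 0, 1)
    println(smallestCollisionFree(cars, 10))    // Some(4)
    ```

    For these three keys, 4 partitions happens to be enough, but that is a property of this particular key set and not a guarantee. If every key must get its own partition, an explicit key-to-partition mapping in a custom Partitioner is likely more robust than tuning the partition count.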
