Defined before this block of code:
dataset
can be a Vector
or List
numberOfSlices
is an Int
The typical "optimal" partition calculates an exact fractional length after cutting and then rounds to find the actual number to take:
def cut[A](xs: Seq[A], n: Int):Vector[Seq[A]] = {
val m = xs.length
val targets = (0 to n).map{x => math.round((x.toDouble*m)/n).toInt}
def snip(xs: Seq[A], ns: Seq[Int], got: Vector[Seq[A]]): Vector[Seq[A]] = {
if (ns.length<2) got
else {
val (i,j) = (ns.head, ns.tail.head)
snip(xs.drop(j-i), ns.tail, got :+ xs.take(j-i))
}
}
snip(xs, targets, Vector.empty)
}
This way your longer and shorter blocks will be interspersed, which is often more desirable for evenness:
scala> cut(List(1,2,3,4,5,6,7,8,9,10),4)
res5: Vector[Seq[Int]] =
Vector(List(1, 2, 3), List(4, 5), List(6, 7, 8), List(9, 10))
You can even cut more times than you have elements:
scala> cut(List(1,2,3),5)
res6: Vector[Seq[Int]] =
Vector(List(1), List(), List(2), List(), List(3))