While building a generic evaluation tool, I came upon the following problem, where the cross_val_score.mean() gives slightly different results than cross_val_predict.
For