Shuffle using IComparer

Juliet

I was somewhat surprised in this thread how many wrong answers were posted. Just for the sake of others who come up with a solution similar to the one posted by the OP, the following code looks correct:

int[] nums = new int[1000];
for (int i = 0; i < nums.Length; i++)
{
    nums[i] = i;
}

Random r = new Random();
Array.Sort<int>(nums, (x, y) => r.Next(-1, 2));

foreach(var num in nums)
{
    Console.Write("{0} ", num);
}

However, the code will throw an exception occasionally, but not always. That's what makes it fun to debug :) If you run it enough times, or execute the sort procedure in a loop 50 or so times, you'll get an error stating:

IComparer (or the IComparable methods it relies upon) did not return zero when Array.Sort called x.CompareTo(x). x: '0' x's type: 'Int32' The IComparer: ''.

In other words, the quicksort compared some number x to itself and got a non-zero result. The obvious fix would be to write:

Array.Sort<int>(nums, (x, y) =>
    {
        if (x == y) return 0;
        else return r.NextDouble() < 0.5 ? 1 : -1;
    });

But even this doesn't work, because there are occasions where .NET compares three numbers against one another and gets inconsistent results, such as A > B, B > C, and C > A (oops!). Whether you use a Guid, GetHashCode, or any other randomly generated input, a solution like the one shown above is still wrong.


With that being said, Fisher-Yates is the standard way of shuffling arrays, so there's no real reason to use an IComparer in the first place. Fisher-Yates is O(n), whereas any implementation using an IComparer runs a quicksort behind the scenes, which has a time complexity of O(n log n). There's just no good reason not to use the well-known, efficient, standard algorithm to solve this kind of problem.

However, if you really insist on using an IComparer and a random source, then apply your random data before you sort. This requires projecting the data onto another object so that you don't lose your random values:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

namespace ConsoleApplication1
{
    class Pair<T, U>
    {
        public T Item1 { get; private set; }
        public U Item2 { get; private set; }
        public Pair(T item1, U item2)
        {
            this.Item1 = item1;
            this.Item2 = item2;
        }
    }

    class Program
    {
        static void Main(string[] args)
        {
            Pair<int, double>[] nums = new Pair<int, double>[1000];
            Random r = new Random();
            for (int i = 0; i < nums.Length; i++)
            {
                nums[i] = new Pair<int, double>(i, r.NextDouble());
            }

            Array.Sort<Pair<int, double>>(nums, (x, y) => x.Item2.CompareTo(y.Item2));

            foreach (var item in nums)
            {
                Console.Write("{0} ", item.Item1);
            }

            Console.ReadKey(true);
        }
    }
}

Or get LINQy with your bad self:

Random r = new Random();
var nums = from x in Enumerable.Range(0, 1000)
           orderby r.NextDouble()
           select x;
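
This works because LINQ's OrderBy (in LINQ to Objects) evaluates the key selector once per element and caches the keys before sorting, so r.NextDouble() is never called during the comparisons themselves; the sort always sees one consistent set of keys, unlike the comparer hacks above.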

One suggestion I got elsewhere was to create a separate IArranger interface that describes a single operation to Arrange a collection. This can work where IComparer/IComparable cannot because it operates on an entire collection, instead of individual items. It might look something like this:

public interface IArranger<T>
{
    IEnumerable<T> Arrange(IEnumerable<T> items);
}

Then I could implement a Shuffle from the IArranger interface using a proper Fisher-Yates algorithm, and also have implementations that wrap whichever of the usual sorting varieties (IComparer, IComparable, the type's default ordering) I care about. That might look something like this:

public class ComparerArranger<T> : IArranger<T>
{
    private IComparer<T> comparer;

    public ComparerArranger(IComparer<T> comparer)
    {
        this.comparer = comparer;
    }

    public IEnumerable<T> Arrange(IEnumerable<T> items)
    {
       return items.OrderBy(i => i, comparer);
    }
}

or

//uses the default Comparer for the type (Comparer<T>.Default)
public class TypeArranger<T> : IArranger<T> 
{
    public IEnumerable<T> Arrange(IEnumerable<T> items)
    {
       return items.OrderBy(i => i);
    }
}

or

public class ShuffleArranger<T> : IArranger<T>
{
    //naive implementation for demonstration
    // if I ever develop this more completely I would try to
    // avoid needing to call .ToArray() in here
    // and use a better prng
    private Random r = new Random();

    public IEnumerable<T> Arrange(IEnumerable<T> items)
    {
        var values = items.ToArray();

        //valid Fisher-Yates shuffle on the values array
        for (int i = values.Length; i > 1; i--)
        {
            int j = r.Next(i);
            T tmp = values[j];
            values[j] = values[i - 1];
            values[i - 1] = tmp;
        }
        foreach (var item in values) yield return item;
    }
}

For a final step, I add support for this to any IEnumerable via an extension method. Then you still get the simple run-time algorithm swapping, you have a better implementation of the shuffle algorithm, and the code to use it feels natural:

//extension methods must live in a static class; Arrange also needs its own type parameter
public static class ArrangerExtensions
{
    public static IEnumerable<T> Arrange<T>(this IEnumerable<T> items, IArranger<T> arranger)
    {
        return arranger.Arrange(items);
    }
}
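
A hypothetical call site, using the ShuffleArranger above, then reads naturally:

var shuffled = Enumerable.Range(0, 1000).Arrange(new ShuffleArranger<int>());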

Because IComparer must return zero at some point (for equal instances of T), it is mathematically impossible to create a generic IComparer that statistically mimics a Fisher-Yates shuffle. There will always be some bias. For a real shuffle, you'd never want to force it to return any particular value.

How 'bout sorting based on a hidden field, which is pre-assigned a random value?

To follow up on James Curran's idea: let the IComparer maintain the "sorted" values as a list; when a new value appears, insert it into the list at a random position; compare by list index. Optimize by maintaining the list as a balanced tree or something similar. Every instance of such an IComparer maintains a consistent and random sort order, so you can choose between your random sorting being consistently the same random ordering or a different one each time. A minor modification will even allow equal elements to be "sorted" into different positions, if you prefer to read "random" that way.
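
Here is a minimal sketch of that idea (my own illustration, not from the original answer; each comparison is O(n) against a plain List, which is exactly what the balanced-tree optimization would address):

public class RandomOrderComparer<T> : IComparer<T>
{
    private readonly List<T> order = new List<T>();
    private readonly Random r = new Random();

    public int Compare(T x, T y)
    {
        // Insert both values first so neither lookup uses a stale index.
        EnsureInserted(x);
        EnsureInserted(y);
        // Compare by current position in the hidden list; the relative
        // order of two items never changes once both have been inserted.
        return order.IndexOf(x).CompareTo(order.IndexOf(y));
    }

    private void EnsureInserted(T item)
    {
        if (!order.Contains(item))
            order.Insert(r.Next(order.Count + 1), item);
    }
}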

An interesting endeavor. Most likely a misuse/abuse of IComparer.

You're attempting to do a random weighted sort by using a mechanism that wasn't built for that purpose.

Why not implement your own sorting routine and your own comparer? I have a feeling that even that would be insufficient.

Don't do it.

All of the algorithms proposed thus far introduce some sort of bias into the output (some bigger than others).

@Princess and @Luke propose storing a random number alongside the data. However, because any two of these random numbers could end up with the same value, the sort order between those two items will be deterministically biased.

The worst case would be a sorting routine that is "stable" (that is, objects that compare equal are always output in the same order they were input). Array.Sort doesn't happen to be stable (it uses QuickSort internally), but there is still a bias whenever two items have the same value, and it depends on where they sit in the input (specifically, where they are relative to QuickSort's pivot).

As the keyspace for this random number increases, the probability of a collision goes down (given a good source of randomness), but keep in mind that as the number of values you are sorting goes up, the birthday paradox dictates that the likelihood of at least one pair among them colliding rises very quickly.

For an integer key there are 2^32 unique values, and even assuming a perfectly even distribution of random values, with 75,000 rows there is roughly a 50% probability of at least one collision (see the birthday problem on Wikipedia).
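
You can check that figure with the standard birthday-problem approximation p ≈ 1 - e^(-n^2 / (2k)), where n is the row count and k the number of distinct keys:

double n = 75000;                  // rows being sorted
double k = Math.Pow(2, 32);        // distinct int key values
double p = 1 - Math.Exp(-n * n / (2 * k));
Console.WriteLine(p);              // prints ~0.48, i.e. roughly 50%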

The cryptographic hash approach that you proposed potentially has a large enough keyspace (160 bits) to make the chance of a collision negligible, but your algorithm decomposes all of that randomness back down to a single int before actually doing the compare, which negates the benefit of the larger keyspace.

Your best approach is to associate a distinct "sortOrder" value with each of your data items, shuffle those values using a proven algorithm, and then order the results by that value.

If you are using Array.Sort, there is an overload that takes an array of "keys" and an array of "values". The keys array is sorted normally, but whenever a value in the keys array is moved, the corresponding entry in the values array is also moved.

Something like:

Something[] data; // populated somewhere
int[] keys = new int[data.Length]; // or long if you might have lots of data
for (int i = 0; i < keys.Length; ++i)
{
    keys[i] = i;
}

Shuffle(keys);

Array.Sort(keys, data); // data is reordered in lockstep with keys
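
Shuffle isn't defined above; any proven shuffle will do. A minimal Fisher-Yates version, the same algorithm as the ShuffleArranger earlier in this thread:

static void Shuffle(int[] keys)
{
    var r = new Random();
    // Walk backwards, swapping each slot with a random slot at or before it.
    for (int i = keys.Length - 1; i > 0; i--)
    {
        int j = r.Next(i + 1); // 0 <= j <= i
        int tmp = keys[i];
        keys[i] = keys[j];
        keys[j] = tmp;
    }
}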