I read an article about various shuffle algorithms over at Coding Horror. I have seen that people sometimes do this to shuffle a list:
var r = new Random();
var shuffled = list.OrderBy(x => r.Next());
Seems like a good shuffling algorithm, if you're not too worried about performance. The only problem I'd point out is that its behavior is not controllable, so you may have a hard time testing it.
One option is to pass a seed as a parameter to the random number generator (or to pass the random generator itself as a parameter), so you have more control and can test it more easily.
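For instance, here is a minimal sketch of that idea (the Shuffled name and signature are just for illustration, not an established API): the caller supplies the Random, so a test can pass a seeded instance and get a reproducible order.

using System;
using System.Collections.Generic;
using System.Linq;

public static class ShuffleExtensions
{
    // The caller supplies the Random, so a test can pass new Random(42)
    // and assert against a deterministic, repeatable ordering.
    public static IEnumerable<T> Shuffled<T>(this IEnumerable<T> source, Random rng)
    {
        return source.OrderBy(x => rng.Next());
    }
}

// Usage: the same seed always produces the same "shuffle".
// var a = Enumerable.Range(1, 10).Shuffled(new Random(42)).ToList();
// var b = Enumerable.Range(1, 10).Shuffled(new Random(42)).ToList();
// a and b contain the same sequence, so the result can be asserted in a test.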
I would say that many answers here like "This algorithm shuffles by generating a new random value for each value in a list, then ordering the list by those random values" might be very wrong!
I'd think that this DOES NOT assign a random value to each element of the source collection. Instead, a sort algorithm such as Quicksort might be running, calling a compare function approximately n log n times. Some sort algorithms really do expect this compare function to be stable and always return the same result!
Couldn't it be that the IEnumerableSorter calls a compare function for each step of, e.g., Quicksort, and each time calls the function x => r.Next() for both parameters without caching the results? In that case you might really mess up the sort algorithm and make it much worse than the expectations the algorithm is built on. Of course, it will eventually terminate and return something.
I might check it later by putting debugging output inside a new "Next" function to see what happens. In Reflector I could not immediately find out how it works.
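One way to check this speculation yourself (a minimal sketch, not a definitive test; it relies on Random.Next() being virtual, which it is): count how many times the key selector actually runs. If OrderBy computes each key exactly once, the count equals the element count; if the keys were re-evaluated on every comparison, it would be closer to n log n.

using System;
using System.Linq;

class CountingRandom : Random
{
    public int Calls { get; private set; }

    public override int Next()
    {
        Calls++;           // record every call made by the key selector
        return base.Next();
    }
}

class Program
{
    static void Main()
    {
        var rng = new CountingRandom();
        var shuffled = Enumerable.Range(0, 1000).OrderBy(x => rng.Next()).ToList();
        // If keys are computed once and cached, this prints 1000; a much
        // larger number would mean the selector is re-invoked during comparisons.
        Console.WriteLine("Next() was called {0} times for 1000 elements.", rng.Calls);
    }
}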
This is based on Jon Skeet's answer.
In that answer, the array is shuffled, then returned using yield. The net result is that the array is kept in memory for the duration of the foreach, as are the objects necessary for iteration, and yet all the cost is paid at the beginning - the yield is basically an empty loop.
This algorithm is used a lot in games, where the first three items are picked and the others will only be needed later, if at all. My suggestion is to yield the numbers as soon as they are swapped. This reduces the start-up cost while keeping the iteration cost at O(1) (basically 5 operations per iteration). The total cost would remain the same, but the shuffling itself would be quicker. In cases where this is called as collection.Shuffle().ToArray() it will theoretically make no difference, but in the aforementioned use cases it will speed up start-up. Also, this would make the algorithm useful for cases where you only need a few unique items. For example, if you need to pull three cards from a deck of 52, you can call deck.Shuffle().Take(3) and only three swaps will take place (although the entire array has to be copied first).
public static IEnumerable<T> Shuffle<T>(this IEnumerable<T> source, Random rng)
{
    T[] elements = source.ToArray();

    // Note i > 0 to avoid a final pointless iteration
    for (int i = elements.Length - 1; i > 0; i--)
    {
        // Swap element "i" with a random earlier element (or itself)
        int swapIndex = rng.Next(i + 1);
        yield return elements[swapIndex];
        elements[swapIndex] = elements[i];
        // We don't actually complete the swap; we can forget about the
        // swapped element because we have already returned it.
    }

    // There is one element remaining that was not returned - return it now
    // (the length check guards against an empty source)
    if (elements.Length > 0)
        yield return elements[0];
}
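For illustration, the deck example from above would look like this (note that this sketch passes the Random explicitly, to match the signature of the method):

var rng = new Random();
var deck = Enumerable.Range(1, 52);

// Take(3) stops pulling items after three iterations of the loop,
// although the whole deck is still copied once by ToArray() inside Shuffle.
var threeCards = deck.Shuffle(rng).Take(3).ToList();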
Starting from this quote from Skeet:
It's not a way of shuffling that I like, mostly on the grounds that it's O(n log n) for no good reason when it's easy to implement an O(n) shuffle. The code in the question "works" by basically giving a random (hopefully unique!) number to each element, then ordering the elements according to that number.
I'll expand a little on the reason for the hopefully unique!
Now, from the Enumerable.OrderBy documentation:
This method performs a stable sort; that is, if the keys of two elements are equal, the order of the elements is preserved
This is very important! What happens if two elements "receive" the same random number? They remain in the same order they were in the array. Now, what is the probability of this happening? It is difficult to calculate exactly, but it is exactly the Birthday Problem. For example, with 3 elements and keys drawn from r.Next(m), the probability that all three keys are distinct is (1 - 1/m)(1 - 2/m), so the chance of at least one collision is roughly 3/m for large m.
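To see what that stability means in practice, a tiny sketch: when all the keys are equal, OrderBy leaves the input order completely untouched.

using System;
using System.Linq;

class StableSortDemo
{
    static void Main()
    {
        var items = new[] { "a", "b", "c", "d" };

        // Every element gets the same key, so the stable sort changes nothing.
        var sorted = items.OrderBy(x => 0);

        Console.WriteLine(string.Join(", ", sorted)); // prints: a, b, c, d
    }
}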
Now, is it real? Is it true?
As always, when in doubt, write a short program: http://pastebin.com/5CDnUxPG
This little block of code shuffles an array of 3 elements a given number of times using the Fisher-Yates algorithm done backward, the Fisher-Yates algorithm done forward (the wiki page gives two pseudo-code variants... they produce equivalent results, but one runs from the first to the last element, while the other runs from the last to the first), the naive wrong algorithm of http://blog.codinghorror.com/the-danger-of-naivete/, and .OrderBy(x => r.Next()) and .OrderBy(x => r.Next(someValue)).
Now, Random.Next returns
A 32-bit signed integer that is greater than or equal to 0 and less than MaxValue.
so it's equivalent to
OrderBy(x => r.Next(int.MaxValue))
To test whether this problem exists, we could enlarge the array (something very slow) or simply reduce the maximum value of the random number generator (int.MaxValue isn't a "special" number... it is simply a very big number). In the end, if the algorithm isn't biased by the stability of the OrderBy, then any range of values should give the same result.
The program then tests some values in the range 1...4096. Looking at the results, it's quite clear that for low values (< 128) the algorithm is very biased (4-8%). With 3 values you need at least r.Next(1024). If you make the array bigger (4 or 5 elements), then even r.Next(1024) isn't enough. I'm not an expert in shuffling or in math, but I think that for each extra bit of length of the array you need 2 extra bits of maximum value (because the birthday paradox is connected to sqrt(numvalues)), so that if the maximum value is 2^31, I'd say you should be able to sort arrays of up to 2^12-2^13 (4096-8192) elements.
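Along the same lines as the linked pastebin, here is a minimal sketch of just the OrderBy part of such a test (the trial count and the tested ranges are arbitrary choices): it shuffles the array {0, 1, 2} many times with deliberately small random ranges and measures how unevenly the six possible permutations come out.

using System;
using System.Collections.Generic;
using System.Linq;

class OrderByBiasTest
{
    static void Main()
    {
        var r = new Random();
        const int trials = 600000;

        foreach (int maxValue in new[] { 4, 16, 128, 1024, int.MaxValue })
        {
            var counts = new Dictionary<string, int>();
            for (int t = 0; t < trials; t++)
            {
                // "Shuffle" three elements by ordering on small random keys.
                string perm = string.Concat(new[] { 0, 1, 2 }.OrderBy(x => r.Next(maxValue)));
                counts.TryGetValue(perm, out int c);
                counts[perm] = c + 1;
            }

            // A fair shuffle gives each of the 6 permutations ~16.67% of the trials;
            // a large spread between the most and least common one reveals the bias.
            double spread = 100.0 * (counts.Values.Max() - counts.Values.Min()) / trials;
            Console.WriteLine("r.Next({0}): spread = {1:F2}%", maxValue, spread);
        }
    }
}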
It's probably OK for most purposes, and it almost always generates a truly random distribution (except when Random.Next() produces two identical random integers).
It works by assigning each element of the sequence a random integer, then ordering the sequence by these integers.
It's totally acceptable for 99.9% of applications (unless you absolutely need to handle the edge case above). Also, Skeet's objection to its runtime is valid, so if you're shuffling a long list you might not want to use it.
Startup time, measured with all threads and caches cleared before every new test. First, the simple code. It runs in LINQPad, if you want to follow along and test this code.
Stopwatch st = new Stopwatch();
st.Start();
var r = new Random();
List<string[]> list = new List<string[]>();
list.Add(new string[] { "1", "X" });
list.Add(new string[] { "2", "A" });
list.Add(new string[] { "3", "B" });
list.Add(new string[] { "4", "C" });
list.Add(new string[] { "5", "D" });
list.Add(new string[] { "6", "E" });
//list.OrderBy(l => r.Next()).Dump();
list.OrderBy(l => Guid.NewGuid()).Dump();
st.Stop();
Console.WriteLine(st.Elapsed.TotalMilliseconds);
list.OrderBy(x => r.Next()) takes 38.6528 ms
list.OrderBy(x => Guid.NewGuid()) takes 36.7634 ms (it's the approach recommended on MSDN)
From the second run onward, both of them take about the same time.
EDIT:
TEST CODE on an Intel Core i7 (4 cores @ 2.1 GHz), 8 GB DDR3 RAM @ 1600 MHz, SATA HDD at 5200 rpm, with [Data: www.dropbox.com/s/pbtmh5s9lw285kp/data]
using System;
using System.Runtime;
using System.Diagnostics;
using System.IO;
using System.Collections.Generic;
using System.Collections;
using System.Linq;
using System.Threading;

namespace Algorithm
{
    class Program
    {
        public static void Main(string[] args)
        {
            try
            {
                int i = 0;
                int limit = 10;
                var result = GetTestRandomSort(limit);
                foreach (var element in result)
                {
                    Console.WriteLine();
                    Console.WriteLine("time {0}: {1} ms", ++i, element);
                }
            }
            catch (Exception e)
            {
                Console.WriteLine(e.Message);
            }
            finally
            {
                Console.Write("Press any key to continue . . . ");
                Console.ReadKey(true);
            }
        }

        public static IEnumerable<double> GetTestRandomSort(int limit)
        {
            // Uncomment exactly one of the regions below per run to benchmark it.
            for (int i = 0; i < 5; i++)
            {
                string path = null, temp = null;
                Stopwatch st = null;
                StreamReader sr = null;
                int? count = null;
                List<string> list = null;
                Random r = null;

                GC.Collect();
                GC.WaitForPendingFinalizers();
                Thread.Sleep(5000);

                st = Stopwatch.StartNew();

                #region Import Input Data
                path = Environment.CurrentDirectory + "\\data";
                list = new List<string>();
                sr = new StreamReader(path);
                count = 0;
                while (count < limit && (temp = sr.ReadLine()) != null)
                {
                    // Console.WriteLine(temp);
                    list.Add(temp);
                    count++;
                }
                sr.Close();
                #endregion

                // Console.WriteLine("--------------Random--------------");

                // #region Sort by Random with OrderBy(random.Next())
                // r = new Random();
                // list = list.OrderBy(l => r.Next()).ToList();
                // #endregion

                // #region Sort by Random with OrderBy(Guid)
                // list = list.OrderBy(l => Guid.NewGuid()).ToList();
                // #endregion

                // #region Sort by Random with Parallel and OrderBy(random.Next())
                // r = new Random();
                // list = list.AsParallel().OrderBy(l => r.Next()).ToList();
                // #endregion

                // #region Sort by Random with Parallel OrderBy(Guid)
                // list = list.AsParallel().OrderBy(l => Guid.NewGuid()).ToList();
                // #endregion

                // #region Sort by Random with User-Defined Shuffle Method
                // r = new Random();
                // list = list.Shuffle(r).ToList();
                // #endregion

                // #region Sort by Random with Parallel User-Defined Shuffle Method
                // r = new Random();
                // list = list.AsParallel().Shuffle(r).ToList();
                // #endregion

                // Result
                st.Stop();
                yield return st.Elapsed.TotalMilliseconds;

                foreach (var element in list)
                {
                    Console.WriteLine(element);
                }
            }
        }
    }
}
Result Description: https://www.dropbox.com/s/9dw9wl259dfs04g/ResultDescription.PNG
Result Stat: https://www.dropbox.com/s/ewq5ybtsvesme4d/ResultStat.PNG
Conclusion:
Assumption: LINQ OrderBy(r.Next()) and OrderBy(Guid.NewGuid()) are no worse than the user-defined Shuffle method in the first solution.
Answer: The results contradict that assumption.