This is a very hard but interesting probability question related to the Knuth shuffle. When looping over the elements, the swap is performed between the current element and a randomly chosen element from the entire array.
This page compares the algorithm you mention with Knuth's, showing why Knuth's is better.
Note: the below might be wrong, see the comments.
I'm not aware of an easy way to compute the probability you ask about and I can't seem to find any simple explanation either, but the idea is that your algorithm (usually referred to as the naive shuffle algorithm) considers n^n permutations of the array instead of n!. This is because element i can end up at each of n positions at step i. Since you have n possibilities at each step and n steps, that adds up to n^n. Since n^n is not always divisible by n!, it follows that not all permutations have the same probability, which is why the algorithm is considered a bad shuffle.

Since not all permutations have the same probability, it follows that the probability you ask about is different for different values of i and j, but I'm not aware of a formula that computes it.
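To see the bias concretely, here's a small brute-force sketch (my addition; the function name is illustrative): it enumerates all n^n equally likely sequences of random picks for the naive shuffle and tallies the resulting permutations. For n = 3 there are 27 outcomes spread over 6 permutations, so the counts cannot all be equal.

from itertools import product
from collections import Counter

def naive_shuffle_counts(n):
    # Enumerate all n^n equally likely sequences of random picks.
    counts = Counter()
    for picks in product(range(n), repeat=n):
        v = list(range(n))
        for i, j in enumerate(picks):
            v[i], v[j] = v[j], v[i]  # naive shuffle step: swap v[i] with v[j]
        counts[tuple(v)] += 1
    return counts

# 27 outcomes over 6 permutations: some permutations necessarily
# occur more often than others (27 is not divisible by 6).
print(naive_shuffle_counts(3))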
In a proper Knuth shuffle, you loop over the elements, choosing one of 52 to swap with element 1, then one of 51 remaining to swap with element 2, one of 50 remaining to swap with element 3, etc. Once an element has been placed into the growing set 1, 2, 3, ... it will never move again. So the odds of element i ending up in position 1 are exactly 1/52: it can only happen on the first loop. The odds of it ending up in cell 2 are the odds of it not being in cell 1 (51/52) times the odds of it being chosen on the second pass (1/51). The 51s cancel, leaving 1/52. Likewise, the chance of element i ending up in position 3 is 51/52 * 50/51 * 1/50, again 1/52. So every element has an equal chance to end up in each cell.
In your shuffle, already-placed elements may be chosen again for later switching, which makes calculating the odds very difficult.
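As a quick empirical check of the argument above (my addition; knuth_shuffle is an illustrative helper), you can track where a single card lands over many Knuth shuffles; each of the 52 positions should come up with probability close to 1/52, about 0.019.

import random

def knuth_shuffle(v):
    # Swap element i with a uniformly chosen element at position i or later.
    for i in range(len(v)):
        j = random.randrange(i, len(v))
        v[i], v[j] = v[j], v[i]
    return v

trials = 100_000
hits = [0] * 52
for _ in range(trials):
    deck = knuth_shuffle(list(range(52)))
    hits[deck.index(0)] += 1  # final position of card 0
print([round(h / trials, 3) for h in hits])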
The Knuth shuffle is as follows (in Python, but it could be pseudocode):

for i in range(len(v)):
    swap(v, i, randrange(i, len(v)))
The naïve shuffle is very similar, but it is not the Knuth shuffle:
for i in range(len(v)):
    swap(v, i, randrange(0, len(v)))
The Knuth shuffle produces uniformly-distributed permutations. The naïve shuffle can be proven not to, because there are n^n possible sequences of random numbers, each of which has an equal probability, and n! possible permutations, which is not a factor of n^n. (The Knuth shuffle, on the other hand, involves precisely n! possible random number sequences, each of which can be proven to produce a unique permutation.)
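To make the parenthetical concrete, here's a sketch (my addition, with illustrative names) that enumerates all n·(n−1)·…·1 possible sequences of random numbers for the Knuth shuffle and checks that each one produces a distinct permutation:

from itertools import product
from math import factorial

def knuth_permutations(n):
    # At iteration i, randrange(i, n) can take n - i values,
    # so there are n! possible sequences in total.
    perms = set()
    for picks in product(*(range(i, n) for i in range(n))):
        v = list(range(n))
        for i, j in enumerate(picks):
            v[i], v[j] = v[j], v[i]
        perms.add(tuple(v))
    return perms

# n! sequences yield n! distinct permutations, so each permutation is
# produced by exactly one equally likely sequence: the shuffle is uniform.
assert len(knuth_permutations(5)) == factorial(5)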
The above proof does not indicate whether or not the individual permutation positions in the naïve shuffle are uniformly distributed. It is possible for a non-uniform distribution of permutations to nonetheless produce a uniform distribution of permutation elements: for example, if all permutations other than rotations have probability 0, and rotations have equal probability, then the permutation elements will be uniformly distributed. (That would be an even worse shuffle than the naïve shuffle.)
As it turns out, the naïve shuffle does not produce uniformly distributed element positions. There are two points of regularity: the final position of the first element in the vector is uniformly distributed, as is the distribution of elements which end up in the final position.
The most likely final position of element i (i ≠ 0) is position i−1.
Here's a table of transition probabilities for n=8, computed as a product of transition matrices:
from/to 0 1 2 3 4 5 6 7
0 .125 .125 .125 .125 .125 .125 .125 .125
1 .158 .116 .117 .118 .119 .121 .123 .125
2 .144 .151 .110 .112 .115 .118 .121 .125
3 .132 .139 .147 .107 .111 .115 .119 .125
4 .122 .129 .137 .146 .107 .112 .118 .125
5 .113 .120 .128 .137 .147 .110 .117 .125
6 .105 .112 .120 .129 .139 .151 .116 .125
7 .098 .105 .113 .122 .132 .144 .158 .125
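Here's one way such a table can be reproduced (a sketch, not the original author's code): the whole shuffle is the product of one simple stochastic matrix per iteration. In the matrix for iteration i, row i is uniform (the out-swapped element can land anywhere with probability 1/n), and any other element stays put unless it is picked (probability 1/n), in which case it moves to position i. Fraction is used so the products are exact.

from fractions import Fraction

def naive_shuffle_matrix(n):
    # P[a][b] = probability that the element starting at a ends at b.
    P = [[Fraction(1) if a == b else Fraction(0) for b in range(n)]
         for a in range(n)]
    for i in range(n):
        M = [[Fraction(0)] * n for _ in range(n)]
        for b in range(n):
            M[i][b] = Fraction(1, n)          # out-swapped element: uniform
        for a in range(n):
            if a != i:
                M[a][a] = Fraction(n - 1, n)  # not picked: stays put
                M[a][i] = Fraction(1, n)      # picked: moves to position i
        P = [[sum(P[a][m] * M[m][b] for m in range(n)) for b in range(n)]
             for a in range(n)]
    return P

for row in naive_shuffle_matrix(8):
    print(' '.join(f'{float(p):.3f}' for p in row))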
It's possible to derive a closed form for P_n(i, j), the probability that element i in a vector of n elements will be shuffled to position j.
In the algorithm, the swap at iteration i involves the element v_i and some other element v_j. (It's possible that i = j.) Although swap is a symmetric operation, it's useful to distinguish the two elements; we'll call this an out swap of the element v_i and an in swap of v_j.
Note that after an in swap of element k, k can no longer be out swapped, because all of the following out swaps are at positions after the new position of k. So if k is ever out swapped, it must be out swapped at iteration k; in other words, only the first swap involving an element can be an out swap.
Now, at any iteration of the shuffle, the final destinations of the element about to be out swapped are uniformly distributed. (The probability of the final destination being j is the sum over all i of the probability of the next position being i times the probability that an element at position i ends up with final destination j. Since the next positions are uniformly distributed, that uniform factor can be pulled out of the sum, and the remaining sum is 1, since j must come from some i.)
Also, for an element which is never out swapped, its final destination is the position at which its last in swap occurred, since an in swap at iteration j moves the element to position j. (It's not possible for an element to be neither out swapped nor in swapped: if no in swap occurs before the element reaches the out swap position, it will be out swapped.)
With all that, we can derive the formula for the transition function.
First, the probability that element k will be out swapped is precisely the probability that it is not in swapped in any of the iterations prior to k, which is (n−1)^k/n^k. Among the shuffles in which k is out swapped, the final destination is uniformly distributed, so this contributes (n−1)^k/n^(k+1) to every transition probability P_n(k, j).
Now let's consider the cases in which the last in swap is at iteration j (and hence to position j). At every iteration, a given element will be in swapped with probability 1/n. Consequently, the probability that the last in swap is at iteration j is the probability that no in swap occurs after iteration j times the probability that an in swap occurs at iteration j, which is (n−1)^(n−j−1)/n^(n−j).
If j < k, then k cannot also have been out swapped, but if j ≥ k, we need to count only the cases where there was an in swap prior to iteration k (otherwise k would have been out swapped, and those cases are already covered by the first term). This leads to the following definition:
P_n(k, j) = (n−1)^k/n^(k+1) + (1 − O_n(k, j)) × (n−1)^(n−j−1)/n^(n−j)

where

O_n(k, j) = 0 if j < k, and otherwise (n−1)^k/n^k
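As a sanity check (my addition; the function name is illustrative), the closed form can be coded directly; for n = 8 it reproduces the table above, e.g. P_8(1, 0) ≈ .158.

def P(n, k, j):
    # Out-swap contribution: k avoids every in swap before iteration k
    # and then lands uniformly: (n-1)^k / n^(k+1).
    out = (n - 1) ** k / n ** (k + 1)
    # O_n(k, j): for j >= k, the probability that k was out swapped,
    # which must be excluded from the in-swap term.
    O = 0 if j < k else (n - 1) ** k / n ** k
    # In-swap contribution: last in swap at iteration j.
    return out + (1 - O) * (n - 1) ** (n - j - 1) / n ** (n - j)

n = 8
for k in range(n):
    print(' '.join(f'{P(n, k, j):.3f}' for j in range(n)))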