Given two different messages, A and B (maybe 20-80 characters of text, if size matters at all), what is the probability that the MD5 digest of A is the same as the MD5 digest of
An addendum to Welbog's post:
Ratios of large factorials can be computed without using arbitrary-precision arithmetic, by using Stirling's approximation:
n! ≈ sqrt(2πn) * (n/e)^n
So (S!)/(S^N * (S - N)!) ≈ sqrt(2πS)/sqrt(2π(S-N)) * (S/e)^S / ((S-N)/e)^(S-N) / S^N
= sqrt(S/(S-N)) * (S/(S-N))^(S-N) * e^(-N)
= sqrt(1 + α) * (1 + α)^(S-N) * e^(-N), where α = N/(S-N) is small.
The approximation (1 + a/n)^(nx) ≈ e^(ax) holds as n → ∞ (or at least becomes very large),
** so this means (1 + N/(S-N))^(S-N) ≈ e^N for S-N >> N.
So I would expect that
(S!)/(S^N * (S - N)!) ≈ sqrt(1 + N/(S-N)) * e^N * e^(-N) = sqrt(1 + N/(S-N)) for S-N >> N....
except this is greater than 1, which the exact ratio (a product of factors no larger than 1) can never be... so one of the approximations isn't good enough. :p
(** caveat: N/S has to be small: for N=22, S=365 the e^N approximation is off by a factor of about 2)
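
To see which approximation breaks down, here is a rough numerical sketch in Python for the N=22, S=365 case mentioned in the caveat. The function names are mine, not from the posts above. `corrected_ratio` is also my own addition: it keeps one more term of the series ln(1+α) ≈ α - α²/2, which turns e^N into e^N * e^(-N²/(2(S-N))). Treat it as an assumption about where the error comes from rather than part of the original derivation.

```python
import math

def exact_ratio(S, N):
    """S! / (S^N * (S-N)!), computed through log-gamma so no huge integers are needed."""
    log_ratio = math.lgamma(S + 1) - math.lgamma(S - N + 1) - N * math.log(S)
    return math.exp(log_ratio)

def stirling_ratio(S, N):
    """Stirling form from the post: sqrt(1+α) * (1+α)^(S-N) * e^(-N), with α = N/(S-N)."""
    alpha = N / (S - N)
    log_ratio = 0.5 * math.log1p(alpha) + (S - N) * math.log1p(alpha) - N
    return math.exp(log_ratio)

def crude_ratio(S, N):
    """Result after replacing (1+α)^(S-N) by e^N: only sqrt(1+α) is left."""
    alpha = N / (S - N)
    return math.sqrt(1 + alpha)

def corrected_ratio(S, N):
    """Assumed refinement: keep the -α²/2 term, i.e. multiply by e^(-N²/(2(S-N)))."""
    alpha = N / (S - N)
    return math.sqrt(1 + alpha) * math.exp(-N * N / (2 * (S - N)))

if __name__ == "__main__":
    S, N = 365, 22
    print("exact    ", exact_ratio(S, N))
    print("stirling ", stirling_ratio(S, N))
    print("crude    ", crude_ratio(S, N))
    print("corrected", corrected_ratio(S, N))
```

For these inputs the Stirling form should track the exact ratio closely, while the crude form comes out above 1. That suggests Stirling's approximation is fine here and the e^N step is the one that loses too much. Note that the log-gamma difference in `exact_ratio` is only reliable for moderate S; for something like S = 2^128 the subtraction would lose all precision in double-precision floats.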