In this answer, the code below was posted for creating unique random alphanumeric strings. Could someone clarify how exactly the strings are ensured to be unique in this code?
I need to generate a 7-character alphanumeric string. After a quick search, I wrote the code below. Performance results are uploaded above.
I used the Hashtable class to guarantee uniqueness and the RNGCryptoServiceProvider class to get high-quality random characters.
Results of generating 100,000 - 1,000,000 - 10,000,000 samples
Generating unique strings (thanks to nipul parikh):
public static Tuple<List<string>, List<string>> GenerateUniqueList(int count)
{
    uniqueHashTable = new Hashtable();
    nonUniqueList = new List<string>();
    uniqueList = new List<string>();
    for (int i = 0; i < count; i++)
    {
        isUniqueGenerated = false;
        while (!isUniqueGenerated)
        {
            uniqueStr = GetUniqueKey();
            try
            {
                uniqueHashTable.Add(uniqueStr, "");
                isUniqueGenerated = true;
            }
            catch (Exception ex)
            {
                nonUniqueList.Add(uniqueStr);
                // Non-unique generated
            }
        }
    }
    uniqueList = uniqueHashTable.Keys.Cast<string>().ToList();
    return new Tuple<List<string>, List<string>>(uniqueList, nonUniqueList);
}

public static string GetUniqueKey()
{
    int size = 7;
    char[] chars = new char[62];
    string a = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890";
    chars = a.ToCharArray();
    RNGCryptoServiceProvider crypto = new RNGCryptoServiceProvider();
    byte[] data = new byte[size];
    crypto.GetNonZeroBytes(data);
    StringBuilder result = new StringBuilder(size);
    foreach (byte b in data)
        result.Append(chars[b % (chars.Length - 1)]);
    return Convert.ToString(result);
}
The whole console application code is below:
class Program
{
    static string uniqueStr;
    static Stopwatch stopwatch;
    static bool isUniqueGenerated;
    static Hashtable uniqueHashTable;
    static List<string> uniqueList;
    static List<string> nonUniqueList;
    static Tuple<List<string>, List<string>> generatedTuple;

    static void Main(string[] args)
    {
        int i = 0, y = 0, count = 100000;
        while (i < 10 && y < 4)
        {
            stopwatch = new Stopwatch();
            stopwatch.Start();
            generatedTuple = GenerateUniqueList(count);
            stopwatch.Stop();
            Console.WriteLine("Time elapsed: {0} --- {1} Unique --- {2} nonUnique",
                stopwatch.Elapsed,
                generatedTuple.Item1.Count().ToFormattedInt(),
                generatedTuple.Item2.Count().ToFormattedInt());
            i++;
            if (i == 9)
            {
                Console.WriteLine(string.Empty);
                y++;
                count *= 10;
                i = 0;
            }
        }
        Console.ReadLine();
    }

    public static Tuple<List<string>, List<string>> GenerateUniqueList(int count)
    {
        uniqueHashTable = new Hashtable();
        nonUniqueList = new List<string>();
        uniqueList = new List<string>();
        for (int i = 0; i < count; i++)
        {
            isUniqueGenerated = false;
            while (!isUniqueGenerated)
            {
                uniqueStr = GetUniqueKey();
                try
                {
                    uniqueHashTable.Add(uniqueStr, "");
                    isUniqueGenerated = true;
                }
                catch (Exception ex)
                {
                    nonUniqueList.Add(uniqueStr);
                    // Non-unique generated
                }
            }
        }
        uniqueList = uniqueHashTable.Keys.Cast<string>().ToList();
        return new Tuple<List<string>, List<string>>(uniqueList, nonUniqueList);
    }

    public static string GetUniqueKey()
    {
        int size = 7;
        char[] chars = new char[62];
        string a = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890";
        chars = a.ToCharArray();
        RNGCryptoServiceProvider crypto = new RNGCryptoServiceProvider();
        byte[] data = new byte[size];
        crypto.GetNonZeroBytes(data);
        StringBuilder result = new StringBuilder(size);
        foreach (byte b in data)
            result.Append(chars[b % (chars.Length - 1)]);
        return Convert.ToString(result);
    }
}

public static class IntExtensions
{
    public static string ToFormattedInt(this int value)
    {
        return string.Format(CultureInfo.InvariantCulture, "{0:0,0}", value);
    }
}
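Using strictly alphanumeric characters restricts the pool you draw from to 62. Using the complete printable ASCII character set increases your pool to 95 (codes 32-126, including the space) or 94 (codes 33-126, without it), which decreases the likelihood of a collision and removes the need to build the pool separately, because a random index can be mapped straight to a character code.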
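To put rough numbers on that suggestion, here is a minimal sketch that compares the two pool sizes for 7-character strings and maps an index directly to a character. It assumes the 94 non-space printable ASCII characters (codes 33 to 126); the class and method names (PoolComparison, KeySpace, PrintableChar) are made up for illustration and are not part of the original code.

```csharp
using System;

public static class PoolComparison
{
    // Number of distinct strings of the given length over an alphabet of poolSize symbols.
    public static double KeySpace(int poolSize, int length)
    {
        return Math.Pow(poolSize, length);
    }

    // Maps an index in [0, 94) straight to a printable, non-space ASCII character
    // (assumption: codes 33 to 126), so no lookup string is needed.
    public static char PrintableChar(int index)
    {
        if (index < 0 || index >= 94)
        {
            throw new ArgumentOutOfRangeException("index");
        }
        return (char)(33 + index);
    }

    public static void Main()
    {
        Console.WriteLine("62^7 = {0:N0}", KeySpace(62, 7));
        Console.WriteLine("94^7 = {0:N0}", KeySpace(94, 7));
        Console.WriteLine("ratio = {0:F1}", KeySpace(94, 7) / KeySpace(62, 7));
        Console.WriteLine("first = {0}, last = {1}", PrintableChar(0), PrintableChar(93));
    }
}
```

The larger pool gives roughly 18 times as many possible strings at the same length. That makes collisions less likely but does not rule them out, and punctuation characters may be awkward if the strings end up in URLs or file names.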
Uniqueness and randomness are mutually exclusive concepts. If a random number generator is truly random, then it can return the same value more than once. If values are truly unique, although they may not be deterministic, they certainly aren't truly random, because every value generated removes a value from the pool of allowed values. This means that every run affects the outcome of subsequent runs, and at a certain point the pool is exhausted (barring of course the possibility of an infinitely large pool of allowed values, but the only way to avoid collisions in such a pool would be the use of a deterministic method of choosing values).
The code you're showing generates values that are very random, but not 100% guaranteed to be unique. After enough runs, there will be a collision.
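How many runs count as "enough" can be estimated with the birthday approximation. The sketch below is only an illustration: it assumes the values are drawn uniformly from 61^7 possibilities (7 characters, 61 symbols actually reachable by the code in the question), and CollisionEstimate and CollisionProbability are names made up for this example.

```csharp
using System;

public static class CollisionEstimate
{
    // Approximate probability of at least one duplicate among `draws` values
    // chosen uniformly at random from `poolSize` possibilities
    // (birthday approximation: 1 - exp(-n(n-1) / (2N))).
    public static double CollisionProbability(double draws, double poolSize)
    {
        return 1.0 - Math.Exp(-draws * (draws - 1.0) / (2.0 * poolSize));
    }

    public static void Main()
    {
        // Assumption: 7 characters, 61 usable symbols, uniform distribution.
        double poolSize = Math.Pow(61, 7);
        foreach (double draws in new[] { 100000.0, 1000000.0, 10000000.0 })
        {
            Console.WriteLine("{0,12:N0} draws -> P(collision) ~ {1:F4}",
                draws, CollisionProbability(draws, poolSize));
        }
    }
}
```

Under these assumptions a collision is unlikely at 100,000 values (well under 1%), fairly plausible at 1,000,000 (around 15%), and practically certain at 10,000,000. A skewed character distribution makes collisions more likely than this estimate suggests.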
There is nothing in the code that guarantees that the result is unique. To get a unique value you either have to keep all previous values so that you can check for duplicates, or use much longer codes so that duplicates are practically impossible (e.g. a GUID). The code contains less than 42 bits of information (7 characters from at most 62 symbols is about 41.7 bits), which is a lot less than the 128 bits of a GUID.
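Both options can be sketched briefly. This is a minimal illustration, not the original poster's code: UniqueCodes and GenerateDistinct are hypothetical names, and the generator passed in can be any function that returns a random string.

```csharp
using System;
using System.Collections.Generic;

public static class UniqueCodes
{
    // Keeps drawing candidates until `count` distinct values have been collected.
    // `generate` is a hypothetical stand-in for any random string generator.
    public static List<string> GenerateDistinct(int count, Func<string> generate)
    {
        var seen = new HashSet<string>();
        var result = new List<string>(count);
        while (result.Count < count)
        {
            string candidate = generate();
            // HashSet<T>.Add returns false when the value is already present,
            // so duplicates are skipped without relying on exceptions.
            if (seen.Add(candidate))
            {
                result.Add(candidate);
            }
        }
        return result;
    }

    public static void Main()
    {
        // Option 2 from the text: a GUID is long enough that duplicates are
        // practically impossible; it is used here as the generator.
        List<string> codes = GenerateDistinct(5, () => Guid.NewGuid().ToString("N"));
        foreach (string code in codes)
        {
            Console.WriteLine(code);
        }
    }
}
```

Note that the loop only terminates if the generator can actually produce at least `count` different values, and that the set of previous values has to be kept (or persisted) for as long as uniqueness is required.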
The string is just random, and although a crypto-strength random generator is used, that strength is undermined by how the string is derived from the random data. There are some issues in the code:
- The GetNonZeroBytes method is used instead of the GetBytes method, which adds a skew to the distribution of characters, as the code does nothing to handle the lack of zero values.
- The modulo (%) operator is used to reduce the random number down to the number of characters used, but the range of the random number can't be evenly divided by the number of characters, which also adds a skew to the distribution of characters.
- chars.Length - 1 is used instead of chars.Length when the number is reduced, which means that only 61 of the predefined 62 characters can occur in the string.
Although those issues are minor, they are important when you are dealing with crypto-strength randomness.
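If a short alphanumeric string is still wanted, one common way to avoid all three issues is rejection sampling: draw bytes with GetBytes, discard the few values that would make the modulo uneven, and reduce by the full alphabet length. The following is a sketch under those assumptions, not the original author's method; UnbiasedKey and Generate are illustrative names.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

public static class UnbiasedKey
{
    private const string Alphabet =
        "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";

    public static string Generate(int size)
    {
        // Largest multiple of the alphabet size that fits in a byte (248 for 62 symbols).
        int limit = 256 - (256 % Alphabet.Length);
        var result = new StringBuilder(size);
        var buffer = new byte[size];
        using (var rng = RandomNumberGenerator.Create())
        {
            while (result.Length < size)
            {
                // GetBytes covers the full 0-255 range, including zero.
                rng.GetBytes(buffer);
                foreach (byte b in buffer)
                {
                    // Discard bytes in the uneven tail so every symbol is equally likely.
                    if (b < limit && result.Length < size)
                    {
                        result.Append(Alphabet[b % Alphabet.Length]);
                    }
                }
            }
        }
        return result.ToString();
    }

    public static void Main()
    {
        Console.WriteLine(Generate(7));
    }
}
```

This makes each of the 62 characters equally likely, but a 7-character result still carries under 42 bits, so it remains random rather than unique.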
A version of the code that would produce a string without those issues, and give a code with enough information to be considered practically unique:
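// Requires: using System; using System.Security.Cryptography;
public static string GetUniqueKey()
{
    // 16 random bytes = 128 bits, the same size as a GUID.
    int size = 16;
    byte[] data = new byte[size];
    using (RNGCryptoServiceProvider crypto = new RNGCryptoServiceProvider())
    {
        crypto.GetBytes(data); // full 0-255 range, unlike GetNonZeroBytes
    }
    // Hex-encode the bytes: 32 characters, with no modulo reduction involved.
    return BitConverter.ToString(data).Replace("-", String.Empty);
}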