SQL Server\'s hashing function HASHBYTES
has an input limit of 8000 bytes.
How do you hash larger strings?
You could write a SQL CLR function:
[Microsoft.SqlServer.Server.SqlFunction]
public static SqlBinary BigHashBytes(SqlString algorithm, SqlString data)
{
var algo = HashAlgorithm.Create(algorithm.Value);
var bytes = Encoding.UTF8.GetBytes(data.Value);
return new SqlBinary(algo.ComputeHash(bytes));
}
And then it can be called in SQL like this:
--these return the same value
select HASHBYTES('md5', 'test stuff')
select dbo.BigHashBytes('md5', 'test stuff')
The BigHashBytes
is only necessary if the length would be over 8k.
Like Paul's idea, one idea that comes to mind for chunking would be to store the hashed string in an XML column, with each chunk as a separate XML element.
You could hash 8k (or 4k or 2k) chunks of the input and then either concatenate those hashes or hash them into a new hash value. This might get difficult though if you have to create a similar algorithm (in an external .NET app for example) to compare hashes created outside of SQL Server.
Another option: Lean on SQL Server's CLR integration and perform the hashing in a .NET assembly.