What's the best way to go about hashing an XML document in C#? I'd like to hash an XML document so that I can tell if it was manually changed from when it was generated. I'm not using this for security--it's OK if someone changes the XML, and changes the hash to match.
For example, I'd hash the child nodes of the root and store the hash as an attribute of the root:
<RootNode Hash="abc123">
<!-- Content to hash here -->
</RootNode>
.NET has classes that implement the XML digital signature spec. The signature can be added inside the original XML document (i.e. an "enveloped signature"), or stored/transferred separately.
It may be a bit overkill since you don't need the security, but it has the advantage of being already implemented, and being a standard which does not depend on a language or platform.
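For what it's worth, an enveloped signature with SignedXml might look roughly like the sketch below. The class and method names are made up for illustration, and how you create and store the RSA key is left open. It assumes a reference to System.Security (on .NET Core and later, the System.Security.Cryptography.Xml package).

```csharp
using System;
using System.Security.Cryptography;
using System.Security.Cryptography.Xml;
using System.Xml;

// Hypothetical helper, not part of the framework.
public static class EnvelopedSignatureSketch
{
    // Signs the whole document and appends the <Signature> element to the root.
    public static void Sign(XmlDocument doc, RSA key)
    {
        var signedXml = new SignedXml(doc) { SigningKey = key };

        // An empty URI refers to the whole document; the enveloped transform
        // leaves the <Signature> element itself out of the digest.
        var reference = new Reference { Uri = "" };
        reference.AddTransform(new XmlDsigEnvelopedSignatureTransform());
        signedXml.AddReference(reference);

        signedXml.ComputeSignature();
        doc.DocumentElement.AppendChild(doc.ImportNode(signedXml.GetXml(), true));
    }

    // Returns true when the document still matches its embedded signature.
    public static bool Verify(XmlDocument doc, RSA key)
    {
        var nodes = doc.GetElementsByTagName("Signature", SignedXml.XmlDsigNamespaceUrl);
        if (nodes.Count != 1) return false;

        var signedXml = new SignedXml(doc);
        signedXml.LoadXml((XmlElement)nodes[0]);
        return signedXml.CheckSignature(key);
    }
}
```

Call Sign once after generating the document, and Verify with the same key after loading it again; any manual edit to the content should make Verify return false.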
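You can use the cryptography namespace:
// MACTripleDES needs a 16- or 24-byte key; a shorter key such as "mykey" is rejected with a CryptographicException.
System.Security.Cryptography.MACTripleDES hash = new System.Security.Cryptography.MACTripleDES(Encoding.UTF8.GetBytes("my-24-byte-secret-key!!!"));
// Encoding.UTF8 keeps the bytes stable across machines; Encoding.Default can vary with the system code page.
string hashString = Convert.ToBase64String(hash.ComputeHash(Encoding.UTF8.GetBytes(myXMLString)));
You just need a key to create the keyed hash algorithm, and then compute a hash from the string representation of your XML.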
Add a .NET reference to System.Security, and use XmlDsigC14NTransform. Here's an example...
/* http://www.w3.org/TR/xml-c14n
Of course it cannot detect that these are the same...
<color>black</color> vs. <color>rgb(0,0,0)</color>
...because that's dependent on app logic's interpretation of XML data.
But otherwise it gets the following right...
•Normalization of whitespace in start and end tags
•Lexicographic ordering of namespace declarations and attributes
•Empty element conversion to start-end tag pair
•Retain all whitespace between tags
And more.
*/
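// Requires: using System; using System.IO; using System.Security.Cryptography; using System.Xml;
public static string XmlHash(XmlDocument myDoc)
{
    // Canonicalize first so that equivalent serializations produce the same bytes.
    var t = new System.Security.Cryptography.Xml.XmlDsigC14NTransform();
    t.LoadInput(myDoc);
    // Dispose the stream and the hash algorithm even if ComputeHash throws.
    using (var s = (Stream)t.GetOutput(typeof(Stream)))
    using (var sha1 = SHA1.Create())
    {
        var hash = sha1.ComputeHash(s);
        return Convert.ToBase64String(hash);
    }
}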
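To see what canonicalization buys you, here is a small self-contained sketch that repeats the same approach and compares documents that differ only in serialization. The class name and the Load helper are made up for the example; PreserveWhitespace is set so that whitespace between tags reaches the transform instead of being dropped at load time.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;
using System.Security.Cryptography.Xml;
using System.Xml;

// Hypothetical demo class, not part of the framework.
public static class C14NHashDemo
{
    // Same technique as above: canonicalize, then hash the canonical bytes.
    public static string XmlHash(XmlDocument doc)
    {
        var transform = new XmlDsigC14NTransform();
        transform.LoadInput(doc);
        using (var stream = (Stream)transform.GetOutput(typeof(Stream)))
        using (var sha1 = SHA1.Create())
        {
            return Convert.ToBase64String(sha1.ComputeHash(stream));
        }
    }

    // Loads XML text while keeping whitespace-only text nodes.
    public static XmlDocument Load(string xml)
    {
        var doc = new XmlDocument { PreserveWhitespace = true };
        doc.LoadXml(xml);
        return doc;
    }

    public static void Main()
    {
        // Attribute order, spacing inside the tag and <empty/> vs <empty></empty> differ.
        string a = XmlHash(Load("<root b=\"2\" a=\"1\"><empty/></root>"));
        string b = XmlHash(Load("<root   a=\"1\"   b=\"2\"><empty></empty></root>"));
        // Here the content differs: a space inside <empty>.
        string c = XmlHash(Load("<root a=\"1\" b=\"2\"><empty> </empty></root>"));

        Console.WriteLine(a == b); // same canonical form
        Console.WriteLine(a == c); // different content
    }
}
```

The first comparison should come out equal and the second should not, since whitespace between tags is part of the canonical form.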
I recently had to implement a hash "checksum" for partial XML documents at work (we use XElement). Rudimentary performance tests showed a roughly 3x speedup on my machine when the hex string is built with a lookup table rather than without one.
Here is my implementation:
using System.Xml.Linq;
using System.Security.Cryptography;
using System.Text;
using System.Linq;

/// <summary>
/// Provides a way to easily compute SHA256 hash strings for XML objects.
/// </summary>
public static class XMLHashUtils
{
    /// <summary>
    /// Precompute a hexadecimal lookup table for runtime performance gain, at the cost of memory and startup performance loss.
    /// SOURCE: https://stackoverflow.com/a/18574846
    /// </summary>
    static readonly string[] hexLookupTable = Enumerable.Range(0, 256).Select(integer => integer.ToString("x2")).ToArray();

    /// <summary>
    /// Computes a SHA256 hash string from an XElement and its children.
    /// </summary>
    public static string Hash(XElement xml)
    {
        string xmlString = xml.ToString(SaveOptions.DisableFormatting); // Outputs XML as single line
        return Hash(xmlString);
    }

    /// <summary>
    /// Computes a SHA256 hash string from a string.
    /// </summary>
    static string Hash(string stringValue)
    {
        // A HashAlgorithm instance is not thread-safe, so create one per call
        // instead of sharing a static instance across callers.
        using (var sha256 = SHA256.Create())
        {
            byte[] hashBytes = sha256.ComputeHash(Encoding.UTF8.GetBytes(stringValue));
            return BytesToHexString(hashBytes);
        }
    }

    /// <summary>
    /// Converts a byte array to a hexadecimal string using a lookup table.
    /// </summary>
    static string BytesToHexString(byte[] bytes)
    {
        int length = bytes.Length;
        StringBuilder sb = new StringBuilder(length * 2); // Capacity fits hash string length
        for (var i = 0; i < length; i++)
        {
            sb.Append(hexLookupTable[bytes[i]]); // Using lookup table for faster runtime conversion
        }
        return sb.ToString();
    }
}
And here are a couple of unit tests for it (using the NUnit framework):
using NUnit.Framework;
using System.Linq;
using System.Xml.Linq;

public class XMLHashUtilsTest
{
    /// <summary>
    /// Outputs XML: <root><child attribute="value" /></root>
    /// where <child /> node repeats according to childCount
    /// </summary>
    XElement CreateXML(int childCount)
    {
        return new XElement("root", Enumerable.Repeat(new XElement("child", new XAttribute("attribute", "value")), childCount));
    }

    [Test]
    public void HashIsDeterministic([Values(0,1,10)] int childCount)
    {
        var xml = CreateXML(childCount);
        Assert.AreEqual(XMLHashUtils.Hash(xml), XMLHashUtils.Hash(xml));
    }

    [Test]
    public void HashChanges_WhenChildrenAreDifferent([Values(0,1,10)] int childCount)
    {
        var xml1 = CreateXML(childCount);
        var xml2 = CreateXML(childCount + 1);
        Assert.AreNotEqual(XMLHashUtils.Hash(xml1), XMLHashUtils.Hash(xml2));
    }

    [Test]
    public void HashChanges_WhenRootNameIsDifferent([Values("A","B","C")]string nameSuffix)
    {
        var xml1 = CreateXML(1);
        var xml2 = CreateXML(1);
        xml2.Name = xml2.Name + nameSuffix;
        Assert.AreNotEqual(XMLHashUtils.Hash(xml1), XMLHashUtils.Hash(xml2));
    }

    [Test]
    public void HashChanges_WhenRootAttributesAreDifferent([Values("A","B","C")]string attributeName)
    {
        var xml1 = CreateXML(1);
        var xml2 = CreateXML(1);
        xml2.Add(new XAttribute(attributeName, "value"));
        Assert.AreNotEqual(XMLHashUtils.Hash(xml1), XMLHashUtils.Hash(xml2));
    }
}
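Coming back to the layout in the question (hash the root's child nodes and keep the result in a Hash attribute on the root), a minimal sketch with XElement could look like this. The class and method names are made up, and it uses BitConverter for the hex string to stay short rather than the lookup table above.

```csharp
using System;
using System.Linq;
using System.Security.Cryptography;
using System.Text;
using System.Xml.Linq;

// Hypothetical helper, not part of the framework.
public static class RootHashAttributeSketch
{
    const string HashAttribute = "Hash";

    // Hashes only the child nodes, so writing the attribute on the root
    // does not change the value being hashed.
    static string HashChildren(XElement root)
    {
        string content = string.Concat(
            root.Nodes().Select(n => n.ToString(SaveOptions.DisableFormatting)));
        using (var sha256 = SHA256.Create())
        {
            byte[] bytes = sha256.ComputeHash(Encoding.UTF8.GetBytes(content));
            return BitConverter.ToString(bytes).Replace("-", "").ToLowerInvariant();
        }
    }

    // Call once when the document is generated.
    public static void Stamp(XElement root)
    {
        root.SetAttributeValue(HashAttribute, HashChildren(root));
    }

    // Call after loading; false means the content or the stored hash was edited.
    public static bool IsUnchanged(XElement root)
    {
        string stored = (string)root.Attribute(HashAttribute);
        return stored != null && stored == HashChildren(root);
    }
}
```

Usage would be something like calling Stamp(root) before saving and IsUnchanged(root) after XElement.Load or XElement.Parse. Note that loading without LoadOptions.PreserveWhitespace drops insignificant whitespace, so a pure re-indent of the file would not count as a change with this approach.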
Source: https://stackoverflow.com/questions/1521249/generating-an-xml-document-hash-in-c-sharp