I need a way to compute the hash of the base64-encoded data in the XML node //note/resource/data, or otherwise match that data to the hash value in the attribute //note/content/en-note//en-media/@hash
See below for the full XML file
Please suggest a way to {obtain|match} using XSLT
4aaafc3e14314027bb1d89cf7d59a06c
{from|with}
R0lGODlhEAAQAPMAMcDAwP/crv/erbigfVdLOyslHQAAAAECAwECAwECAwECAwECAwECAwECAwEC
AwECAyH/C01TT0ZGSUNFOS4wGAAAAAxtc09QTVNPRkZJQ0U5LjAHgfNAGQAh/wtNU09GRklDRTku
MBUAAAAJcEhZcwAACxMAAAsTAQCanBgAIf8LTVNPRkZJQ0U5LjATAAAAB3RJTUUH1AkWBTYSQXe8
fQAh+QQBAAAAACwAAAAAEAAQAAADSQhgpv7OlDGYstCIMqsZAXYJJEdRQRWRrHk2I9t28CLfX63d
ZEXovJ7htwr6dIQB7/hgJGXMzFApOBYgl6n1il0Mv5xuhBEGJAAAOw==
This sample XML file has been trimmed for brevity. The actual file may contain more than one image per note, hence the need to obtain or match hashes.
The XML file:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE en-export SYSTEM "http://xml.evernote.com/pub/evernote-export.dtd">
<en-export export-date="20091029T063411Z" application="Evernote/Windows" version="3.0">
<note>
<title>A title here</title>
<content><![CDATA[
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE en-note SYSTEM "http://xml.evernote.com/pub/enml.dtd">
<en-note bgcolor="#FFFFFF">
<p>Some text here (followed by the picture)
<p><en-media hash="4aaafc3e14314027bb1d89cf7d59a06c" type="image/gif" border="0" width="16" height="16" alt="A picture"/></p>
<p>Some more text here (preceded by the picture)
</en-note>
]]></content>
<created>20090925T063154Z</created>
<note-attributes>
<author/>
</note-attributes>
<resource>
<data encoding="base64">
R0lGODlhEAAQAPMAMcDAwP/crv/erbigfVdLOyslHQAAAAECAwECAwECAwECAwECAwECAwECAwEC
AwECAyH/C01TT0ZGSUNFOS4wGAAAAAxtc09QTVNPRkZJQ0U5LjAHgfNAGQAh/wtNU09GRklDRTku
MBUAAAAJcEhZcwAACxMAAAsTAQCanBgAIf8LTVNPRkZJQ0U5LjATAAAAB3RJTUUH1AkWBTYSQXe8
fQAh+QQBAAAAACwAAAAAEAAQAAADSQhgpv7OlDGYstCIMqsZAXYJJEdRQRWRrHk2I9t28CLfX63d
ZEXovJ7htwr6dIQB7/hgJGXMzFApOBYgl6n1il0Mv5xuhBEGJAAAOw==
</data>
<mime>image/gif</mime>
<resource-attributes>
<file-name>clip_image001.gif</file-name>
</resource-attributes>
</resource>
</note>
</en-export>
Implemented solution
This uses the concept of the solution suggested by Jackem. The main difference is that I avoid creating my own Java class (and with it an extra dependency). I do the processing within the XSLT, since it is straightforward enough, referencing only classes that ship with the standard Java libraries.
Jackem's solution is more correct because it does not lose the leading zero that some hashes have. However, I found it easier to handle that case elsewhere with a little basic hackery.
<xsl:stylesheet version="2.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
...
xmlns:md5="java.security.MessageDigest"
xmlns:bigint="java.math.BigInteger"
exclude-result-prefixes="md5 bigint">
...
<xsl:for-each select="resource">
<xsl:variable name="md5inst" select="md5:getInstance('MD5')" />
<xsl:value-of select="md5:update($md5inst, $b64bin)" />
<xsl:variable name="imgmd5bytes" select="md5:digest($md5inst)" />
<xsl:variable name="imgmd5bigint" select="bigint:new(1, $imgmd5bytes)" />
<xsl:variable name="imgmd5str" select="bigint:toString($imgmd5bigint, 16)" />
<!-- NOTE: $imgmd5str loses the leading zero from imgmd5bytes (if there is one) -->
</xsl:for-each>
...
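To illustrate the leading-zero caveat mentioned above, here is a minimal Java sketch (not part of the original stylesheet; the class and method names are made up for the example). It compares `BigInteger.toString(16)`, which is what the stylesheet calls, with a zero-padded `String.format("%032x", ...)`, which is what Jackem's answer uses. The input `a` is chosen only because its MD5 digest happens to start with a zero digit.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper class, for illustration only.
class LeadingZeroDemo {

    // Compute the raw 16-byte MD5 digest of the given bytes.
    static byte[] md5(byte[] data) {
        try {
            return MessageDigest.getInstance("MD5").digest(data);
        } catch (NoSuchAlgorithmException e) {
            // MD5 is required to be available on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    // Same conversion as in the stylesheet: leading zero digits are dropped.
    static String unpaddedHex(byte[] digest) {
        return new BigInteger(1, digest).toString(16);
    }

    // Left-pad with zeros to the 32 hex digits an MD5 digest always has.
    static String paddedHex(byte[] digest) {
        return String.format("%032x", new BigInteger(1, digest));
    }

    public static void main(String[] args) {
        byte[] digest = md5("a".getBytes(StandardCharsets.US_ASCII));
        System.out.println(unpaddedHex(digest)); // 31 hex digits
        System.out.println(paddedHex(digest));   // 32 hex digits
    }
}
```

If the hash is only compared against the `hash` attribute, padding the shorter string on the left with `0` up to 32 characters is enough to make the two forms agree.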
P.S. See the sibling question for my implementation of the base64-to-image-file conversion.
This question is a subquestion of another question I have asked previously.
For your related question about doing the base64 decoding in XSLT, you have accepted an answer which uses Saxon and Java extensions. So I assume you are OK with using those.
In that case, you can create an extension in Java for computing the MD5 sum:
package com.stackoverflow.q1684963;

import java.math.BigInteger;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class MD5Sum {
    public static String calc(byte[] data) throws NoSuchAlgorithmException {
        MessageDigest md5 = MessageDigest.getInstance("MD5");
        byte[] digest = md5.digest(data);
        BigInteger digestValue = new BigInteger(1, digest);
        // Zero-pad to 32 hex digits so a leading zero is not lost
        return String.format("%032x", digestValue);
    }
}
From your XSLT 2.0 stylesheet which you run with Saxon, you can then just call that extension. Assuming you already have the base64-decoded data (for example from extension function saxon:base64Binary-to-octets as in the linked answer) in variable data:
<xsl:value-of xmlns:md5sum="com.stackoverflow.q1684963.MD5Sum"
select="md5sum:calc($data)"/>
- Download some freeware Base64 decoder like this one or use some source code from the web for this
- Output file is some_file.gif, 268 bytes, a folder icon
- Generate the MD5 checksum of that file using md5sum or again some source code from the web
Output for me:
4aaafc3e14314027bb1d89cf7d59a06c
That's what you wanted, isn't it? It will be tricky (if not impossible, and if you ask me, definitely not worth the effort) to do all this in XSLT, but at least you now have got the information that this hash was created using MD5 on the GIF file.
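The manual steps above (decode the base64 text, then take the MD5 of the resulting bytes) can also be sketched in a few lines of Java instead of using separate tools. This is only an illustration, assuming Java 8 or later for `java.util.Base64`; the class and method names are invented for the example. The MIME decoder is used because the `<data>` content is wrapped across several lines.

```java
import java.math.BigInteger;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;

// Hypothetical helper class, for illustration only.
class ResourceHash {

    // Decode base64 text (line breaks allowed) and return the MD5 of the
    // decoded bytes as 32 lowercase hex digits.
    static String md5HexOfBase64(String base64Text) {
        // The MIME decoder skips line separators and other characters
        // that are outside the base64 alphabet.
        byte[] raw = Base64.getMimeDecoder().decode(base64Text);
        try {
            byte[] digest = MessageDigest.getInstance("MD5").digest(raw);
            return String.format("%032x", new BigInteger(1, digest));
        } catch (NoSuchAlgorithmException e) {
            // MD5 is required to be available on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Pass the text content of a <data encoding="base64"> element.
        System.out.println(md5HexOfBase64(args[0]));
    }
}
```

Fed the text of the `<data>` element from the question, this should print the same value as the `hash` attribute, if the result reported above holds.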
The 4aaaf... is the MD5 of the binary data you get when you decode the base64-encoded data. I don't think you have any choice but to decode the contents of <data> element and run it through an MD5 implementation, which is obviously outside the scope of an XSL transformation. Presumably, the result of the XSLT will be processed by some other code, which can extract and verify the images.
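As a rough idea of what that "other code" could look like, here is a small Java sketch that pulls the `hash` attributes out of the ENML text (the CDATA content of `<content>`) and checks whether a resource's MD5 is among them. The class name, method names, and regular expression are assumptions made for this example, not anything Evernote provides. A regex is used rather than an XML parser only because the embedded ENML in the question is not well-formed (the `<p>` elements are left open).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, for illustration only.
class MediaHashes {

    // An en-media start tag carrying a hash attribute of 32 hex digits.
    private static final Pattern HASH_ATTR =
            Pattern.compile("<en-media\\b[^>]*?\\bhash=\"([0-9a-fA-F]{32})\"");

    // Collect every en-media hash in the ENML text, lowercased.
    static List<String> extract(String enml) {
        List<String> hashes = new ArrayList<>();
        Matcher m = HASH_ATTR.matcher(enml);
        while (m.find()) {
            hashes.add(m.group(1).toLowerCase(Locale.ROOT));
        }
        return hashes;
    }

    // True if the given resource MD5 (hex) is referenced by some en-media tag.
    static boolean isReferenced(String enml, String resourceMd5Hex) {
        return extract(enml).contains(resourceMd5Hex.toLowerCase(Locale.ROOT));
    }
}
```

With more than one image per note, each `<resource>` would be hashed in turn and matched against this list to find the `en-media` tag it belongs to.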
How about this (add commons-codec to your classpath):
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:digest="org.apache.commons.codec.digest.DigestUtils">
[...]
<xsl:value-of select="digest:md5Hex('hello, world!')"/>
</xsl:stylesheet>
Source: https://stackoverflow.com/questions/1684963/xslt-obtaining-or-matching-hashes-for-base64-encoded-data