<!ENTITY ha "Ha !"> defines an entity, &ha;, that expands to "Ha !". The next line defines another entity, &ha2;, that expands to "&ha; &ha;" and eventually, "Ha ! Ha !".
&ha3; turns into "Ha ! Ha ! Ha ! Ha !", and so on, doubling the number each time. If you follow the pattern, &haN; is "Ha !" repeated 2^(N-1) times, so &ha128; expands to 2^127 "Ha !"s, which is too big for any computer to handle.
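To make the doubling concrete, here is a minimal Python sketch. It assumes &ha; counts as level 1 and that every later entity is two copies of the previous one separated by a space, as in the pattern described; the function names are made up for illustration.

```python
def ha_copies(n: int) -> int:
    """Number of "Ha !" copies in the n-th entity (with &ha; as n = 1)."""
    return 2 ** (n - 1)


def expand_ha(n: int) -> str:
    """Literally build the expansion; only sensible for small n."""
    text = "Ha !"
    for _ in range(n - 1):
        # Each level is two copies of the previous level, separated by a space.
        text = f"{text} {text}"
    return text


print(expand_ha(3))    # Ha ! Ha ! Ha ! Ha !
print(ha_copies(128))  # 2**127, roughly 1.7e38 copies
```

Counting is cheap; it is only the literal expansion, which an XML parser has to perform, that blows up.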
The Billion Laughs attack is a denial-of-service attack that targets XML parsers. It is also known as an XML bomb or, more esoterically, the exponential entity expansion attack. The malicious document can be well-formed XML and can even pass XML schema validation.
The vanilla Billion Laughs attack is illustrated in the XML file represented below.
<?xml version="1.0"?>
<!DOCTYPE lolz [
<!ENTITY lol "lol">
<!ENTITY lol2 "&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;">
<!ENTITY lol3 "&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;">
<!ENTITY lol4 "&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;">
<!ENTITY lol5 "&lol4;&lol4;&lol4;&lol4;&lol4;&lol4;&lol4;&lol4;&lol4;&lol4;">
<!ENTITY lol6 "&lol5;&lol5;&lol5;&lol5;&lol5;&lol5;&lol5;&lol5;&lol5;&lol5;">
<!ENTITY lol7 "&lol6;&lol6;&lol6;&lol6;&lol6;&lol6;&lol6;&lol6;&lol6;&lol6;">
<!ENTITY lol8 "&lol7;&lol7;&lol7;&lol7;&lol7;&lol7;&lol7;&lol7;&lol7;&lol7;">
<!ENTITY lol9 "&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;">
]>
<lolz>&lol9;</lolz>
In this example, there are nine XML entities, lol through lol9 (there is no lol1). The first entity, lol, is defined to be the string "lol". Each of the other entities is defined to be 10 copies of the previous entity. The document content section of this XML file contains only a single reference to the entity lol9. However, when a DOM or SAX parser encounters lol9, it expands it into 10 lol8s, each of which is expanded into 10 lol7s, and so on. By the time everything is expanded to the text lol, there are 100,000,000 instances of the string "lol". If there were one more entity, or if lol were defined as 10 copies of "lol", there would be a billion "lol"s, hence the name of the attack. Needless to say, this many expansions consume memory and time that grow exponentially with the nesting depth, causing the DoS.
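The arithmetic can be checked without ever expanding the bomb. The following Python sketch rebuilds the same entity table in a dictionary and computes the expanded length recursively; the helper names and the simplified entity-reference regex are assumptions for illustration, not part of any XML library.

```python
import re
from functools import lru_cache

# Rebuild the entity table from the example: lol, then lol2..lol9,
# each defined as 10 references to the previous entity.
entities = {"lol": "lol"}
previous = "lol"
for i in range(2, 10):
    name = f"lol{i}"
    entities[name] = f"&{previous};" * 10
    previous = name

# Simplified pattern for general entity references such as &lol8;
ENTITY_REF = re.compile(r"&([A-Za-z_][\w.-]*);")


@lru_cache(maxsize=None)
def expanded_length(name: str) -> int:
    """Length of the fully expanded entity, computed without expanding it."""
    value = entities[name]
    literal = len(ENTITY_REF.sub("", value))  # characters that are not references
    nested = sum(expanded_length(ref) for ref in ENTITY_REF.findall(value))
    return literal + nested


total = expanded_length("lol9")
print(total // len("lol"))  # 100000000 copies of "lol"
print(total)                # 300000000 characters
```

A document of well under a kilobyte therefore expands to roughly 300 million characters of text, before counting any per-node overhead in the parser.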
A more extensive explanation exists on my blog.
One of the XML bombs - http://msdn.microsoft.com/en-us/magazine/ee335713.aspx
An attacker can now take advantage of these three properties of XML (substitution entities, nested entities, and inline DTDs) to craft a malicious XML bomb. The attacker writes an XML document with nested entities just like the previous example, but instead of nesting just one level deep, he nests his entities many levels deep...
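There is also code to protect against these "bombs" (in the .NET world):
// Requires: using System.Xml;
XmlReaderSettings settings = new XmlReaderSettings();
// Allow inline DTDs. ProhibitDtd is obsolete since .NET 4.0; there, use DtdProcessing = DtdProcessing.Parse instead.
settings.ProhibitDtd = false;
// Cap the characters produced by entity expansion; exceeding the cap throws an XmlException.
settings.MaxCharactersFromEntities = 1024;
// 'stream' is the Stream holding the untrusted XML.
XmlReader reader = XmlReader.Create(stream, settings);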
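For readers outside .NET, a related but stricter defense is to refuse any document that carries a DOCTYPE at all, which is only an option if your inputs never legitimately need a DTD. The Python sketch below does this with the standard library's expat bindings; `DtdForbidden` and `parse_without_dtd` are hypothetical names, and this is a different technique from the character cap shown in the .NET snippet.

```python
import xml.parsers.expat


class DtdForbidden(ValueError):
    """Raised when the input contains a DOCTYPE declaration."""


def parse_without_dtd(data: bytes) -> None:
    """Parse XML, aborting as soon as a DOCTYPE declaration is seen."""
    parser = xml.parsers.expat.ParserCreate()

    def reject_doctype(name, system_id, public_id, has_internal_subset):
        # Called at the start of <!DOCTYPE ...>, before any entity is declared.
        raise DtdForbidden(f"DOCTYPE {name!r} is not allowed")

    parser.StartDoctypeDeclHandler = reject_doctype
    parser.Parse(data, True)  # True marks this as the final chunk of input


parse_without_dtd(b"<greeting>hello</greeting>")  # parses normally

try:
    parse_without_dtd(b'<!DOCTYPE lolz [<!ENTITY lol "lol">]><lolz>&lol;</lolz>')
except DtdForbidden as exc:
    print("rejected:", exc)
```

Because the handler raises before any entity declaration is processed, the nested entities are never expanded.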