I'm developing a Chromium extension, so I have cross-host XMLHttpRequest permissions for the domains I've requested them for.
I have used XMLHttpRequest to fetch an HTML page (text/html). I want to use XPath (document.evaluate) to extract the relevant bits from it. Unfortunately, I'm failing to construct a DOM object from the returned HTML string.
var xhr = new XMLHttpRequest();
// encodeURIComponent replaces the deprecated escape(); same result for this name
var name = encodeURIComponent("Sticks N Stones Cap");
xhr.open("GET", "http://items.jellyneo.net/?go=show_items&name=" + name + "&name_type=exact", true);
xhr.onreadystatechange = function () {
  if (xhr.readyState == 4) {
    var parser = new DOMParser();
    // Parsing as "text/xml" is what produces the error shown below
    var xmlDoc = parser.parseFromString(xhr.responseText, "text/xml");
    console.log(xmlDoc);
  }
};
xhr.send();
console.log is there to display debug output in the Chromium JS console.
In that console, I get this:
Document
<html>
<body>
<parsererror style="display: block; white-space: pre; border: 2px solid #c77; padding: 0 1em 0 1em; margin: 1em; background-color: #fdd; color: black">
<h3>This page contains the following errors:</h3>
<div style="font-family:monospace;font-size:12px">error on line 1 at column 60: Space required after the Public Identifier
</div>
<h3>Below is a rendering of the page up to the first error.</h3>
</parsererror>
</body>
</html>
So how am I supposed to use XMLHttpRequest -> receive HTML -> convert to DOM -> use XPath to traverse it?
Should I be using the "hidden" iframe hack to load and receive a DOM object?
The DOMParser is choking on the DOCTYPE definition: because the string is parsed as "text/xml", XML rules apply. It would also error on any other non-XHTML markup, such as a <link> without a closing /. Do you have control over the document being sent? If not, your best bet is to parse it as a string. Use regular expressions to find what you are looking for.
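As a rough illustration of the string-parsing route, here is a minimal sketch. The helper name firstMatch, the sample markup, and the "price" class are all made up for the example; they are not taken from the real page.

```javascript
// Hypothetical helper: return the first capture group of `pattern`
// found in `html`, or null when nothing matches.
function firstMatch(html, pattern) {
  var m = pattern.exec(html);
  return m ? m[1] : null;
}

// Made-up response fragment standing in for xhr.responseText.
var sample = '<!DOCTYPE html><html><body>' +
             '<span class="price">1,200 NP</span>' +
             '</body></html>';

// Capture the text between the opening and closing span tags.
var price = firstMatch(sample, /<span class="price">([^<]*)<\/span>/i);
console.log(price);
```

This only holds up while the markup stays as simple and predictable as the pattern assumes; nested or reordered attributes would need a different expression.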
Edit: You can get the browser to parse the contents of the body for you by injecting it into a hidden div:
var hidden = document.body.appendChild(document.createElement("div"));
hidden.style.display = "none";
hidden.innerHTML = /<body[^>]*>([\s\S]+)<\/body>/i.exec(xhr.responseText)[1];
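The extraction line above assumes the response contains a body element; if the pattern does not match, the result is null and indexing it with [1] throws. A slightly more defensive sketch follows, where extractBodyHtml is a hypothetical helper name and the sample page is invented:

```javascript
// Hypothetical helper: return the markup between <body ...> and </body>,
// or the whole string when no body element is found.
// Uses * instead of + so that an empty body also matches.
function extractBodyHtml(responseText) {
  var match = /<body[^>]*>([\s\S]*)<\/body>/i.exec(responseText);
  return match ? match[1] : responseText;
}

// Invented sample standing in for xhr.responseText.
var page = '<!DOCTYPE html><html><head><title>t</title></head>' +
           '<body class="x"><p>Hello</p></body></html>';

console.log(extractBodyHtml(page));        // <p>Hello</p>
console.log(extractBodyHtml('<p>Hi</p>')); // <p>Hi</p>
```

With this in place, the assignment would read hidden.innerHTML = extractBodyHtml(xhr.responseText);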
Now search inside hidden to find what you're looking for:
var myEl = hidden.querySelector("table.foo > tr > td.bar > span.fu");
var myVal = myEl.innerHTML;
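Since the question asks about XPath specifically, the same lookup can probably be written with document.evaluate, using the hidden div as the context node. This is only a sketch: firstNodeByXPath is a hypothetical helper, and the XPath expression assumes each class attribute holds exactly one class name.

```javascript
// Numeric value of XPathResult.FIRST_ORDERED_NODE_TYPE, written out so
// the helper does not depend on the XPathResult global being present.
var FIRST_ORDERED_NODE_TYPE = 9;

// Hypothetical helper: evaluate `expression` relative to `contextNode`
// and return the first matching node, or null when there is none.
function firstNodeByXPath(doc, contextNode, expression) {
  var result = doc.evaluate(expression, contextNode, null,
                            FIRST_ORDERED_NODE_TYPE, null);
  return result.singleNodeValue;
}

// Intended use inside the extension page, where `document` and the
// `hidden` div from the answer exist:
//
//   var span = firstNodeByXPath(document, hidden,
//       ".//table[@class='foo']//td[@class='bar']/span[@class='fu']");
//   var myVal = span ? span.innerHTML : null;
```

The leading dot keeps the search inside hidden, and the // before td allows for the tbody element that the HTML parser inserts into tables.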
Source: https://stackoverflow.com/questions/3972880/how-to-create-dom-object-from-html-page-received-over-xmlhttprequest