I am trying to parse this HTML with jQuery to get data1, data2, and data3. While I do get data2 and data3, I am unable to get data1 with my approach. I am fairly new to jQuery.
I think I have an even better way. Let's say you've got your HTML:
var htmlText = '<html><body><div class="class0"><h4>data1</h4><p class="class1">data2</p><div id="mydivid"><p>data3</p></div></div></body></html>'
Here's the thing you've been hoping to do:
var dataHtml = $($.parseXML(htmlText)).children('html');
dataHtml now works like the ordinary jQuery objects you're familiar with, provided the markup is well-formed XML ($.parseXML throws an error on markup that is not).
The wonderful thing about this solution is that it will not strip body, head, or script tags!
It doesn't work because the <div> with the class class0 doesn't have any text nodes as direct children. Add the class to the <h4> and it will work.
Try this:
alert($(datahtml).find("h4").text()); // $(datahtml) is already div.class0, so search for the h4 inside it
The reason is that the text you are referring to is inside the h4 element of class0, so your selector will not work.
Or access the contents directly:
alert($(".class0 h4").text());
alert($(".class1").text());
alert($("#mydivid").text());
EDIT
var datahtml = "<html><body><div class=\"class0\"><h4>data1</h4><p class=\"class1\">data2</p><div id=\"mydivid\"><p>data3</p></div></div></body></html>";
$('body').html(datahtml);
alert($(".class0 h4").text());
alert($(".class1").text());
alert($("#mydivid").text());
None of the current answers addressed the real issue, so I'll give it a go.
var datahtml = "<html><body><div class=\"class0\"><h4>data1</h4><p class=\"class1\">data2</p><div id=\"mydivid\"><p>data3</p></div></div></body></html>";
console.log($(datahtml));
$(datahtml) is a jQuery object containing only the div.class0 element, so when you call .find on it, you're actually looking for descendants of div.class0 instead of the whole HTML document that you'd expect.
A quick solution is to wrap the parsed data in an element so that .find will work as intended:
var parsed = $('<div/>').append(datahtml);
console.log(parsed.find(".class0").text());
Fiddle
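To make the difference between searching descendants and testing the top-level element concrete, here is a minimal sketch in plain JavaScript. It is only a model: the node, filterByClass, and findByClass helpers are hypothetical stand-ins for DOM elements, .filter(), and .find(), not jQuery's implementation.

```javascript
// Hypothetical mini-tree standing in for parsed DOM nodes (not jQuery objects).
function node(tag, className, children) {
  return { tag: tag, className: className || "", children: children || [] };
}

// Like .filter(): test only the elements that are already in the set.
function filterByClass(set, className) {
  return set.filter(function (el) {
    return el.className === className;
  });
}

// Like .find(): test only the descendants of the elements in the set.
function findByClass(set, className) {
  var matches = [];
  function walk(el) {
    el.children.forEach(function (child) {
      if (child.className === className) {
        matches.push(child);
      }
      walk(child);
    });
  }
  set.forEach(walk);
  return matches;
}

// Assumed shape of $(datahtml) once the html and body tags are dropped:
// a set holding only div.class0.
var parsedSet = [
  node("div", "class0", [
    node("h4", "", []),
    node("p", "class1", []),
    node("div", "", [node("p", "", [])])
  ])
];

// Wrapping the set in a container turns div.class0 into a descendant.
var wrapped = [node("div", "", parsedSet)];

console.log(findByClass(parsedSet, "class0").length);   // 0
console.log(filterByClass(parsedSet, "class0").length); // 1
console.log(findByClass(wrapped, "class0").length);     // 1
```

In real jQuery terms, this suggests that $(datahtml).filter(".class0") would match the top-level element directly, while the wrapper approach keeps .find usable for every selector.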
The reason isn't simple, but I assume it comes down to how jQuery parses more complex HTML strings: it drops the string into a DOM fragment created on the fly and then retrieves the parsed elements. That operation most likely makes the DOM parser ignore the html and body tags, as they would be illegal in that context.
Here is a very small test suite which demonstrates that this behavior is consistent through jQuery 1.8.2 all the way down to 1.6.4.
Edit: quoting this post:
Problem is that jQuery creates a DIV and sets innerHTML and then takes DIV children, but since BODY and HEAD elements are not valid DIV childs, then those are not created by browser.
That makes me more confident that my theory is correct. I'll share it here; hopefully it makes some sense to you. Keep jQuery 1.8.2's uncompressed source side by side with this. The # indicates line numbers.
All document fragments made through jQuery.buildFragment (defined @#6122) go through jQuery.clean (#6151); even a cached fragment already went through jQuery.clean when it was created. As the quoted text above implies, jQuery.clean (defined @#6275) creates a fresh div inside the safe fragment to serve as a container for the parsed data: the div element is created at #6301-6303, its childNodes are retrieved at #6344, the div is removed at #6347 for cleanup (plus #6359-6361 as a bug fix), and the childNodes are merged into the return array at #6351-6355 and returned at #6406.
Therefore, every method that invokes jQuery.buildFragment is affected. That includes jQuery.parseHTML and jQuery.fn.domManip (the jQuery object method invoked by .append(), .after(), and .before(), among others), as well as $(html), which is handled in jQuery.fn.init (defined @#97; html strings with more than a single tag are handled @#125, which invokes jQuery.parseHTML @#131).
It makes sense that virtually all jQuery parsing of HTML strings (besides single-tag html strings) is done using a div element as the container, and html/body tags are not valid descendants of a div element, so they are stripped out.
Addendum: Newer versions of jQuery (1.9+) have refactored the HTML parsing logic (for instance, the internal jQuery.clean method no longer exists), but the overall parsing logic remains the same.
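As a rough illustration of the stripping described above, the following plain JavaScript sketch approximates which markup survives when the string is parsed inside a div. It is a toy model under a loud assumption: stripDocumentWrappers is a hypothetical helper that uses a regular expression, whereas jQuery relies on the browser's HTML parser, so treat it as a picture of the outcome rather than of the mechanism.

```javascript
// Toy model only: approximates which tags survive when markup is parsed
// inside a <div>. Real browsers use the HTML parser, not a regex.
function stripDocumentWrappers(html) {
  // Drop opening and closing <html>, <head>, and <body> tags; keep their contents.
  return html.replace(/<\/?(?:html|head|body)(?:\s[^>]*)?>/gi, "");
}

var datahtml = '<html><body><div class="class0"><h4>data1</h4>' +
  '<p class="class1">data2</p><div id="mydivid"><p>data3</p></div></div></body></html>';

// Only div.class0 and its descendants remain, so div.class0 becomes the
// top-level element of the resulting set.
var surviving = stripDocumentWrappers(datahtml);
console.log(surviving);
```

The remaining string starts with the div.class0 element, which matches the observation that $(datahtml) contains only that element.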
I don't know any other way than placing the HTML in a temporary invisible container.
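$(document).ready(function(){
// Keep the markup as a plain string; concatenating a jQuery object into a string yields "[object Object]".
var datahtml = "<html><body><div class=\"class0\"><h4>data1</h4><p class=\"class1\">data2</p><div id=\"mydivid\"><p>data3</p></div></div></body></html>";
// Build a hidden container and let jQuery parse the markup into it.
var tempContainer = $('<div style="display:none;"></div>').append(datahtml);
$('body').append(tempContainer);
alert(tempContainer.find('.class1').text());
// Clean up the temporary container once the data has been read.
tempContainer.remove();
});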
I think the main problem is that you cannot pass a whole html document to jQuery this way. What happens in your case is that jQuery takes the first element it can use, which is the div with class class0.
Test this to see that I am right:
if($(datahtml).hasClass('class0'))
alert('Yes you are right :-)');
So this means that you cannot use the html or body tag as the root to query within.
If you want to make it work, wrap the markup in another element:
<div>
<div class="class0">
<h4>data1</h4>
<p class="class1">data2</p>
<div id="mydivid"><p>data3</p></div>
</div>
</div>
So try this:
var datahtml = "<div><div class=\"class0\"><h4>data1</h4><p class=\"class1\">data2</p><div id=\"mydivid\"><p>data3</p></div></div></div>";
alert($(datahtml).find(".class0").text()); // works
alert($(datahtml).find(".class1").text()); // works
alert($(datahtml).find("#mydivid").text()); // works