Question
I've been working on a way to scrape the data from a page and display it in roughly the same format as the source. I found YQL and think it is brilliant, but I can't figure out how to display the whole output as-is, with nothing beyond basic formatting.
The YQL input code is:
select * from html where url="http://directory.vancouver.wsu.edu/anthropology" and xpath="//div[@id='facdir']"
Using that, it returns this JSON:
http://query.yahooapis.com/v1/public/yql?q=select%20*%20from%20html%20where%20url%3D%22http%3A%2F%2Fdirectory.vancouver.wsu.edu%2Fanthropology%22%20and%20xpath%3D%22%2F%2Fdiv%5B%40id%3D'facdir'%5D%22&format=json&callback=anthropology
I've followed the Yahoo tutorials and created the news widget, among other things, but none of the tutorials covers this basic view (I don't need the links either, just the paragraph layout).
Like this:
Name
Title
Phone:(###)###-####
Location: Building and Room #
email@vancouver.wsu.edu
Here is what I had for output, based on http://christianheilmann.com, but it doesn't do anything (apparently none of the tutorials there work; I tried every one):
<html>
<head>
<script src="http://code.jquery.com/jquery-latest.js"></script>
</head>
<body>
<p>
<b>Copied:</b>
</p>
<div>
<script>
function anthropology (0) {
// get the DIV with the ID $
var info = document.getElementById('facdir');
// add a class for styling
info.className = 'js';
// if it exists
if(info){
// get the info data returned from YQL
var data = o.query.results.span;
var link = info.getElementsByTagName('a')[0];
link.innerHTML = '(see all info)';
// to the main container DIV
var out = document.createElement('span');
out.className = 'info';
info.insertBefore(out,link.parentNode);
}
}
</script>
<script src='http://query.yahooapis.com/v1/public/yql?q=select%20*%20from%20html%20where%20url%3D%22http%3A%2F%2Fdirectory.vancouver.wsu.edu%2Fanthropology%22%20and%20xpath%3D%22%2F%2Fdiv%2Fdiv%2Fdiv%2Fdiv%2Fdiv%2Fdiv%2Fdiv%2Fdiv%2Fdiv%2Fdiv%2Fdiv%22&format=json&callback=anthropology'></script>
</div>
Answer 1:
I recently completed a tutorial, with a couple of jsFiddles, that explains how to use YQL, XPath, and jQuery .ajax() for a different SO question; it should shed some light in your direction. You can see that SO answer here.
To give an acceptable answer to your question, I've put together a working demo that shows how easy it is to scrape the data from the webpage you're requesting.
The jsFiddle demo contains lots of comments and console.log() messages to help you follow the workflow. Make sure your browser's console is open (Firebug, for example). The HTML and CSS used to construct the faculty member boxes mimic those of the original website, including the links on the image, name, and email, and the webpage theme too.
DEMO:
jsFiddle Data Scraping XML: Dynamic Webpage Building
Revised! In addition to the revised jsFiddle above, see the related
jsFiddle Tutorial: Creating Dynamic Div's (Now Improved!)
HTML:
<div id="results"></div>
jQuery:
var directoryName = 'child-development-program';
$.ajax({
    type: 'GET',
    url: "http://query.yahooapis.com/v1/public/yql?q=select%20*%20from%20html%20where%20url%3D%22http%3A%2F%2Fdirectory.vancouver.wsu.edu%2F" + directoryName + "%22%20and%20xpath%3D%22%2F%2Fdiv%5B%40id%3D'content-inner'%5D%2Fdiv%2Fdiv%2Fdiv%2Fdiv%2Fdiv%5B2%5D%22",
    dataType: 'xml',
    success: function(data) {
        if (data) {
            // Show in console the jQuery Object.
            console.info('Here is the returned query');
            console.log( $(data).find('query') );
            // Show in console the results in inner-html text.
            var textResults = $(data).find('results').text();
            console.log( textResults );
            // Parse the list of faculty members. Variable indexFM is not used for indexed faculty member.
            $(data).find('results').find('.views-row').each(function(indexFM){
                // This variable will store the current faculty member.
                var facultyMember = this;
                console.info('Faculty jQuery DIV Object shown on next lines.');
                console.log( facultyMember );
                // Parse the contents of each faculty member. Variable indexFC is not used for indexed faculty content.
                $(facultyMember).each(function(indexFC){
                    // Get Thumbnail Image of Faculty Member
                    var facultyMemberImage = $(this).find('.views-field-field-profile-image-fid #directoryimage a img').attr('src');
                    console.log( facultyMemberImage );
                    // Get Title (Name) of Faculty Member
                    var facultyMemberTitle = $(this).find('.views-field-field-professional-title-value #largetitle').text();
                    console.log( facultyMemberTitle );
                    // Get relative URL fragment.
                    //
                    // Stackoverflow Edit: Much more extraction in this section, see jsFiddle link above.
                    // (facultyMemberUrl, facultyMemberPosition, facultyMemberPhone, facultyMemberBuilding,
                    // facultyMemberRoom, and facultyMemberEmailUrl, used below, are presumably assigned
                    // in that omitted section; this excerpt will not run without them.)
                    //
                    // Get Email of Faculty Member
                    var facultyMemberEmail = $(this).find('.views-field-field-email-value span').text();
                    // Simple dashed line to separate faculty members as seen in browser console.
                    console.log('--------');
                    var divObject = '<div class="dynamicResults">' +
                        '<div class="dynamicThumb"><a href="' + facultyMemberUrl + '"><img src="' + facultyMemberImage + '" alt=""></a></div>' +
                        '<div class="dynamicInfo">' +
                        '<div class="dynamicText"><a href="' + facultyMemberUrl + '" class="dynamicName">' + facultyMemberTitle + '</a></div>' +
                        '<div class="dynamicText">' + facultyMemberPosition + '</div>' +
                        '<div class="dynamicText">Phone: ' + facultyMemberPhone + '</div>' +
                        '<div class="dynamicText">Location: ' + facultyMemberBuilding + ' <span>' + facultyMemberRoom + '</span></div>' +
                        '<div class="dynamicText"><a href="' + facultyMemberEmailUrl + '" class="dynamicEmail">' + facultyMemberEmail + '</a><span class="dynamicEmailpic"></span></div>' +
                        '</div></div><div class="clear"></div>';
                    // Build webpage with dynamic data.
                    $('#results').append( divObject );
                });
            });
        }
    }
});
Screenshot: Thumbnails in photo are 100px x 100px Revised Photo for Revised jsFiddle!!
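The YQL URLs in the snippets above are percent-encoded by hand, which is easy to get wrong when the XPath changes. As a minimal sketch, the same URL can be assembled from the readable query with the standard encodeURIComponent() function. The helper name buildYqlUrl and its parameters are made up for this illustration and are not part of YQL or jQuery.

```javascript
// Hypothetical helper (not part of YQL or jQuery): builds a YQL public
// REST URL from a readable query string.
function buildYqlUrl(query, format, callback) {
    var url = 'http://query.yahooapis.com/v1/public/yql?q=' + encodeURIComponent(query);
    // format and callback are optional; they are appended only when given.
    if (format) {
        url += '&format=' + encodeURIComponent(format);
    }
    if (callback) {
        url += '&callback=' + encodeURIComponent(callback);
    }
    return url;
}

// Rebuild the JSON URL from the question out of its plain YQL statement.
var directoryName = 'anthropology';
var query = 'select * from html where url="http://directory.vancouver.wsu.edu/' +
    directoryName + '" and xpath="//div[@id=\'facdir\']"';
var yqlUrl = buildYqlUrl(query, 'json', 'anthropology');
```

With these inputs, yqlUrl is character-for-character the JSON URL quoted in the question. Omitting the last two arguments gives a URL with no format or callback parameter, like the ones passed to $.ajax() above.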
On looking at your question more closely, though, I wanted to try something new and simple, and the results are quite acceptable. This time, the data scraping technique uses the webpage's native CSS file as an asset in the jsFiddle and inserts the returned data directly into the DOM.
This method uses the same principle as above, except that it uses html as the .ajax() dataType, which makes a near clone of the original webpage available. The only drawback is that it requires the whole CSS file, but you can trim the original file to remove styles and selectors that aren't needed (important so as not to exceed the 4095-selector-per-stylesheet limit in older versions of IE).
DEMO:
jsFiddle Data Scraping HTML: Clone That Webpage
HTML:
<link type="text/css" rel="stylesheet" media="all" href="http://directory.vancouver.wsu.edu/sites/directory.vancouver.wsu.edu/files/css/css_f9f00e4e3fa0bf34a1cb2b226a5d8344.css" />
<div id="facultyAnthropology"></div>
jQuery:
var directoryName = 'anthropology';
$.ajax({
    type: 'GET',
    url: "http://query.yahooapis.com/v1/public/yql?q=select%20*%20from%20html%20where%20url%3D%22http%3A%2F%2Fdirectory.vancouver.wsu.edu%2F"+directoryName+"%22%20and%20xpath%3D%22%2F%2Fdiv%5B%40id%3D'content-area'%5D%22",
    dataType: 'html',
    success: function(data) {
        $('#facultyAnthropology').append($(data).find('results'));
    }
});
Screenshot: As above, Thumbnails in photo are 100px x 100px
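If all you want is the plain layout from the question (Name, Title, Phone, Location, email), without images or links, something like the following rough sketch could turn one extracted record into a single paragraph. The record shape { name, title, phone, location, email } and both function names are assumptions made for this illustration; they are not what YQL returns and are not taken from the jsFiddles.

```javascript
// Hypothetical helper: escapes scraped text before it is placed into HTML.
function escapeHtml(text) {
    return String(text)
        .replace(/&/g, '&amp;')
        .replace(/</g, '&lt;')
        .replace(/>/g, '&gt;')
        .replace(/"/g, '&quot;');
}

// Hypothetical record shape: { name, title, phone, location, email }.
// Returns one <p> with the five lines separated by <br>.
function formatFacultyMember(member) {
    var lines = [
        member.name,
        member.title,
        'Phone: ' + member.phone,
        'Location: ' + member.location,
        member.email
    ];
    return '<p>' + lines.map(escapeHtml).join('<br>') + '</p>';
}

// Placeholder values copied from the layout in the question.
var sampleHtml = formatFacultyMember({
    name: 'Name',
    title: 'Title',
    phone: '(###)###-####',
    location: 'Building and Room #',
    email: 'email@vancouver.wsu.edu'
});
```

Inside the .each() loop of the first demo, the returned string could be passed to $('#results').append() in place of divObject, once the individual fields have been read with .find() and .text().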
Source: https://stackoverflow.com/questions/14048943/how-to-simply-display-the-xml-output-from-yql-or-have-the-json-output-to-html