Question
How do I grab pieces of content from external websites and display them on my website? (Similar to what an RSS feed or other aggregator does).
For example, say I want to display items from another website's calendar:
Other website:
<h1>Here's our calendar:</h1>
<div class="calendar_item">
<h2>Boston Marathon</h2>
<p class="date">June 23, 2012</p>
<p class="description">This marathon is 26.2 miles and lots of fun.</p>
</div>
<div class="calendar_item">
<h2>Irish Pub Crawl</h2>
<p class="date">July 17, 2012</p>
<p class="description">Shamrocks and green things are super-fun.</p>
</div>
<div class="calendar_item">
<h2>Tim's Birthday</h2>
<p class="date">August 25, 2012</p>
<p class="description">It's Tim's birthday, yo.</p>
</div>
My website:
<h1>Here's a feed of some calendar items from someone else's website:</h1>
<div class="event_title">Boston Marathon</div>
<div class="event_date">June 23, 2012</div>
<div class="event_description">This marathon is 26.2 miles and lots of fun.</div>
<div class="event_title">Irish Pub Crawl</div>
<div class="event_date">July 17, 2012</div>
<div class="event_description">Shamrocks and green things are super-fun.</div>
<div class="event_title">Tim's Birthday</div>
<div class="event_date">August 25, 2012</div>
<div class="event_description">It's Tim's birthday, yo.</div>
Here's what I've tried (using MAMP):
<?php
$url = "http://example.com";
$page = curl($url);
$pattern = '%
<h2>(.+?)</h2>
%i';
preg_match($pattern,$page,$matches);
print_r($matches);
?>
...which prints:
Array ( )
The tutorials/etc. I've viewed include ambiguous answers like "try cURL". This seems so simple, but I'm a stumped noob.
Thanks in advance, guys :)
Answer 1:
I would not recommend regex for parsing HTML. PHP 5+ comes with a parser which you can use as shown below.
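// Load the markup; file_get_contents() also accepts a URL when allow_url_fopen is enabled.
$content = file_get_contents('test.html');
$dom = new DOMDocument();
// Collect the warnings libxml raises on imperfect real-world HTML instead of printing them.
libxml_use_internal_errors(true);
$dom->loadHTML($content);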
$h2Tags = $dom->getElementsByTagName("h2");
$pTags = $dom->getElementsByTagName("p");
foreach($h2Tags as $h2 ) {
//do something
}
foreach($pTags as $p ) {
if($p->getAttribute("class") == "date") {
//do something
}
}
$h2 is of type DOMElement, which inherits from DOMNode, so you can use the nodeValue property to read its text. In the example above, $h2->nodeValue gives you the content of each heading.
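To tie this back to the calendar in the question, here is a minimal sketch that groups each title, date, and description by its calendar_item div and prints the markup the asker wants. It uses DOMXPath on top of DOMDocument. The helper name extract_events() and the inline $html string (a stand-in for the fetched page) are assumptions made for illustration, not part of the original answer.

```php
<?php
// Stand-in for the remote page; in practice this string would come from
// file_get_contents() or cURL.
$html = <<<'HTML'
<h1>Here's our calendar:</h1>
<div class="calendar_item">
<h2>Boston Marathon</h2>
<p class="date">June 23, 2012</p>
<p class="description">This marathon is 26.2 miles and lots of fun.</p>
</div>
<div class="calendar_item">
<h2>Irish Pub Crawl</h2>
<p class="date">July 17, 2012</p>
<p class="description">Shamrocks and green things are super-fun.</p>
</div>
HTML;

// Hypothetical helper: returns one associative array per calendar_item div.
function extract_events($html)
{
    $dom = new DOMDocument();
    // Keep libxml warnings about imperfect HTML out of the output.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    $events = array();
    foreach ($xpath->query('//div[@class="calendar_item"]') as $item) {
        // string(...) evaluates relative to $item and yields the text of the first match.
        $events[] = array(
            'title'       => trim($xpath->evaluate('string(h2)', $item)),
            'date'        => trim($xpath->evaluate('string(p[@class="date"])', $item)),
            'description' => trim($xpath->evaluate('string(p[@class="description"])', $item)),
        );
    }
    return $events;
}

$events = extract_events($html);
foreach ($events as $event) {
    // Escape before echoing: the text comes from a site you do not control.
    echo '<div class="event_title">' . htmlspecialchars($event['title']) . "</div>\n";
    echo '<div class="event_date">' . htmlspecialchars($event['date']) . "</div>\n";
    echo '<div class="event_description">' . htmlspecialchars($event['description']) . "</div>\n";
}
```

Querying per calendar_item keeps each date and description attached to the right title, which two separate loops over all h2 and p tags would not guarantee.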
Answer 2:
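You can try the Simple HTML DOM library: http://simplehtmldom.sourceforge.net/
Then, with the page loaded into $dom (for example via the library's file_get_html()), just:
foreach($dom->find('p[class=date]') as $p) {
$date = $p->innertext;
}
This gives you the contents of each p element whose class is "date".
Or you can do it more globally and dig through with stripos: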
foreach($dom->find('p') as $p) {
if(stripos($p->class, 'date') !== false) {
//do something
}
}
Answer 3:
Here's an example of using cURL:
http://tr2.php.net/manual/en/curl.examples-basic.php
Also check that you are actually getting data before applying preg_match. If you are, then the regex is what's causing your problem.
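A rough sketch of that check, assuming the cURL extension is enabled. fetch_page() and extract_headings() are hypothetical helper names, and $sample stands in for a downloaded page so the snippet runs without network access. One possible cause of the empty array in the question, if the pattern really contains the line breaks shown, is that those newlines are matched literally; also note that preg_match() stops at the first match, while preg_match_all() collects every one.

```php
<?php
// Hypothetical helper: fetch a URL with cURL and return the body,
// or null if the request failed.
function fetch_page($url)
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow redirects
    curl_setopt($ch, CURLOPT_TIMEOUT, 10);          // give up after 10 seconds
    $body = curl_exec($ch);
    if ($body === false) {
        error_log('cURL failed: ' . curl_error($ch));
        return null;
    }
    return $body;
}

// Hypothetical helper: collect the text of every <h2> on the page.
function extract_headings($page)
{
    // Single-line pattern with no literal newlines; the "s" flag lets "." span line breaks.
    preg_match_all('%<h2>(.+?)</h2>%is', $page, $matches);
    return $matches[1];
}

// Stand-in for a fetched page.
$sample = "<div>\n<h2>Boston Marathon</h2><h2>Irish Pub Crawl</h2>\n</div>";
$headings = extract_headings($sample);
print_r($headings);
```

To run it against a live page, replace $sample with the result of fetch_page('http://example.com') and check for null before passing it on. If the fetch succeeds and the headings still come back empty, the pattern is the place to look.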
Source: https://stackoverflow.com/questions/10486704/how-do-i-display-content-grabbed-from-external-websites