Question
I'm working on an application that allows users to add their own RSS feeds to a simple reader of sorts.
Currently, I'm using xml_domit_rss as the parser, but I'm unsure whether it actually validates the URL before parsing.
From what I can gather online, validation appears to be a separate step from parsing, done either through a service such as https://www.feedvalidator.org or by some other method such as parse_url().
Does anyone have insight into how xml_domit_rss validates, or a method by which I can validate the URL before sending it to the parser?
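For the parse_url() idea mentioned above, a minimal sketch of a syntax-only check might look like the following. The function name isPlausibleFeedUrl is hypothetical and is not part of xml_domit_rss; the sketch assumes only http and https feeds should be accepted.

```php
<?php
// Hypothetical helper (not part of xml_domit_rss): returns true only for
// well-formed absolute http(s) URLs that include a host component.
function isPlausibleFeedUrl(string $url): bool
{
    // filter_var() rejects strings that are not well-formed URLs.
    if (filter_var($url, FILTER_VALIDATE_URL) === false) {
        return false;
    }

    // parse_url() splits the URL so the scheme and host can be inspected.
    $parts = parse_url($url);
    if ($parts === false || empty($parts['host'])) {
        return false;
    }

    // Assumption: only http and https feeds are of interest.
    $scheme = strtolower($parts['scheme'] ?? '');
    return in_array($scheme, ['http', 'https'], true);
}
```

This only checks the shape of the URL. It does not confirm that the address responds or that the response is a feed, so the content still has to be fetched and parsed afterwards.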
Answer 1:
It's simple: in .NET you can do that with SyndicationFeed (in the System.ServiceModel.Syndication namespace), which supports Atom 1.0 and RSS 2.0.
using System.ServiceModel.Syndication;
using System.Xml;

try
{
    SyndicationFeed fetchedItems = SyndicationFeed.Load(XmlReader.Create(feedUrl));
    // Validation successful.
}
catch
{
    // Validation failed.
}
Answer 2:
You could validate the RSS with a RelaxNG schema. Schemas for all the different feed formats should be available online...
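In PHP, one way to try this is DOMDocument::relaxNGValidateSource(). The sketch below uses a deliberately tiny toy schema written for illustration only; it is an assumption, not one of the published feed schemas, and the names TOY_RSS_SCHEMA and matchesToySchema are hypothetical. A real schema would be far more permissive about element order and extensions.

```php
<?php
// Toy RELAX NG schema (illustrative assumption, NOT an official RSS schema):
// <rss version="..."> with one <channel> holding title, link and description
// in that order, followed by any number of <item> elements.
const TOY_RSS_SCHEMA = <<<'RNG'
<element name="rss" xmlns="http://relaxng.org/ns/structure/1.0">
  <attribute name="version"><text/></attribute>
  <element name="channel">
    <element name="title"><text/></element>
    <element name="link"><text/></element>
    <element name="description"><text/></element>
    <zeroOrMore>
      <element name="item">
        <zeroOrMore>
          <element><anyName/><text/></element>
        </zeroOrMore>
      </element>
    </zeroOrMore>
  </element>
</element>
RNG;

function matchesToySchema(string $xml): bool
{
    if ($xml === '') {
        return false; // loadXML() refuses empty input
    }

    // Collect libxml errors internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);

    $doc = new DOMDocument();
    $ok = $doc->loadXML($xml) && $doc->relaxNGValidateSource(TOY_RSS_SCHEMA);

    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    return $ok;
}
```

With a real schema file on disk, DOMDocument::relaxNGValidate() takes a file path instead of a schema string.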
Answer 3:
Validating in the context of XML files (and hence RSS/Atom feeds which use XML to encode the values) means to use a document schema which describes the expected structure of the XML file (which elements can have what child elements, what attributes can be present, etc).
Some XML parsers require a schema and bork (this is a technical term :-) meaning they refuse to parse) on XML files that do not conform to it. Since you are parsing arbitrary RSS, it is probably best to skip validation and make a best effort at parsing the feed. You could also show the parse results to the user (similar to what Google Reader does when you add a new feed) and let her judge whether the result looks OK.
Unfortunately, the XML parser used by this code seems to be dead, and I can't find any details on how strict or lax its parsing is.
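A rough sketch of the "parse what you can and show a preview" idea, using PHP's SimpleXML rather than xml_domit_rss. The function name previewFeed and the shape of the returned array are assumptions made for illustration, and only RSS-style channel/item feeds are handled here.

```php
<?php
// Hypothetical helper: parse feed XML leniently and return a small preview
// (channel title plus the first few item titles). Returns null when the
// input is not well-formed XML or has no RSS <channel> element.
function previewFeed(string $xml, int $maxItems = 5): ?array
{
    // Keep libxml parse errors out of the PHP warning stream.
    $previous = libxml_use_internal_errors(true);
    $feed = $xml === '' ? false : simplexml_load_string($xml);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if ($feed === false || !isset($feed->channel)) {
        return null;
    }

    $titles = [];
    foreach ($feed->channel->item as $item) {
        if (count($titles) >= $maxItems) {
            break;
        }
        $titles[] = (string) $item->title;
    }

    return [
        'title' => (string) $feed->channel->title,
        'items' => $titles,
    ];
}
```

The returned array can be rendered on an "add feed" confirmation page so the user decides whether the result looks right. Atom feeds would need similar handling for their entry elements.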
Answer 4:
This is my quick and dirty solution that worked for me under similar circumstances:
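foreach ($sources as $source) {
    if (!$source["url"]) {
        continue;
    }
    // curl_request() is not a PHP built-in; presumably a helper that fetches
    // the URL and returns the response body as a string
    $rss = curl_request($source["url"]);
    // Escape ampersands so HTML-only entities do not break the XML parsers
    $rss = str_replace('&', '&amp;', $rss);
    $parser = xml_parser_create();
    // Passing true marks this as the final chunk, so truncated documents fail too
    if (xml_parse($parser, $rss, true)) {
        $xmle = new SimpleXMLElement($rss);
    }
    else {
        $xmle = null;
        continue;
    }
    //other stuff here
}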
I make sure to replace the ampersands with &amp;, because not doing that can cause issues with the SimpleXMLElement parser and entities such as &bull; or &mdash;.
xml_parse returns 1 on success, so you can check it with a straight if statement. Then using SimpleXMLElement to traverse the RSS feed makes things nice and easy.
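One side effect of escaping every ampersand is that references that were already valid XML get escaped a second time. A possible refinement, sketched here as an assumption and not as part of the original solution, is to escape only ampersands that do not start one of the five predefined XML entities or a numeric character reference. The function name escapeStrayAmpersands is hypothetical.

```php
<?php
// Hypothetical helper: escape "&" unless it already begins a reference that
// any XML parser understands (amp, lt, gt, quot, apos, or a numeric
// character reference). HTML-only entities such as "&bull;" are neutralized.
function escapeStrayAmpersands(string $xml): string
{
    $escaped = preg_replace(
        '/&(?!(?:amp|lt|gt|quot|apos|#[0-9]+|#x[0-9a-fA-F]+);)/',
        '&amp;',
        $xml
    );

    // preg_replace() returns null on a regex engine error; fall back to the input.
    return $escaped ?? $xml;
}
```

Note that this also rewrites ampersands inside CDATA sections, where they are legal as-is, so it remains a quick fix in the same spirit as the code above.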
Answer 5:
Try this code:
function validateFeed( $sFeedURL )
{
    // Ask the feedvalidator.org service to check the feed, then scan its response
    $sValidator = 'http://feedvalidator.org/check.cgi?url=';
    if( $sValidationResponse = @file_get_contents($sValidator . urlencode($sFeedURL)) )
    {
        // The feed counts as valid only if the response contains this phrase
        return stristr( $sValidationResponse , 'This is a valid RSS feed' ) !== false;
    }
    // The request failed or returned an empty response
    return false;
}
Source: https://stackoverflow.com/questions/451338/validating-an-rss-feed