Validating an RSS feed

时光总嘲笑我的痴心妄想 submitted on 2020-01-15 12:15:33

Question


I'm working on an application that allows users to add their own RSS feeds to a simple reader of sorts.

Currently, I'm using xml_domit_rss as the parser but I'm unsure if it's actually validating the URL before parsing.

From what I can gather online, validation seems to be a separate step from parsing, done either with a service such as https://www.feedvalidator.org or by some other method such as parse_url().

Does anyone have insight into how xml_domit_rss validates, or know a method I can use to validate the URL before sending it to the parser?


Answer 1:


It's simple: you can do that with SyndicationFeed (.NET), which supports Atom 1.0 and RSS 2.0.

// Requires: using System.ServiceModel.Syndication; using System.Xml;
try
{
    SyndicationFeed fetchedItems = SyndicationFeed.Load(XmlReader.Create(feedUrl));
    // Validation successful.
}
catch
{
    // Validation failed.
}



Answer 2:


You could validate the RSS against a RELAX NG schema. Schemas for the different feed formats should be available online.
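As a rough illustration of the schema idea in PHP, DOMDocument can validate against a RELAX NG schema given as a string. The schema below is a toy one written for this example, not an official or complete RSS schema, and the names feedMatchesSchema() and TOY_RSS_SCHEMA are made up for the sketch; a real schema for your feed format would replace it.

```php
<?php
// Toy RELAX NG schema, for illustration only: it accepts a minimal RSS 2.0
// skeleton with a fixed element order and is NOT a complete RSS schema.
const TOY_RSS_SCHEMA = <<<'RNG'
<element name="rss" xmlns="http://relaxng.org/ns/structure/1.0">
  <attribute name="version"/>
  <element name="channel">
    <element name="title"><text/></element>
    <element name="link"><text/></element>
    <element name="description"><text/></element>
    <zeroOrMore>
      <element name="item">
        <element name="title"><text/></element>
        <element name="link"><text/></element>
      </element>
    </zeroOrMore>
  </element>
</element>
RNG;

/**
 * Returns true when $xml is well-formed and matches the RELAX NG schema.
 * (Hypothetical helper name, not part of any library.)
 */
function feedMatchesSchema(string $xml, string $schema = TOY_RSS_SCHEMA): bool
{
    if (trim($xml) === '') {
        return false; // DOMDocument::loadXML() rejects empty input
    }

    // Collect libxml errors internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);

    $doc = new DOMDocument();
    $ok = $doc->loadXML($xml) && $doc->relaxNGValidateSource($schema);

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return $ok;
}
```

If the schema lives in a file, DOMDocument::relaxNGValidate() takes a filename instead of the schema source.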




Answer 3:


Validating, in the context of XML files (and hence RSS/Atom feeds, which use XML to encode their values), means checking the file against a document schema that describes the expected structure (which elements can have which child elements, what attributes can be present, etc).

Some XML parsers require a schema and bork (this is a technical term :-) meaning they refuse to parse) on XML files that do not conform to it. Since you are parsing arbitrary RSS, it is probably best to skip validation and make a best effort at parsing the feed. You could also show the parse results to the user (similar to what Google Reader does when you add a new feed) and let her judge whether the result looks OK.

Unfortunately, the XML parser used by this code seems to be dead, and I can't find any details on how strict or lax its parsing is...
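A minimal sketch of the "best effort, then let the user judge" idea, using PHP's bundled DOM extension rather than xml_domit_rss. The function name previewFeed() and the shape of its return value are assumptions made for this example; the recover flag asks libxml to keep going after errors it can recover from, though how much it salvages from a badly broken feed will vary.

```php
<?php
/**
 * Best-effort parse of a feed body. Returns a small preview that can be
 * shown to the user, or null when nothing usable could be parsed.
 * (Illustrative helper, not part of any library.)
 */
function previewFeed(string $xml, int $maxItems = 5): ?array
{
    if (trim($xml) === '') {
        return null; // DOMDocument::loadXML() rejects empty input
    }

    $previous = libxml_use_internal_errors(true); // silence parser warnings

    $doc = new DOMDocument();
    $doc->recover = true; // try to continue after recoverable errors
    $loaded = $doc->loadXML($xml);

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if (!$loaded || $doc->documentElement === null) {
        return null;
    }

    $xpath = new DOMXPath($doc);

    // local-name() ignores namespaces, so RSS <channel>/<item> and
    // Atom <feed>/<entry> are both covered by the same expressions.
    $feedTitle = $xpath->evaluate(
        'string((//*[local-name()="channel" or local-name()="feed"]/*[local-name()="title"])[1])'
    );

    $items = [];
    $titles = $xpath->query(
        '//*[local-name()="item" or local-name()="entry"]/*[local-name()="title"][1]'
    );
    foreach ($titles as $node) {
        if (count($items) >= $maxItems) {
            break;
        }
        $items[] = trim($node->textContent);
    }

    return ['title' => trim($feedTitle), 'items' => $items];
}
```

The returned title and the first few item titles are what you would display on the "add feed" screen for the user to confirm.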




Answer 4:


This is my quick and dirty solution, which worked for me under similar circumstances:

foreach($sources as $source) {
    if(!$source["url"]) {
        continue;
    }

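    // curl_request() is not a PHP built-in; presumably a cURL wrapper that returns the response body.
    $rss = curl_request($source["url"]);
    $rss = str_replace('&', '&amp;', $rss);

    $parser = xml_parser_create();
    if(xml_parse($parser, $rss, true)) { // true: $rss is the complete document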
        $xmle = new SimpleXMLElement($rss);
    }
    else {
        $xmle = null;
        continue;
    }

    //other stuff here
}

I make sure to replace the ampersands with &amp;amp;, because not doing that can cause issues with the SimpleXMLElement parser and entities such as &amp;bull; or &amp;mdash;.

xml_parse returns 1 on success, so you can check it with a plain if statement. Using SimpleXMLElement to traverse the RSS feed then makes things nice and easy.
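One side effect of a blanket replacement is that ampersands that were already escaped correctly get escaped a second time. A possible refinement, which is my own sketch and not part of the answer above, is to escape only the ampersands that do not start one of the five predefined XML entities or a numeric character reference. The function name escapeBareAmpersands() is made up for the example.

```php
<?php
/**
 * Escapes "&" unless it already starts a predefined XML entity
 * (&amp; &lt; &gt; &quot; &apos;) or a numeric character reference.
 * HTML-only entities such as &bull; are still escaped, so the XML
 * parser sees them as literal text instead of undefined entities.
 */
function escapeBareAmpersands(string $xml): string
{
    $escaped = preg_replace(
        '/&(?!(?:amp|lt|gt|quot|apos|#[0-9]+|#x[0-9a-fA-F]+);)/',
        '&amp;',
        $xml
    );

    // preg_replace() returns null on a regex engine error; fall back to the input.
    return $escaped ?? $xml;
}

// Usage: a fragment mixing a valid entity, an HTML-only entity and a bare "&".
$fragment = '<t>Tom &amp; Jerry &bull; R&D</t>';
$element = simplexml_load_string(escapeBareAmpersands($fragment));
```

Note that, like the blanket replacement, this also touches ampersands inside CDATA sections, where they would then show up literally as escaped text.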




Answer 5:


Try this code:

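function validateFeed( $sFeedURL )
{
    // Asks the online Feed Validator service to check the feed.
    // Needs allow_url_fopen enabled so file_get_contents() can fetch a URL.
    $sValidator = 'http://feedvalidator.org/check.cgi?url=';

    $sValidationResponse = @file_get_contents($sValidator . urlencode($sFeedURL));
    if( $sValidationResponse === false )
    {
        return false; // the validator could not be reached
    }

    // The feed counts as valid only if the result page contains the success message.
    return stristr( $sValidationResponse , 'This is a valid RSS feed' ) !== false;
}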
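Whichever validator or parser you end up using, it can help to reject strings that are not http(s) URLs before making any request at all, which is roughly what the question's mention of parse_url() points at. The sketch below is only a syntax check and says nothing about whether the URL actually serves a feed; looksLikeFeedUrl() is a name invented for the example.

```php
<?php
/**
 * Cheap syntactic pre-check: is $url an absolute http(s) URL with a host?
 * (Illustrative helper; it does not fetch or inspect the feed itself.)
 */
function looksLikeFeedUrl(string $url): bool
{
    if (filter_var($url, FILTER_VALIDATE_URL) === false) {
        return false;
    }

    $scheme = strtolower((string) parse_url($url, PHP_URL_SCHEME));
    $host = parse_url($url, PHP_URL_HOST);

    return in_array($scheme, ['http', 'https'], true)
        && is_string($host)
        && $host !== '';
}
```

A caller would run this first and only pass URLs that return true on to the remote validator or the parser.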


Source: https://stackoverflow.com/questions/451338/validating-an-rss-feed
