How to find out whether a given string is an RSS feed or not

丶灬走出姿态 submitted on 2019-12-25 07:31:20

Question


I have a string that holds either XML or HTML, downloaded from a given URL. I want to check whether the downloaded string is an RSS feed or an HTML document before parsing it with SAXParser. How can I find this out?

For example:

If I download data from http://rss.cnn.com/rss/edition.rss, the resulting string is an RSS feed.

If I download data from http://edition.cnn.com/2014/06/19/opinion/iraq-neocons-wearing/index.html, the resulting string is an HTML document.

I want to continue processing only if the string is an RSS feed.


Answer 1:


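RSS is an XML format, while an HTML page is either not well-formed XML at all or, as XHTML, has a different root element. So you can read your data as XML and validate it against an RSS XSD; anything that fails to parse or validate is not RSS. Like this: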

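import java.io.IOException;
import java.net.URL;
import javax.xml.XMLConstants;
import javax.xml.transform.Source;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

URL schemaFile = new URL("http://europa.eu/rapid/conf/RSS20.xsd");
Source xmlFile = new StreamSource(YOUR_URL_HERE);
SchemaFactory schemaFactory = SchemaFactory
    .newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
Schema schema = schemaFactory.newSchema(schemaFile);
Validator validator = schema.newValidator();
try {
  validator.validate(xmlFile);
  // if this line is reached, the stream validated as RSS 2.0
} catch (SAXException e) {
  // NOT RSS: not well-formed XML, or it does not match the schema
} catch (IOException e) {
  // the document could not be read, so nothing can be concluded
}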
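
Since the question starts from a string that has already been downloaded, the same validation can be run on that string instead of on the URL. The following is only a minimal sketch: the class name `StringSchemaCheck` and the method `matchesSchema` are hypothetical names chosen for illustration, and it assumes a `Schema` object has already been built from an RSS XSD with `SchemaFactory`.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

// Hypothetical helper, not part of the original answer.
final class StringSchemaCheck {

    /**
     * Returns true if the in-memory document validates against the schema,
     * false if it is not well-formed XML or does not match the schema.
     */
    static boolean matchesSchema(String xml, Schema schema) {
        if (xml == null || schema == null) {
            return false;
        }
        // A Validator is not thread-safe, so create one per call.
        Validator validator = schema.newValidator();
        try {
            // Wrap the string in a reader so no second download is needed.
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            // Parse error or schema violation: treat as "not a match".
            return false;
        } catch (IOException e) {
            // Not expected for an in-memory string, but validate() declares it.
            return false;
        }
    }
}
```

Building the `Schema` once and reusing it for every downloaded string avoids fetching and compiling the XSD repeatedly.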

If you want to check the String itself, you can look for typical RSS structure, such as the <rss> root element and the required elements inside <channel>. But I wouldn't recommend it.
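
If a lighter check on the string is still wanted, one option is to let a streaming parser read only as far as the root element instead of searching the text by hand. The sketch below uses StAX (`javax.xml.stream`), which the original answer does not mention; `RssSniffer` and `looksLikeRss` are hypothetical names. It assumes RSS 0.9x/2.0, where the root element is `<rss>`; RSS 1.0 (root `rdf:RDF`) and Atom (root `feed`) would need extra cases. It does not validate anything beyond the root element.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper, not part of the original answer.
final class RssSniffer {

    /** Returns true if the first element in the string is <rss>. */
    static boolean looksLikeRss(String content) {
        if (content == null || content.trim().isEmpty()) {
            return false;
        }
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // The input is untrusted, so do not process DTDs or external entities.
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, false);
        factory.setProperty(XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, false);
        XMLStreamReader reader = null;
        try {
            reader = factory.createXMLStreamReader(new StringReader(content));
            // Skip comments, processing instructions, etc. until the root element.
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                    return "rss".equals(reader.getLocalName());
                }
            }
            return false;
        } catch (XMLStreamException e) {
            // Not well-formed XML (typical for HTML pages): not an RSS feed.
            return false;
        } finally {
            if (reader != null) {
                try {
                    reader.close();
                } catch (XMLStreamException ignored) {
                    // Nothing useful to do if closing fails.
                }
            }
        }
    }
}
```

A caller could then guard the SAXParser step with `if (RssSniffer.looksLikeRss(downloaded)) { ... }`. Because parsing stops at the first start tag, the cost stays small even for large documents.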



Source: https://stackoverflow.com/questions/24322234/how-to-find-the-given-string-is-a-rss-feed-or-not
