Question
I have a string holding data downloaded from a given URL; the data may be either XML or HTML. I want to check whether the downloaded string is an RSS feed or an HTML document before parsing it with SAXParser. How can I find this out?
For example:
If I download data from http://rss.cnn.com/rss/edition.rss, the resulting string is an RSS feed.
If I download data from http://edition.cnn.com/2014/06/19/opinion/iraq-neocons-wearing/index.html, the resulting string is an HTML document.
I want to continue processing only if the string is an RSS feed.
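Answer 1:
RSS is an XML format; HTML is not necessarily well-formed XML (only XHTML is), so an ordinary HTML page will usually fail XML parsing. You can therefore read your data as XML and validate it against an RSS XSD, like this: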
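import java.net.URL;
import javax.xml.XMLConstants;
import javax.xml.transform.Source;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.*;
import org.xml.sax.SAXException;

// The enclosing method must declare: throws IOException, SAXException
// (new URL(...), newSchema(...) and validate(...) can throw them).
URL schemaFile = new URL("http://europa.eu/rapid/conf/RSS20.xsd");
// YOUR_URL_HERE is a placeholder for the address of the document to check
Source xmlFile = new StreamSource(YOUR_URL_HERE);
SchemaFactory schemaFactory = SchemaFactory
        .newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
Schema schema = schemaFactory.newSchema(schemaFile);
Validator validator = schema.newValidator();
try {
    validator.validate(xmlFile);
    // at this line you can be sure it's an RSS 2.0 stream
} catch (SAXException e) {
    // NOT RSS: not well-formed XML, or it does not match the schema
}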
If you want to check the String itself, you can look for the typical RSS structure, such as the <rss> root element and the required <channel> element inside <rss>. But I wouldn't recommend it.
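For completeness, the root-element check on the String could look roughly like the sketch below. It is only an illustration, not a substitute for the schema validation above: the class name RssSniffer and the method looksLikeRss are made up for this example, it uses the JDK's StAX reader to stop at the first element instead of parsing the whole document, and it assumes an RSS 2.0 feed (root element named rss), so Atom or RSS 1.0 feeds would be rejected.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper, not part of any library.
public class RssSniffer {

    /** Returns true if the document's root element is named "rss". */
    public static boolean looksLikeRss(String content) {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // The input is untrusted: do not process DTDs or external entities.
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, false);
        factory.setProperty(XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, false);
        try {
            XMLStreamReader reader =
                    factory.createXMLStreamReader(new StringReader(content));
            try {
                // Skip the prolog (comments, processing instructions)
                // until the first start tag, which is the root element.
                while (reader.hasNext()) {
                    if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                        return "rss".equals(reader.getLocalName());
                    }
                }
                return false; // no element found at all
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            // Not well-formed XML (typical for ordinary HTML pages or plain text).
            return false;
        }
    }
}
```

You would call RssSniffer.looksLikeRss(downloadedString) and hand the string to SAXParser only when it returns true. This only tells you that the document starts like an RSS feed; it does not check for the required <channel> element or anything else the schema would catch.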
Source: https://stackoverflow.com/questions/24322234/how-to-find-the-given-string-is-a-rss-feed-or-not