Question
Using the JAR files installed through apt for Saxon-HE and TagSoup, parsing HTML is a one-liner:
thufir@dur:~/saxon$ java -cp /usr/share/java/Saxon-HE-9.8.0.14.jar:/usr/share/java/tagsoup-1.2.1.jar net.sf.saxon.Query -x:org.ccil.cowan.tagsoup.Parser -qs:doc\(\'http://books.toscrape.com/\'\)
<?xml version="1.0" encoding="UTF-8"?><!--[if lt IE 7]> <html lang="en-us" class="no-js lt-ie9 lt-ie8 lt-ie7"> <![endif]--><!--[if IE 7]> <html lang="en-us" class="no-js lt-ie9 lt-ie8"> <![endif]--><!--[if IE 8]> <html lang="en-us" class="no-js lt-ie9"> <![endif]--><!--[if gt IE 8]><!--><html xmlns="http://www.w3.org/1999/xhtml" xmlns:html="http://www.w3.org/1999/xhtml" class="no-js" lang="en-us"><!--<![endif]--><head><title>
All products | Books to Scrape - Sandbox
..
<!-- Version: N/A -->
thufir@dur:~/saxon$
How would I do that from Java? In particular, what imports are required from Saxon for this execution? Perhaps using Saxon and the JAXP interface?
See also:
http://codingwithpassion.blogspot.com/2011/03/saxon-xslt-java-example.html
Answer 1:
You will find many simple examples of invoking transformations using Saxon from Java in the saxon-resources download available on both the saxonica.com and sourceforge.net web sites.
It's difficult to know exactly what you want here, because your command line example isn't using Saxon to do anything useful other than invoking the TagSoup parser and serializing the result. The simplest way to do that from Java is with a JAXP identity transformation, which runs just as well with the built-in XSLT transformer in the JDK as with Saxon:
TransformerFactory factory = TransformerFactory.newInstance();
XMLReader xmlReader = XMLReaderFactory.createXMLReader("org.ccil.cowan.tagsoup.Parser");
Source input = new SAXSource(xmlReader, new InputSource("http://books.toscrape.com/"));
Result output = new StreamResult(System.out);
factory.newTransformer().transform(input, output);
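Since the question asks specifically about imports, here is a minimal sketch of the same identity transformation wrapped in a complete class. The class name HtmlToXml and the helper method copy are made up for illustration; everything imported comes from the JDK, and TagSoup is only referenced by class name, so it is needed on the classpath at run time but not at compile time.

```java
import javax.xml.transform.Result;
import javax.xml.transform.Source;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLReaderFactory;

// Hypothetical class name; only JAXP and SAX types from the JDK are imported.
final class HtmlToXml {

    /** Parses the input with the given SAX reader and serializes the result unchanged. */
    static void copy(XMLReader reader, InputSource input, Result output)
            throws TransformerException {
        Source source = new SAXSource(reader, input);
        // A Transformer created without a stylesheet performs an identity transformation.
        TransformerFactory.newInstance().newTransformer().transform(source, output);
    }

    public static void main(String[] args) throws Exception {
        // Assumes the TagSoup JAR is on the run-time classpath.
        // XMLReaderFactory is deprecated since Java 9 but still available.
        XMLReader tagSoup =
                XMLReaderFactory.createXMLReader("org.ccil.cowan.tagsoup.Parser");
        copy(tagSoup,
             new InputSource("http://books.toscrape.com/"),
             new StreamResult(System.out));
    }
}
```

With the JAR location from the question, this should run as `javac HtmlToXml.java` followed by `java -cp .:/usr/share/java/tagsoup-1.2.1.jar HtmlToXml`. Adding the Saxon JAR to the classpath as well would normally make TransformerFactory.newInstance() pick up Saxon instead of the JDK's built-in transformer.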
If you want to add some XSLT or XQuery processing then of course that's perfectly possible (I would always use the s9api API for Saxon, but you can also use JAXP or XQJ), but the details depend on exactly what you want to do.
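As one hedged illustration of adding XSLT processing through JAXP, the sketch below applies a small stylesheet that prints only the page title. The class name TitleExtractor, the stylesheet, and the sample input are all invented for this example; it assumes the elements are in the XHTML namespace, as in the TagSoup output shown in the question, and it sticks to XSLT 1.0 so that it also runs on the JDK's built-in transformer.

```java
import java.io.StringReader;
import javax.xml.transform.Result;
import javax.xml.transform.Source;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical class name; the stylesheet below is an illustration only.
final class TitleExtractor {

    // XSLT 1.0 stylesheet: output the whitespace-normalized text of the first XHTML <title>.
    static final String STYLESHEET =
          "<xsl:stylesheet version='1.0'"
        + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'"
        + " xmlns:h='http://www.w3.org/1999/xhtml'>"
        + "<xsl:output method='text'/>"
        + "<xsl:template match='/'>"
        + "<xsl:value-of select='normalize-space(//h:title)'/>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    /** Runs the stylesheet over any JAXP Source, for example a SAXSource backed by TagSoup. */
    static void extractTitle(Source input, Result output) throws TransformerException {
        Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
        transformer.transform(input, output);
    }

    public static void main(String[] args) throws Exception {
        // Inline sample document so the sketch runs without TagSoup or network access.
        String sample = "<html xmlns='http://www.w3.org/1999/xhtml'>"
                      + "<head><title> All products </title></head><body/></html>";
        extractTitle(new StreamSource(new StringReader(sample)),
                     new StreamResult(System.out));
    }
}
```

To run it against the live page, pass the same SAXSource that the identity transformation uses (TagSoup reader plus an InputSource for the URL) as the input argument instead of the inline sample.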
Source: https://stackoverflow.com/questions/54031200/hello-world-saxon-with-java