Question
I'm using the Tika facade, following the example of the elasticsearch-mapper-attachments plugin. Here's my test code:
Tika tika = new Tika();
Metadata md = new Metadata();
try {
    String content = tika.parseToString(src, md, 100000);
    System.out.println("Content length: " + content.length());
    for (String s : md.names()) {
        System.out.println(s + ": " + md.get(s));
    }
} catch (TikaException e) {
    System.out.println(e);
}
Here's the output:
Content length: 0
X-Parsed-By: org.apache.tika.parser.EmptyParser
Content-Type: text/html
So the question is: if Tika correctly identifies the input as text/html, why does it use the EmptyParser? If I'm supposed to pass a parser, which parser should I pass for best results, assuming that autodetection is successful, as above?
Thank you.
Answer 1:
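Make sure that tika-parsers is on your classpath! The Tika facade and type detection live in tika-core, but the actual parsers (including the HTML parser) ship in tika-parsers; without it, the type is still detected, yet no parser is registered for it, so Tika falls back to EmptyParser. If you are using Gradle,
compile 'org.apache.tika:tika-parsers:1.7'
will do the trick.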
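One quick way to confirm whether tika-parsers is actually visible at runtime is to try loading one of its classes. The following is a minimal sketch using only the JDK, not part of the original answer: the class name ClasspathCheck and the helper isOnClasspath are hypothetical, and it assumes org.apache.tika.parser.html.HtmlParser is the HTML parser class shipped in tika-parsers 1.x.

```java
public class ClasspathCheck {

    // Returns true if the named class can be located by this class's loader.
    // The class is not initialized, so no static initializers run.
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumption: this is the HTML parser class provided by tika-parsers 1.x.
        String htmlParser = "org.apache.tika.parser.html.HtmlParser";
        if (isOnClasspath(htmlParser)) {
            System.out.println("tika-parsers appears to be on the classpath.");
        } else {
            System.out.println("tika-parsers is missing; Tika will likely fall back to EmptyParser.");
        }
    }
}
```

If this reports the class as missing when run with the same classpath as the test code, adding the tika-parsers dependency should let the facade pick a real parser for text/html.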
Source: https://stackoverflow.com/questions/28954805/why-does-the-tika-facade-choose-emptyparser