Correct use of Apache Tika MediaType

别等时光非礼了梦想. Submitted on 2019-12-19 11:48:12

Question


I want to use Apache Tika's MediaType class to compare media types.

I first use Tika to detect the MediaType, and then I want to trigger an action depending on which MediaType it is.

So if the MediaType is an XML type I want to do one action; if it is a compressed file I want to start another action.

My problem is that there are many XML types, so how do I check whether a MediaType is XML?

Here is my previous (before Tika) implementation:

if (contentType.contains("text/xml") || 
    contentType.contains("application/xml") || 
    contentType.contains("application/x-xml") || 
    contentType.contains("application/atom+xml") || 
    contentType.contains("application/rss+xml")) {
        processXML();
} else if (contentType.contains("application/gzip") || 
    contentType.contains("application/x-gzip") || 
    contentType.contains("application/x-gunzip") || 
    contentType.contains("application/gzipped") || 
    contentType.contains("application/gzip-compressed") || 
    contentType.contains("application/x-compress") || 
    contentType.contains("gzip/document") || 
    contentType.contains("application/octet-stream")) {
        processGzip();
}

I want to switch it to use Tika, something like the following:

MediaType mediaType = MediaType.parse(contentType);
if (mediaType.equals(MediaType.APPLICATION_XML)) {
    return processXML();
} else if (mediaType.equals(MediaType.APPLICATION_ZIP) || mediaType.equals(MediaType.OCTET_STREAM)) {
    return processGzip();
}

But the problem is that Tika.detect(...) returns many different types that have no matching MediaType constant.

How can I simply tell whether a MediaType is an XML type? Or a compressed type? I need a "parent" type that includes all of its children, maybe a method such as "boolean isXML()" that covers application/xml, text/xml and application/x-xml, or "boolean isCompress()" that covers all of the zip and gzip types, etc.


Answer 1:


What you'll need to do is walk the type hierarchy until you either find what you want or run out of things to check. That can be done with recursion or with a loop.

The key method you need is MediaTypeRegistry.getSupertype(MediaType).
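Since the walk can also be done with recursion, here is a minimal recursive sketch. To keep it runnable without Tika on the classpath, types are plain strings and the parent lookup is passed in as a function; the PARENTS table is a hypothetical stand-in for what MediaTypeRegistry.getSupertype would return, not Tika's real registry data. The names SupertypeWalk, isA and PARENTS are illustrative only.

```java
import java.util.Map;
import java.util.function.UnaryOperator;

public class SupertypeWalk {

    // Hypothetical parent table standing in for MediaTypeRegistry.getSupertype;
    // illustrative only, it does not reproduce Tika's registry data.
    static final Map<String, String> PARENTS = Map.of(
            "application/rss+xml", "application/xml",
            "application/atom+xml", "application/xml",
            "application/xml", "text/plain",
            "text/plain", "application/octet-stream");

    // Returns the parent of a type, or null when the table has none.
    static String supertypeOf(String type) {
        return PARENTS.get(type);
    }

    // Recursive walk: true if `type` is `target` or has `target` as an ancestor.
    static <T> boolean isA(T type, T target, UnaryOperator<T> supertype) {
        if (type == null) {
            return false; // ran out of ancestors without a match
        }
        if (type.equals(target)) {
            return true; // found the type we were looking for
        }
        return isA(supertype.apply(type), target, supertype); // check the parent
    }

    public static void main(String[] args) {
        System.out.println(isA("application/rss+xml", "application/xml", SupertypeWalk::supertypeOf));
        System.out.println(isA("application/gzip", "application/xml", SupertypeWalk::supertypeOf));
    }
}
```

With Tika, the same generic method could be called with MediaType values and registry::getSupertype as the third argument, since getSupertype takes a MediaType and returns its parent, or null at the root.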

Your code would look something like this:

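import org.apache.tika.config.TikaConfig;
import org.apache.tika.detect.Detector;
import org.apache.tika.mime.MediaType;
import org.apache.tika.mime.MediaTypeRegistry;

// Define your media type constants here
MediaType FOO = MediaType.parse("application/foo");

// The default config provides both a detector and the type registry
TikaConfig config = TikaConfig.getDefaultConfig();
Detector detector = config.getDetector();
MediaTypeRegistry registry = config.getMediaTypeRegistry();

// Work out the file's type (stream is the file's InputStream, metadata its Metadata)
MediaType type = detector.detect(stream, metadata);

// Is it one we want in the tree?
while (type != null && !type.equals(MediaType.OCTET_STREAM)) {
    if (type.equals(MediaType.APPLICATION_XML)) {
        doThingForXML();
        break; // stop walking once a match is handled
    } else if (type.equals(MediaType.APPLICATION_ZIP)) {
        doThingForZip();
        break;
    } else if (type.equals(FOO)) {
        doThingForFoo();
        break;
    } else {
        // Not matched yet: check the parent type
        type = registry.getSupertype(type);
    }
}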
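The loop above can be wrapped into the isXML()/isCompress()-style helpers the question asks for. This is a minimal sketch, not Tika's API: to keep it runnable without Tika, types are plain strings and the parent lookup is a hypothetical table standing in for registry.getSupertype(type). Treating application/zip and application/gzip as the "compressed" family roots is an assumption for illustration, not an exhaustive list; MediaCategory, classify, isXml and isCompressed are hypothetical names.

```java
import java.util.Map;
import java.util.function.UnaryOperator;

public class MediaCategory {

    enum Category { XML, COMPRESSED, OTHER }

    // Assumed "family root" types used for classification
    static final String APPLICATION_XML = "application/xml";
    static final String APPLICATION_ZIP = "application/zip";
    static final String APPLICATION_GZIP = "application/gzip";
    static final String OCTET_STREAM = "application/octet-stream";

    // Hypothetical parent table standing in for MediaTypeRegistry.getSupertype
    static final Map<String, String> PARENTS = Map.of(
            "application/atom+xml", APPLICATION_XML,
            "application/rss+xml", APPLICATION_XML,
            "application/epub+zip", APPLICATION_ZIP);

    static String supertypeOf(String type) {
        return PARENTS.get(type);
    }

    // Loop-based walk: returns on the first recognized family root, and gives up
    // at octet-stream or when a type has no parent.
    static Category classify(String type, UnaryOperator<String> supertypeOf) {
        while (type != null && !type.equals(OCTET_STREAM)) {
            if (type.equals(APPLICATION_XML)) {
                return Category.XML;
            } else if (type.equals(APPLICATION_ZIP) || type.equals(APPLICATION_GZIP)) {
                return Category.COMPRESSED;
            }
            type = supertypeOf.apply(type); // not matched: check the parent
        }
        return Category.OTHER;
    }

    static boolean isXml(String type, UnaryOperator<String> supertypeOf) {
        return classify(type, supertypeOf) == Category.XML;
    }

    static boolean isCompressed(String type, UnaryOperator<String> supertypeOf) {
        return classify(type, supertypeOf) == Category.COMPRESSED;
    }

    public static void main(String[] args) {
        System.out.println(classify("application/atom+xml", MediaCategory::supertypeOf));
        System.out.println(classify("application/epub+zip", MediaCategory::supertypeOf));
        System.out.println(classify("image/png", MediaCategory::supertypeOf));
    }
}
```

Returning from classify on the first match plays the same role as the break in the loop above: without it, the loop would keep re-testing the same type forever. With Tika, the same loop can be written over MediaType values, calling registry.getSupertype(type) in place of the string table.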


Source: https://stackoverflow.com/questions/23179355/correct-use-of-apache-tika-mediatype
