I was just wondering how most people fetch a MIME type from a file in Java. So far I've tried two utilities: JMimeMagic and Mime-Util.
From roseindia:
FileNameMap fileNameMap = URLConnection.getFileNameMap();
String mimeType = fileNameMap.getContentTypeFor("alert.gif");
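For anyone who wants to try it, the snippet above can be wrapped into a small runnable class. This is only a minimal sketch: the names `ExtensionLookup` and `mimeTypeFromName` are made up for illustration. Keep in mind that this lookup uses only the extension of the file name. The file content is never read, and an extension that is not in the JDK's built-in table gives null.

```java
import java.net.FileNameMap;
import java.net.URLConnection;

// Hypothetical helper class, for illustration only.
class ExtensionLookup {

    // Returns the MIME type that the JDK's built-in table maps to the
    // extension of the given file name, or null if there is no mapping.
    static String mimeTypeFromName(String fileName) {
        FileNameMap fileNameMap = URLConnection.getFileNameMap();
        return fileNameMap.getContentTypeFor(fileName);
    }

    public static void main(String[] args) {
        System.out.println(mimeTypeFromName("alert.gif"));
    }
}
```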
If you are stuck with Java 5 or 6, you can use this utility class from the Servoy open source product.
You only need this function:
public static String getContentType(byte[] data, String name)
It probes the first bytes of the content and returns the content type based on that content rather than on the file extension.
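To illustrate the idea of probing content before falling back to the name, here is a rough sketch with the same method signature. It is not the Servoy implementation: it swaps in the JDK's own `URLConnection.guessContentTypeFromStream`, which knows only a handful of signatures, and the class name `ContentProbe` is an assumption made for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.URLConnection;

// Hypothetical stand-in, NOT the Servoy utility class.
class ContentProbe {

    // Tries the leading bytes first, then falls back to the file name.
    // Returns null when neither the content nor the name is recognised.
    static String getContentType(byte[] data, String name) {
        String type = null;
        if (data != null && data.length > 0) {
            // ByteArrayInputStream supports mark/reset, which
            // guessContentTypeFromStream relies on.
            try (InputStream in = new ByteArrayInputStream(data)) {
                type = URLConnection.guessContentTypeFromStream(in);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        if (type == null && name != null) {
            // Extension-based fallback using the JDK's built-in table.
            type = URLConnection.guessContentTypeFromName(name);
        }
        return type;
    }
}
```

Because the content is checked first, a GIF that was renamed to something like `picture.txt` should still be reported by its content, not by its extension.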
With Apache Tika you need only three lines of code:
File file = new File("/path/to/file");
Tika tika = new Tika();
System.out.println(tika.detect(file));
If you have a groovy console, just paste and run this code to play with it:
@Grab('org.apache.tika:tika-core:1.14')
import org.apache.tika.Tika;
def tika = new Tika()
def file = new File("/path/to/file")
println tika.detect(file)
Keep in mind that its API is rich and that it can parse "anything". As of tika-core 1.14, you have:
String detect(byte[] prefix)
String detect(byte[] prefix, String name)
String detect(File file)
String detect(InputStream stream)
String detect(InputStream stream, Metadata metadata)
String detect(InputStream stream, String name)
String detect(Path path)
String detect(String name)
String detect(URL url)
See the apidocs for more information.
Apache Tika offers, in tika-core, MIME type detection based on magic markers in the stream prefix. tika-core
does not pull in other dependencies, which makes it as lightweight as the currently unmaintained Mime Type Detection Utility.
Simple code example (Java 7), using the variables theInputStream and theFileName:
try (InputStream is = theInputStream;
BufferedInputStream bis = new BufferedInputStream(is);) {
AutoDetectParser parser = new AutoDetectParser();
Detector detector = parser.getDetector();
Metadata md = new Metadata();
md.add(Metadata.RESOURCE_NAME_KEY, theFileName);
MediaType mediaType = detector.detect(bis, md);
return mediaType.toString();
}
Please note that MediaType.detect(...) cannot be used directly (TIKA-1120). More hints are provided at https://tika.apache.org/1.24/detection.html.
I couldn't find anything to check for the video/mp4 MIME type, so I made my own solution.
I happened to observe that Wikipedia was wrong and that the 00 00 00 18 66 74 79 70 69 73 6F 6D file signature is not correct. The fourth byte (18) and all the bytes after 70 vary quite a lot among otherwise valid mp4 files.
This code is essentially a copy/paste of the URLConnection.guessContentTypeFromStream code, but tailored to video/mp4.
BufferedInputStream bis = new BufferedInputStream(new ByteArrayInputStream(content));
String mimeType = URLConnection.guessContentTypeFromStream(bis);
// Goes full barbaric and processes the bytes manually
if (mimeType == null){
// These ints, converted to hex, are the first 8 bytes of:
// 00 00 00 18 66 74 79 70 69 73 6F 6D
// which is the file signature (magic bytes) listed for .mp4 files
// from https://www.wikiwand.com/en/List_of_file_signatures
// just ctrl+f "mp4"
int[] mp4_sig = {0, 0, 0, 24, 102, 116, 121, 112};
bis.reset();
bis.mark(16);
int[] firstBytes = new int[8];
for (int i = 0; i < 8; i++) {
firstBytes[i] = bis.read();
}
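// This byte doesn't matter for the file signature and changes between files.
// Mask it so that it is unsigned (0-255), like the values returned by bis.read().
mp4_sig[3] = content[3] & 0xFF;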
bis.reset();
if (Arrays.equals(firstBytes, mp4_sig)){
mimeType = "video/mp4";
}
}
Tested successfully against 10 different .mp4 files.
EDIT: Here is a useful link (if it is still online) where you can find samples of many types. I don't own those videos, don't know who does either, but they're useful for testing the above code.
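Since the byte array is already in memory, the same signature check can also be written without a stream. The following is a minimal sketch under that assumption, with made-up names (`Mp4Sniffer`, `looksLikeMp4`). It ignores the first four bytes, which vary between files, and only compares bytes 4 to 7 with the ASCII text "ftyp" (66 74 79 70). Like the code above, it is a loose check: other container formats from the same family (for example 3GP or QuickTime) also carry "ftyp" at that position.

```java
// Hypothetical helper class, for illustration only.
class Mp4Sniffer {

    // Returns true when bytes 4..7 spell "ftyp" (0x66 0x74 0x79 0x70).
    // Bytes 0..3 are deliberately not compared because they differ
    // between otherwise valid files.
    static boolean looksLikeMp4(byte[] content) {
        if (content == null || content.length < 8) {
            return false;
        }
        return content[4] == 'f'
                && content[5] == 't'
                && content[6] == 'y'
                && content[7] == 'p';
    }
}
```

A caller could then do something like `String mimeType = Mp4Sniffer.looksLikeMp4(content) ? "video/mp4" : null;` after the JDK's own guess has returned null.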
In Java 7 you can now just use Files.probeContentType(path).
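A short sketch of how that call might be used. The class name `ProbeExample`, the method `probeOrDefault`, and the `application/octet-stream` fallback are choices made for this example, not part of the JDK. The result depends on the file type detectors available on the platform, and `probeContentType` returns null when the type cannot be determined, so the null case is worth handling.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper class, for illustration only.
class ProbeExample {

    // Asks the platform for the content type and falls back to a
    // generic binary type when the probe fails or returns null.
    static String probeOrDefault(Path path) {
        try {
            String type = Files.probeContentType(path);
            return type != null ? type : "application/octet-stream";
        } catch (IOException e) {
            return "application/octet-stream";
        }
    }

    public static void main(String[] args) {
        System.out.println(probeOrDefault(Paths.get("/path/to/file")));
    }
}
```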