Question
While trying to come up with a servlet-based application that reads files and manipulates them (image type conversion), this question came up:
- Is it possible to inspect a file content and know the filetype?
- Is there a standard that specifies that each file MUST provide some type of marker in its content, so that the application does not have to rely on the file extension?
Consider an application scenario:
I am creating an application that will be able to convert different file formats to a set of output formats. Say a user uploads a PDF; my application can suggest that the possible conversion formats are Microsoft Word, TIFF, JPEG, etc.
As my application will gradually support more file formats over time, I want it to inspect the input file instead of having the user specify the format, and then suggest the possible output formats to the user.
I understand this is an open-ended, broad question. Please let me know if it needs to be modified.
Thanks, Ayusman
Answer 1:
Yep, you can figure out the type without an extension by using the file's magic number. Also, the way the file command figures it out is actually a three-step check:
- Check filesystem properties to identify empty files, folders, etc.
- Check the magic number mentioned above
- For text files, check which language the content is written in
Here's a library that will help you with magic numbers: jmimemagic
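To make the magic number idea concrete, here is a minimal hand-rolled sketch in Java. It is not how jmimemagic or the file command is implemented; the class name MagicSniffer, the method names, and the short signature table are assumptions made for illustration. It compares the first bytes of a file against a few well-known signatures (PDF, JPEG, PNG, GIF, TIFF, ZIP) and falls back to a generic type when nothing matches. Reading the header with readNBytes assumes Java 11 or later.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper: guesses a MIME type from the leading bytes of a file.
final class MagicSniffer {

    // Parallel tables: TYPES[i] is reported when the data starts with MAGIC[i].
    private static final String[] TYPES = {
        "application/pdf", "image/jpeg", "image/png", "image/gif",
        "image/tiff", "image/tiff", "application/zip"
    };

    private static final int[][] MAGIC = {
        {0x25, 0x50, 0x44, 0x46},                         // "%PDF"
        {0xFF, 0xD8, 0xFF},                               // JPEG start-of-image
        {0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A}, // PNG signature
        {0x47, 0x49, 0x46, 0x38},                         // "GIF8"
        {0x49, 0x49, 0x2A, 0x00},                         // TIFF, little-endian
        {0x4D, 0x4D, 0x00, 0x2A},                         // TIFF, big-endian
        {0x50, 0x4B, 0x03, 0x04}                          // ZIP (also .docx, .jar)
    };

    private static final String UNKNOWN = "application/octet-stream";

    private MagicSniffer() {}

    // Returns the first matching type, or a generic fallback if none matches.
    static String detect(byte[] header) {
        for (int i = 0; i < MAGIC.length; i++) {
            if (startsWith(header, MAGIC[i])) {
                return TYPES[i];
            }
        }
        return UNKNOWN;
    }

    // Reads only the first 8 bytes of the file; the extension is never consulted.
    static String detect(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            return detect(in.readNBytes(8));
        }
    }

    private static boolean startsWith(byte[] data, int[] magic) {
        if (data.length < magic.length) {
            return false;
        }
        for (int i = 0; i < magic.length; i++) {
            // Mask to compare as unsigned, since Java bytes are signed.
            if ((data[i] & 0xFF) != magic[i]) {
                return false;
            }
        }
        return true;
    }
}
```

In a servlet, the same check could be run on the first bytes of the uploaded stream before deciding which output formats to offer. A table like this only covers formats that start with a fixed signature; plain text and container formats (a .docx is a ZIP archive, for instance) need further inspection, which is where a dedicated library is likely the better choice.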
Source: https://stackoverflow.com/questions/10923258/how-to-know-file-type-without-extension