How to reliably detect file types?

心在旅途 2020-12-01 17:25

Objective: given a file, determine whether it is of a given type (XML, JSON, Properties, etc.).

Consider the case of XML - Up until we ran into this issue, the follow

3 Answers
  • 2020-12-01 17:31

    For those who do not need very precise detection, Java 7's Files.probeContentType method (mentioned by rjdkolb) may be enough. Note that it returns null when the type cannot be determined.

    Path filePath = Paths.get("/path/to/your/file.jpg");
    // May return null if no type is recognized; throws IOException on I/O errors
    String contentType = Files.probeContentType(filePath);
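
    Because probeContentType can return null and its result depends on the platform's installed detectors, it may help to wrap the call. The following is only a minimal sketch: the class name ProbeExample, the method detectOrDefault, and the "application/octet-stream" fallback are assumptions made for illustration, not part of the JDK.

    ```java
    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;

    // Hypothetical helper class; its name and the fallback value are assumptions.
    class ProbeExample {

        static final String FALLBACK = "application/octet-stream";

        // Returns the probed MIME type, or FALLBACK when probing yields
        // nothing or fails with an I/O error.
        static String detectOrDefault(Path path) {
            try {
                String type = Files.probeContentType(path);
                return type != null ? type : FALLBACK;
            } catch (IOException e) {
                return FALLBACK;
            }
        }

        public static void main(String[] args) {
            // The printed value varies by operating system and JDK version.
            System.out.println(detectOrDefault(Paths.get("/path/to/your/file.jpg")));
        }
    }
    ```

    Callers then always get a non-null string, at the cost of not being able to tell "unknown" apart from a genuine binary type.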
    
  • 2020-12-01 17:43

    File type detection tools:

    • Mime Type Detection Utility
    • DROID (Digital Record Object Identification)
    • ftc - File Type Classifier
    • JHOVE, JHOVE2
    • NLNZ Metadata Extraction Tool
    • Apache Tika
    • TrID, TrIDNet
    • Oracle Outside In (commercial)
    • Forensic Innovations File Investigator TOOLS (commercial)
  • 2020-12-01 17:45

    Apache Tika gives me the fewest issues and, unlike Java 7's Files.probeContentType, is not platform-specific.

    import java.io.File;
    import java.io.IOException;
    import org.apache.tika.Tika;
    
    File inputFile = ...
    // detect(File) returns the MIME type as a String and may throw IOException
    String type = new Tika().detect(inputFile);
    System.out.println(type);
    

    For an XML file I got 'application/xml'.

    For a properties file I got 'text/plain'.

    You can, however, pass a custom Detector to the Tika constructor.

    <dependency>
        <groupId>org.apache.tika</groupId>
        <artifactId>tika-core</artifactId>
        <version>1.xx</version>
    </dependency>
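
    When the reported MIME type is too coarse to confirm a specific format (a properties file coming back as 'text/plain', for example), a content-level check is one option. Below is a rough sketch for the XML case only, using the JDK's built-in parser rather than Tika. The class name XmlCheck and its methods are hypothetical. It also assumes that rejecting DOCTYPE declarations is acceptable, so XML files that carry a DOCTYPE are reported as not well-formed.

    ```java
    import java.io.ByteArrayInputStream;
    import java.io.IOException;
    import java.io.InputStream;
    import java.nio.charset.StandardCharsets;
    import javax.xml.parsers.DocumentBuilder;
    import javax.xml.parsers.DocumentBuilderFactory;
    import javax.xml.parsers.ParserConfigurationException;
    import org.xml.sax.SAXException;
    import org.xml.sax.helpers.DefaultHandler;

    // Hypothetical helper, not part of Tika or the JDK.
    class XmlCheck {

        // Returns true if the stream parses as well-formed XML.
        static boolean isWellFormedXml(InputStream in) {
            try {
                DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
                // Reject DOCTYPE declarations so untrusted input cannot
                // pull in external entities.
                factory.setFeature(
                        "http://apache.org/xml/features/disallow-doctype-decl", true);
                DocumentBuilder builder = factory.newDocumentBuilder();
                // Replace the default error handler, which prints to stderr.
                builder.setErrorHandler(new DefaultHandler());
                builder.parse(in);
                return true;
            } catch (SAXException | IOException | ParserConfigurationException e) {
                return false;
            }
        }

        // Convenience overload for in-memory text.
        static boolean isWellFormedXml(String text) {
            return isWellFormedXml(
                    new ByteArrayInputStream(text.getBytes(StandardCharsets.UTF_8)));
        }

        public static void main(String[] args) {
            System.out.println(isWellFormedXml("<a><b/></a>"));
            System.out.println(isWellFormedXml("key=value"));
        }
    }
    ```

    Parsing the whole file costs more than sniffing the first few bytes, so a check like this is probably best reserved for files that a cheaper detector has already flagged as possibly XML.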
    