Determining binary/text file type in Java?

心在旅途 2020-12-02 16:46

Namely, how would you tell an archive (jar/rar/etc.) file from a textual (xml/txt, encoding-independent) one?

10 Answers
  • 2020-12-02 17:22

    Have a look at the JMimeMagic library.

    jMimeMagic is a Java library for determining the MIME type of files or streams.

  • 2020-12-02 17:23

    I made this one. It is a bit simpler, but it should work fine for Latin-based languages once the ratio is adjusted.

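    import java.io.File;
    import java.io.FileInputStream;
    import java.io.IOException;

    /**
     * Guesses whether the given file is binary by sampling its first 1024 bytes.
     * Any control byte below 0x09 (including NUL) marks the file as binary at once;
     * otherwise the share of bytes outside printable ASCII decides.
     */
    public static boolean isBinaryFile(File f) throws IOException {
        byte[] data = new byte[1024];
        int size;
        try (FileInputStream in = new FileInputStream(f)) {
            // read() returns the number of bytes actually read, or -1 for an empty file
            size = Math.max(in.read(data), 0);
        }

        int ascii = 0;
        int other = 0;

        for (int i = 0; i < size; i++) {
            // Java bytes are signed: mask to 0..255 so that bytes >= 0x80
            // are not negative and do not trip the "< 0x09" check
            int b = data[i] & 0xFF;
            if (b < 0x09) return true;

            if (b == 0x09 || b == 0x0A || b == 0x0C || b == 0x0D) ascii++;
            else if (b >= 0x20 && b <= 0x7E) ascii++;
            else other++;
        }

        if (other == 0) return false;

        // binary if more than 95% of the sampled bytes are non-ASCII; adjust as needed
        return 100 * other / (ascii + other) > 95;
    }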
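
One detail matters for any byte-range check of this kind: Java's `byte` is signed, so a value such as 0xC3 (the lead byte of many UTF-8 sequences) is read as -61 and compares as smaller than 0x09. A minimal sketch of the difference follows; the class and method names are hypothetical and chosen only for illustration.

```java
/** Illustrative only: class and method names are hypothetical. */
final class SignedByteCheck {

    /** Compares the raw signed byte; bytes 0x80..0xFF are negative and pass the test. */
    static boolean belowTabSigned(byte b) {
        return b < 0x09;
    }

    /** Masks to 0..255 first, so only the control bytes 0x00..0x08 pass the test. */
    static boolean belowTabUnsigned(byte b) {
        return (b & 0xFF) < 0x09;
    }
}
```

With the unmasked comparison, UTF-8 text that contains any non-ASCII character would be classified as binary at the first such byte, before a ratio is ever computed.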
    
  • 2020-12-02 17:24

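    Use Files.probeContentType from the Java 7 Files class: http://docs.oracle.com/javase/7/docs/api/java/nio/file/Files.html#probeContentType(java.nio.file.Path)

    How the type is determined is implementation-specific; a simple detector may only look at the file extension.

    import java.io.File;
    import java.io.IOException;
    import java.nio.file.Files;

    boolean isBinaryFile(File f) throws IOException {
        String type = Files.probeContentType(f.toPath());
        if (type == null) {
            // type couldn't be determined, assume binary
            return true;
        } else if (type.startsWith("text")) {
            return false;
        } else {
            // type isn't text
            return true;
        }
    }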
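
Since `probeContentType` can return null, one possible variation is to look at the content instead of assuming binary in that case. The sketch below is only an illustration under that assumption: the class `ContentTypeProbe`, the helper `containsNul`, and the 1024-byte sample size are all hypothetical choices, not part of the answer above.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

/** Illustrative only: class and method names are hypothetical. */
final class ContentTypeProbe {

    /** Returns true when the file is judged binary. */
    static boolean isBinary(Path path) throws IOException {
        String type = Files.probeContentType(path);
        if (type != null) {
            return !type.startsWith("text");
        }
        // Assumption: an undetermined type is resolved by content, not assumed binary.
        return containsNul(path, 1024);
    }

    /** Returns true if any of the first {@code limit} bytes is 0x00. */
    static boolean containsNul(Path path, int limit) throws IOException {
        try (InputStream in = Files.newInputStream(path)) {
            for (int i = 0; i < limit; i++) {
                int b = in.read();   // -1 at end of file
                if (b < 0) break;
                if (b == 0) return true;
            }
        }
        return false;
    }
}
```

A NUL byte is a crude signal: it catches most archives and executables, but it would also flag UTF-16 text, so the fallback is only reasonable when such files are not expected.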
    
  • 2020-12-02 17:29

    Run file -bi {filename}. If the output starts with 'text/', the file is non-binary; otherwise it is binary. ;-)
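
From Java, the command could be invoked roughly as follows. This is a sketch, assuming a Unix-like system with the `file` utility on the PATH and the `-bi` flags behaving as described above (flag spelling may differ between platforms). The class `FileCommandProbe` and its method names are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

/** Illustrative only: class and method names are hypothetical. */
final class FileCommandProbe {

    /** Runs {@code file -bi <path>} and returns its first output line, or "" if none. */
    static String mimeLine(String path) throws IOException, InterruptedException {
        Process p = new ProcessBuilder("file", "-bi", path)
                .redirectErrorStream(true)   // merge stderr so error text is not left unread
                .start();
        String line;
        try (BufferedReader r = new BufferedReader(
                new InputStreamReader(p.getInputStream(), StandardCharsets.UTF_8))) {
            line = r.readLine();
        }
        p.waitFor();
        return line == null ? "" : line.trim();
    }

    /** Applies the rule from the answer: output starting with "text/" means non-binary. */
    static boolean isTextMime(String fileOutput) {
        return fileOutput.startsWith("text/");
    }
}
```

A caller would combine the two, for example `FileCommandProbe.isTextMime(FileCommandProbe.mimeLine("notes.xml"))`, where the file name is a placeholder.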

  • 2020-12-02 17:30

    I used this code and it works for English and German text pretty well:

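    import java.io.File;
    import java.io.FileInputStream;
    import java.io.IOException;
    import java.nio.charset.StandardCharsets;

    private boolean isTextFile(String filePath) throws IOException {
        File f = new File(filePath);
        if(!f.exists())
            return false;
        byte[] data = new byte[1000];
        int size;
        try (FileInputStream in = new FileInputStream(f)) {
            // read() returns the number of bytes actually read, or -1 for an empty file
            size = Math.max(in.read(data), 0);
        }
        if(size == 0)
            return false; // nothing to classify
        // ISO-8859-1 maps every byte to exactly one char, so decoding never fails
        String s = new String(data, 0, size, StandardCharsets.ISO_8859_1);
        String s2 = s.replaceAll(
                "[a-zA-Z0-9ßöäü\\.\\*!\"§\\$\\%&/()=\\?@~'#:,;\\"+
                "+><\\|\\[\\]\\{\\}\\^°²³\\\\ \\n\\r\\t_\\-`´âêîô"+
                "ÂÊÔÎáéíóàèìòÁÉÍÓÀÈÌÒ©‰¢£¥€±¿»«¼½¾™ª]", "");
        // deletes every character that counts as a text sign

        double d = (double)(s.length() - s2.length()) / (double)(s.length());
        // fraction of text signs in the sample
        return d > 0.95;
    }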
    
  • 2020-12-02 17:30

    Just to let you know, I've chosen quite a different path. In my case there are only two types of files, and the chances that any given file is binary are high. So:

    1. presume that file is binary, try doing what's supposed to be done (e.g. deserialize)
    2. catch exception
    3. treat file as textual
    4. if that fails, something is wrong with file itself
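
The four steps above can be sketched as follows. The answer does not say which binary format or text encoding is involved, so this sketch assumes Java serialization for the binary case and strict UTF-8 for the text case. The class `BinaryFirstLoader` and its `load` method are hypothetical names.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.io.ObjectStreamException;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** Illustrative only: class and method names are hypothetical. */
final class BinaryFirstLoader {

    /**
     * Step 1: try to deserialize. Steps 2-3: on a format failure, decode as text.
     * Step 4: if strict decoding fails too, the IOException reaches the caller.
     */
    static Object load(Path path) throws IOException {
        try (InputStream raw = Files.newInputStream(path);
             ObjectInputStream in = new ObjectInputStream(raw)) {
            return in.readObject();
        } catch (ObjectStreamException | EOFException | ClassNotFoundException e) {
            // Not a readable serialized object stream: treat the file as text.
            byte[] bytes = Files.readAllBytes(path);
            // A fresh decoder reports malformed input by throwing CharacterCodingException.
            return StandardCharsets.UTF_8.newDecoder()
                    .decode(ByteBuffer.wrap(bytes))
                    .toString();
        }
    }
}
```

Only format-related exceptions trigger the fallback, so an unrelated I/O failure such as a missing file surfaces directly instead of being mistaken for a text file.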