How to check whether a file is binary and read only the files that are not?

走了就别回头了 2020-12-05 16:58

How can I know if a file is a binary file?

For example, a compiled C file.

I want to read all the files in a directory, but I want to ignore binary files.

13 Answers
  • 2020-12-05 17:36

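    Try the following command line. Note that it reports any file whose description lacks the word ASCII as binary, so UTF-8 text is flagged too: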

    file "$FILE" | grep -vq 'ASCII' && echo "$FILE is binary"
    
  • 2020-12-05 17:36

    You can also do this with the diff command. See this answer:

    https://unix.stackexchange.com/questions/275516/is-there-a-convenient-way-to-classify-files-as-binary-or-text#answer-402870

  • 2020-12-05 17:38

    grep

    Assuming that "binary" means a file containing non-printable characters (other than blanks such as spaces and tabs, and newlines), this may work with both BSD and GNU grep:

    $ grep -q '[^[:print:][:blank:]]' file && echo Binary || echo Text
    

    Note: GNU grep reports a file containing only NUL characters as text, whereas the BSD version classifies it correctly.

    For more examples, see: How do I grep for all non-ASCII characters.
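
    The check above can be wrapped in a function and applied to every regular file in a directory, which is what the question asks for. This is a minimal sketch: the names is_text_grep and cat_text_files are made up for illustration, and "binary" means exactly what the grep pattern above says, so the result depends on the current locale.

```shell
# Return 0 (true) if grep finds no character outside the printable
# and blank classes, i.e. if the test above would say "Text".
is_text_grep() {
    status=0
    grep -q '[^[:print:][:blank:]]' -- "$1" || status=$?
    # grep exits with 1 when nothing matched; 0 means a match
    # (binary) and 2 means an error such as an unreadable file.
    [ "$status" -eq 1 ]
}

# Print the contents of every regular file directly inside a
# directory (default: the current one) that passes the test.
cat_text_files() {
    dir=${1:-.}
    for f in "$dir"/*; do
        [ -f "$f" ] || continue   # skip directories, devices, etc.
        if is_text_grep "$f"; then
            cat -- "$f"
        fi
    done
}
```

    With these definitions, cat_text_files /some/dir would print only the files that the grep test classifies as text. Files that grep cannot read are skipped rather than treated as text.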

  • 2020-12-05 17:39

    Use the file utility. Sample usage:

     $ file /bin/bash
     /bin/bash: Mach-O universal binary with 2 architectures
     /bin/bash (for architecture x86_64):   Mach-O 64-bit executable x86_64
     /bin/bash (for architecture i386): Mach-O executable i386
    
     $ file /etc/passwd
     /etc/passwd: ASCII English text
    
     $ file code.c
     code.c: ASCII c program text
    

    See the file manual page.
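
    To act on that output in a script, one option is to look for the word "text" in the description, which is how the text files in the sample output above are labelled. The sketch below assumes that convention holds for the installed version of file; the function names is_text_file and list_text_files are hypothetical.

```shell
# Return 0 (true) if the brief description printed by file(1)
# contains the word "text", e.g. "ASCII text".
is_text_file() {
    case $(file -b -- "$1") in
        *text*) return 0 ;;
        *)      return 1 ;;
    esac
}

# Print the names of the regular files directly inside a directory
# (default: the current one) that file(1) describes as text.
list_text_files() {
    dir=${1:-.}
    for f in "$dir"/*; do
        [ -f "$f" ] || continue   # ignore directories and special files
        if is_text_file "$f"; then
            printf '%s\n' "$f"
        fi
    done
}
```

    The -b flag drops the leading filename from the output, so a file whose own name contains "text" is not misclassified. Empty files are described as "empty" and are therefore not listed.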

  • 2020-12-05 17:44

    Use Perl’s built-in -T file test operator, preferably after ascertaining that it is a plain file using the -f file test operator:

    $ perl -le 'for (@ARGV) { print if -f && -T }' \
        getwinsz.c a.out /etc/termcap /bin /bin/cat \
        /dev/tty /usr/share/zoneinfo/UTC /etc/motd
    getwinsz.c
    /etc/termcap
    /etc/motd
    

    Here’s the complement of that set:

    $ perl -le 'for (@ARGV) { print unless -f && -T }' \
        getwinsz.c a.out /etc/termcap /bin /bin/cat \
        /dev/tty /usr/share/zoneinfo/UTC /etc/motd
    a.out
    /bin
    /bin/cat
    /dev/tty
    /usr/share/zoneinfo/UTC
    
  • 2020-12-05 17:48

    Going off Bach's suggestion, I think --mime-encoding is the best flag to get something reliable from file.

    file --mime-encoding [FILES ...] | grep -v '\bbinary$'
    

    will print the files that file believes have a non-binary encoding. You can pipe this output through cut -d: -f1 to trim the : encoding if you just want the filenames.


    Caveat: as @yugr reports below, .doc files report an encoding of application/mswordbinary. This looks like a bug to me: the MIME type is erroneously concatenated with the encoding.

    $ for flag in --mime --mime-type --mime-encoding; do
        echo "$flag"
        file "$flag" /tmp/example.{doc{,x},png,txt}
      done
    --mime
    /tmp/example.doc:  application/msword; charset=binary
    /tmp/example.docx: application/vnd.openxmlformats-officedocument.wordprocessingml.document; charset=binary
    /tmp/example.png:  image/png; charset=binary
    /tmp/example.txt:  text/plain; charset=us-ascii
    --mime-type
    /tmp/example.doc:  application/msword
    /tmp/example.docx: application/vnd.openxmlformats-officedocument.wordprocessingml.document
    /tmp/example.png:  image/png
    /tmp/example.txt:  text/plain
    --mime-encoding
    /tmp/example.doc:  application/mswordbinary
    /tmp/example.docx: binary
    /tmp/example.png:  binary
    /tmp/example.txt:  us-ascii
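
    Applied to the original question, a loop along these lines could read every non-binary file in a directory. It is only a sketch: the function names and the header format are made up here, and matching on a trailing "binary" is a workaround chosen so that the application/mswordbinary output shown above is also treated as binary.

```shell
# Return 0 (true) unless file(1) reports an encoding that ends in
# "binary" (this also covers the "application/mswordbinary" case).
is_not_binary() {
    case $(file -b --mime-encoding -- "$1") in
        *binary) return 1 ;;
        *)       return 0 ;;
    esac
}

# Print a header and the contents of each non-binary regular file
# directly inside a directory (default: the current one).
read_text_files() {
    dir=${1:-.}
    for f in "$dir"/*; do
        [ -f "$f" ] || continue   # only regular files
        if is_not_binary "$f"; then
            printf '== %s ==\n' "$f"
            cat -- "$f"
        fi
    done
}
```

    For example, read_text_files /some/dir would print the text files one after another and silently skip the rest. Empty files are skipped as well, because file reports their encoding as binary.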
    