How can I know if a file is a binary file?
For example, a compiled C file.
I want to read all files from a directory, but I want to ignore binary files.
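Try the following command line (-b makes file omit the filename from its output, so a filename containing "ASCII" cannot affect the result):
file -b "$FILE" | grep -vq 'ASCII' && echo "$FILE is binary"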
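Since the question is about reading a whole directory, here is a minimal sketch that wraps this check in a loop. It assumes a POSIX shell and the file utility; the function names is_binary and text_files_in are hypothetical. It passes -b so the filename cannot influence the match. Like the one-liner, it treats anything file does not describe as ASCII as binary, so UTF-8 text and empty files (described as "empty") are skipped as well.

```shell
#!/bin/sh
# Hypothetical helper: succeeds (returns 0) when file's description of "$1"
# does not mention ASCII. -b (brief) omits the "name: " prefix.
is_binary() {
    file -b "$1" | grep -vq 'ASCII'
}

# Hypothetical helper: prints the regular files directly inside directory "$1"
# whose description mentions ASCII, one per line.
text_files_in() {
    for f in "$1"/*; do
        [ -f "$f" ] || continue      # skip subdirectories, devices, no-match globs
        is_binary "$f" && continue   # skip files that look binary
        printf '%s\n' "$f"
    done
}
```

For example, text_files_in /etc prints the ASCII text files directly inside /etc, one per line.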
You can also do this by leveraging the diff command. Check this answer:
https://unix.stackexchange.com/questions/275516/is-there-a-convenient-way-to-classify-files-as-binary-or-text#answer-402870
grep
Assuming "binary" means a file containing non-printable characters (other than blank characters, i.e. spaces and tabs; newlines never match because grep works line by line), this may work with both BSD and GNU grep (-q suppresses the matching lines so that only the verdict is printed):
$ grep -q '[^[:print:][:blank:]]' file && echo Binary || echo Text
Note: GNU grep will report a file containing only NUL characters as text, but the BSD version handles it correctly.
For more examples, see: How do I grep for all non-ASCII characters.
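The same grep test can be wrapped in a function and reused per file. This is a sketch assuming a POSIX shell; has_nonprintable and classify are hypothetical names. Setting LC_ALL=C is an added assumption, not part of the answer: it pins [:print:] to printable ASCII so the result does not depend on the user's locale, but it also means UTF-8 text and files with CRLF line endings count as binary.

```shell
#!/bin/sh
# Hypothetical helper: succeeds when "$1" contains a byte that is neither
# printable ASCII nor a space/tab. LC_ALL=C is an assumption (see above).
has_nonprintable() {
    # -q: print nothing, report only through the exit status
    LC_ALL=C grep -q '[^[:print:][:blank:]]' "$1"
}

# Hypothetical helper: prints Binary or Text for the file "$1".
classify() {
    if has_nonprintable "$1"; then
        echo Binary
    else
        echo Text
    fi
}
```

For example, classify /etc/passwd prints a single word, Binary or Text.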
Use the file utility. Sample usage:
$ file /bin/bash
/bin/bash: Mach-O universal binary with 2 architectures
/bin/bash (for architecture x86_64): Mach-O 64-bit executable x86_64
/bin/bash (for architecture i386): Mach-O executable i386
$ file /etc/passwd
/etc/passwd: ASCII English text
$ file code.c
code.c: ASCII c program text
See the file manual page for details.
Use Perl’s built-in -T file test operator, preferably after ascertaining that it is a plain file using the -f file test operator:
$ perl -le 'for (@ARGV) { print if -f && -T }' \
getwinsz.c a.out /etc/termcap /bin /bin/cat \
/dev/tty /usr/share/zoneinfo/UTC /etc/motd
getwinsz.c
/etc/termcap
/etc/motd
Here’s the complement of that set:
$ perl -le 'for (@ARGV) { print unless -f && -T }' \
getwinsz.c a.out /etc/termcap /bin /bin/cat \
/dev/tty /usr/share/zoneinfo/UTC /etc/motd
a.out
/bin
/bin/cat
/dev/tty
/usr/share/zoneinfo/UTC
Going off Bach's suggestion, I think --mime-encoding is the best flag to get something reliable from file.
file --mime-encoding [FILES ...] | grep -v '\bbinary$'
will print the files that file believes have a non-binary encoding. You can pipe this output through cut -d: -f1 to trim the ": encoding" suffix if you just want the filenames.
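Caveat: as @yugr reports below, .doc files report an encoding of application/mswordbinary. This looks to me like a bug: the MIME type is erroneously being concatenated with the encoding. Because there is no word boundary before "binary" in that string, the grep -v '\bbinary$' filter above lets such files through as non-binary.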
$ for flag in --mime --mime-type --mime-encoding; do
echo "$flag"
file "$flag" /tmp/example.{doc{,x},png,txt}
done
--mime
/tmp/example.doc: application/msword; charset=binary
/tmp/example.docx: application/vnd.openxmlformats-officedocument.wordprocessingml.document; charset=binary
/tmp/example.png: image/png; charset=binary
/tmp/example.txt: text/plain; charset=us-ascii
--mime-type
/tmp/example.doc: application/msword
/tmp/example.docx: application/vnd.openxmlformats-officedocument.wordprocessingml.document
/tmp/example.png: image/png
/tmp/example.txt: text/plain
--mime-encoding
/tmp/example.doc: application/mswordbinary
/tmp/example.docx: binary
/tmp/example.png: binary
/tmp/example.txt: us-ascii
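One way to sidestep the .doc caveat is to match the encoding as a suffix instead of a whole word. This is a minimal sketch, assuming a file build that supports -b and --mime-encoding; is_binary_encoding and print_text_files are hypothetical names. The pattern *binary matches both a plain binary encoding and the concatenated application/mswordbinary shown above.

```shell
#!/bin/sh
# Hypothetical helper: succeeds when file reports a binary encoding for "$1".
# -b prints only the encoding, without the "name: " prefix.
is_binary_encoding() {
    case "$(file -b --mime-encoding "$1")" in
        *binary) return 0 ;;  # "binary" and "application/mswordbinary"
        *)       return 1 ;;  # e.g. "us-ascii", "utf-8"
    esac
}

# Hypothetical helper: prints each argument that is a regular file
# with a non-binary encoding.
print_text_files() {
    for f in "$@"; do
        [ -f "$f" ] || continue
        is_binary_encoding "$f" || printf '%s\n' "$f"
    done
}
```

Given the encodings listed above, print_text_files /tmp/example.{doc{,x},png,txt} should print only /tmp/example.txt.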