How can I tell whether a file is a binary file?
For example, a compiled C file.
I want to read all files from some directory, but I want to ignore binary files.
Perhaps this would suffice:
if ! file /path/to/file | grep -iq ASCII ; then
echo "Binary"
fi
if file /path/to/file | grep -iq ASCII ; then
echo "Text file"
fi
It's kind of brute force to exclude binary files with tr -d "[[:print:]\n\t]" < file | wc -c, but it is no heuristic guesswork either.
find . -maxdepth 1 -type f -exec /bin/sh -c '
for file in "$@"; do
if [ $(LC_ALL=C LANG=C tr -d "[[:print:]\n\t]" < "$file" | wc -c) -gt 0 ]; then
echo "${file} is no ASCII text file (UNIX)"
else
echo "${file} is ASCII text file (UNIX)"
fi
done
' _ '{}' +
The following brute-force approach using grep -a -m 1 $'[^[:print:]\t]' file seems quite a bit faster, though.
find . -maxdepth 1 -type f -exec /bin/sh -c '
tab="$(printf "\t")"
for file in "$@"; do
if LC_ALL=C LANG=C grep -a -m 1 "[^[:print:]${tab}]" "$file" 1>/dev/null 2>&1; then
echo "${file} is no ASCII text file (UNIX)"
else
echo "${file} is ASCII text file (UNIX)"
fi
done
' _ '{}' +
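The tr-based check above can also be wrapped in a small reusable function. This is only a sketch: the names is_ascii_text and list_text_files are made up for illustration, and "text" here means the same thing as in the commands above (printable ASCII, newlines, and tabs only).

```shell
# Hypothetical helper (name is illustrative, not from the answer above):
# succeeds when the file holds only printable ASCII, newlines, and tabs.
is_ascii_text() {
    # Delete every allowed byte; whatever is left over is a non-text byte.
    leftover=$(LC_ALL=C tr -d '[:print:]\n\t' < "$1" | wc -c)
    [ "$leftover" -eq 0 ]
}

# Hypothetical helper: print the names of the regular files directly
# inside directory $1 that pass the check.
list_text_files() {
    for f in "$1"/*; do
        [ -f "$f" ] || continue
        if is_ascii_text "$f"; then
            printf '%s\n' "$f"
        fi
    done
}
```

Calling list_text_files . lists the text files in the current directory. Note that an empty file counts as text under this definition, since nothing is left over after the deletion.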
Adapted from "excluding binary file":
find . -exec file {} \; | grep text | cut -d: -f1
Here is a simple solution to check a single file using BSD grep (on macOS/Unix):
grep -q "\x00" file && echo Binary || echo Text
which basically checks whether the file contains a NUL character.
Using this method, to read all non-binary files recursively using the find utility you can do:
find . -type f -exec sh -c 'grep -q "\x00" "$1" || cat "$1"' _ {} ";"
Or, even simpler, using just grep:
grep -rv "\x00" .
For just the current folder, use:
grep -v "\x00" *
Unfortunately, the above examples won't work with GNU grep; however, there is a workaround.
Since GNU grep ignores NUL characters, it's possible to check for non-ASCII characters instead, like:
$ grep -qP "[^\x00-\x7F]" file && echo Binary || echo Text
Note: This won't work for files whose only non-text bytes are NUL characters.
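A NUL check that does not depend on the grep flavor can be sketched with tr and wc instead. This is a different technique from the grep commands above, not a fix to them, and the function name has_nul is made up for illustration.

```shell
# Hypothetical helper: succeeds when the file contains at least one NUL byte.
has_nul() {
    # tr -cd deletes everything except NUL bytes; wc -c counts what remains.
    count=$(LC_ALL=C tr -cd '\000' < "$1" | wc -c)
    [ "$count" -gt 0 ]
}
```

It can be used the same way as the one-liners above, for example has_nul file && echo Binary || echo Text. Because it counts NUL bytes directly, it also handles files that consist of nothing but NUL characters.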
cat + grep
Assuming binary means a file containing NUL characters, this shell command can help:
(cat -v file.bin | grep -q "\^@") && echo Binary || echo Text
or:
grep -q "\^@" <(cat -v file.bin) && echo Binary
This is a workaround for grep -q "\x00", which works for BSD grep, but not for the GNU version.
Basically, the -v option for cat converts all non-printing characters so that they become visible in the form of control-character notation, for example:
$ printf "\x00\x00" | hexdump -C
00000000 00 00 |..|
$ printf "\x00\x00" | cat -v
^@^@
$ printf "\x00\x00" | cat -v | hexdump -C
00000000 5e 40 5e 40 |^@^@|
where each ^@ represents a NUL character. So once these control characters are found, we assume the file is binary.
The disadvantage of the above method is that it can generate false positives when the file contains a literal ^@ sequence that does not represent a control character. For example:
$ printf "\x00\x00^@^@" | cat -v | hexdump -C
00000000 5e 40 5e 40 5e 40 5e 40 |^@^@^@^@|
See also: How do I grep for all non-ASCII characters.
perl -E 'exit((-B $ARGV[0])?0:1);' file-to-test
This can be used to check whether "file-to-test" is binary. The above command exits with code 0 for binary files; otherwise the exit code is 1.
The reverse check for a text file can look like the following command:
perl -E 'exit((-T $ARGV[0])?0:1);' file-to-test
Likewise, the above command exits with status 0 if "file-to-test" is text (not binary).
Read more about the -B and -T checks using the command perldoc -f -X.