How to tell binary files from text files in Linux


The Linux `file` command does a very good job of recognising file types and gives very fine-grained results. The `diff` tool is able to tell binary files from text files.


8 answers

陌清茗

2021-02-07 14:27

A quick-and-dirty way is to look for a NUL character (a zero byte) in the first kilobyte or two of the file. As long as you're not worried about UTF-16 or UTF-32, no text file should ever contain a NUL.

Update: According to the diff manual, this is exactly what diff does.
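A minimal shell sketch of the NUL-byte check, assuming a `head` that supports `-c` (GNU, BSD and BusyBox do) and an 8 KiB scan window. The window size is an illustrative assumption, not the number `diff` uses, and the function name `has_nul_prefix` is hypothetical. The function succeeds (exit status 0) when the file looks binary.

```shell
# Hypothetical helper: succeeds if a NUL byte appears in the first
# 8192 bytes of the file, i.e. the file is probably binary.
# The 8192-byte window is an assumed value, not taken from diff.
has_nul_prefix() {
  # head limits the scan window, tr deletes every byte except NUL,
  # and wc counts the NUL bytes that remain.
  nul_count=$(head -c 8192 < "$1" | tr -cd '\000' | wc -c)
  [ "$nul_count" -gt 0 ]
}
```

For example, `if has_nul_prefix somefile; then echo binary; else echo text; fi`. Because `tr -cd '\000'` strips everything except the NUL bytes, the data never has to pass through a shell variable, which matters because shell variables cannot hold NUL bytes.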

猫巷女王i

2021-02-07 14:36
You could try running
```
strings yourfile
```
and comparing the size of its output with the file size. I'm not totally sure, but if they are the same, the file is really a text file.
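A sketch of the `strings` comparison as a shell function; the name `looks_like_text_via_strings` is hypothetical. Keep in mind that by default `strings` only prints runs of at least four printable characters and writes a newline after each run, so a genuine text file with very short lines, or one without a trailing newline, will also compare unequal.

```shell
# Hypothetical helper: succeeds when the number of bytes printed by
# strings equals the size of the file itself.
looks_like_text_via_strings() {
  file_size=$(wc -c < "$1")                  # size of the file in bytes
  strings_size=$(strings "$1" | wc -c)       # size of the strings output
  [ "$file_size" -eq "$strings_size" ]
}
```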

北荒

2021-02-07 14:36
This approach uses the same criteria as grep to determine whether a file is binary or text:
```
is_text_file() { 
  grep -qI '.' "$1"
}
```
grep options used:
- `-q` Quiet; exit immediately with zero status if any match is found.
- `-I` Process a binary file as if it did not contain matching data.
grep pattern used:
- `'.'` Match any single character. Every file that contains at least one character other than a newline will match this pattern.
Notes
- An empty file (or one containing only newlines) is not considered a text file according to this test.
- Symbolic links are followed.
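A usage sketch around the answer's function: `classify` is a hypothetical wrapper, not part of the answer, that prints one label per argument. Empty files, and files containing only newlines (which `'.'` cannot match), are also reported as "not text".

```shell
# The answer's test, unchanged: -I makes grep treat binary files as
# non-matching, and '.' needs at least one non-newline character.
is_text_file() {
  grep -qI '.' "$1"
}

# Hypothetical wrapper: prints "<name>: text" or "<name>: not text".
classify() {
  for f in "$@"; do
    if is_text_file "$f"; then
      printf '%s: text\n' "$f"
    else
      printf '%s: not text\n' "$f"
    fi
  done
}
```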

栀梦

2021-02-07 14:36

Commands like `less` and `grep` detect this quite easily (and quickly). You can have a look at their source code.

再見小時候

2021-02-07 14:42

`file` is still the command you want. Any file that is text (according to its heuristics) will include the word "text" in the output of `file`; anything that is binary will not.

If you don't agree with the heuristics that file uses to determine text vs. not-text, then the question needs to be better specified, since text vs. non-text is an inherently vague question. For example, file does not identify a PGP public key block in ASCII as "text", but you might (since it is composed only of printable characters, even though it is not human-readable).
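A minimal sketch of this approach, assuming the common `file` utility is installed; `is_text_by_file` is a hypothetical name. The `-b` (brief) option drops the leading "filename:" from the output, so a file whose own name contains "text" (such as `context.bin`) cannot cause a false match.

```shell
# Hypothetical helper: succeeds when file's description of the
# file contains the word "text".
is_text_by_file() {
  # -b prints only the description, without the file name prefix.
  file -b "$1" | grep -q 'text'
}
```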

旧巷少年郎

2021-02-07 14:47

These days the term "text file" is ambiguous, because a text file can be encoded in ASCII, ISO-8859-*, UTF-8, UTF-16, UTF-32 and so on.

See here for how Subversion does it.
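To make the encoding problem concrete, the sketch below uses a hypothetical `count_nul_bytes` helper and assumes an `iconv` that supports UTF-16LE. It counts the NUL bytes in the same five ASCII characters encoded as UTF-8 and as UTF-16LE: the UTF-8 bytes contain no NUL, while UTF-16LE adds one NUL per character, so any test based on NUL bytes would call the UTF-16 version binary.

```shell
# Hypothetical helper: count the NUL bytes read from standard input
# (tr deletes every byte except NUL, wc counts what is left).
count_nul_bytes() {
  tr -cd '\000' | wc -c
}

# The same five characters in two encodings.
printf 'hello' | count_nul_bytes                                # UTF-8
printf 'hello' | iconv -f UTF-8 -t UTF-16LE | count_nul_bytes   # UTF-16LE
```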
