How to tell binary files from text files in Linux

[愿得一人] 2021-02-07 13:52

The Linux file command does a very good job of recognising file types and gives very fine-grained results. The diff tool is able to tell binary files from text files.

8 answers
  • 2021-02-07 14:27

    A quick-and-dirty way is to look for a NUL character (a zero byte) in the first kilobyte or two of the file. As long as you're not worried about UTF-16 or UTF-32, which routinely contain zero bytes, no text file should ever contain a NUL.

    Update: According to the diff manual, this is exactly what diff does.
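    A minimal sketch of that heuristic in POSIX-style shell (it also runs under bash). The function name looks_binary and the default 8192-byte sample are assumptions standing in for "the first K or two": it deletes NUL bytes from the sample with tr and reports binary if the byte count dropped.

```shell
#!/usr/bin/env bash
# Sketch of the NUL-byte heuristic. looks_binary is a hypothetical name;
# the 8192-byte default sample size is an assumption (override with $2).
looks_binary() {
  local file=$1 limit=${2:-8192}
  local total without_nul
  # Number of bytes actually sampled (fewer than $limit for short files)
  total=$(head -c "$limit" -- "$file" | wc -c)
  # Same sample with every NUL byte deleted
  without_nul=$(head -c "$limit" -- "$file" | tr -d '\000' | wc -c)
  # Exit status 0 ("binary") only if deleting NULs shrank the sample
  [ "$total" -ne "$without_nul" ]
}
```

    Usage, for example: looks_binary /bin/ls && echo binary. Like diff, it only inspects the beginning of the file, so a NUL byte after the sampled prefix goes unnoticed.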

  • 2021-02-07 14:36

    You could try running

    strings yourfile

    and comparing the size of its output with the file size. I'm not totally sure, but if they are the same, the file really is a text file.
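    A rough sketch of that comparison, assuming GNU binutils strings with its default minimum run length of four printable characters; same_size_as_strings is a hypothetical name. It also illustrates why the answer is hedged: strings drops blank lines and runs shorter than four characters, so genuine text files with such lines will not match.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds if `strings` output is exactly as large as
# the file itself. Assumes GNU strings (runs of >= 4 printable characters,
# each printed followed by a newline).
same_size_as_strings() {
  local file=$1 file_size strings_size
  file_size=$(wc -c < "$file")
  strings_size=$(strings -- "$file" | wc -c)
  [ "$file_size" -eq "$strings_size" ]
}
```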

  • 2021-02-07 14:36

    This approach uses the same criteria as grep to determine whether a file is binary or text:

    is_text_file() { 
      grep -qI '.' "$1"
    }
    

    grep options used:

    • -q Quiet; Exit immediately with zero status if any match is found
    • -I Process a binary file as if it did not contain matching data

    grep pattern used:

    • '.' matches any single character, so every file containing at least one character other than a newline will match this pattern.

    Notes

    • An empty file, or one consisting only of newlines, is not considered a text file by this test.
    • Symbolic links are followed.
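    The same grep test can be wrapped to label several files at once. classify is a hypothetical helper; it prints "not text" rather than "binary" because empty files also fail the test. For whole directory trees, grep -rIl '.' DIR applies the same criterion recursively and lists the files grep considers text.

```shell
#!/usr/bin/env bash
# is_text_file as given in this answer: -I makes grep treat binary files as
# non-matching, and '.' matches any character on a line.
is_text_file() {
  grep -qI '.' "$1"
}

# Hypothetical wrapper: prints "<name>: text" or "<name>: not text" per argument.
classify() {
  local f
  for f in "$@"; do
    if is_text_file "$f"; then
      printf '%s: text\n' "$f"
    else
      printf '%s: not text\n' "$f"
    fi
  done
}
```

    Usage, for example: classify /etc/passwd /bin/ls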
  • 2021-02-07 14:36

    Commands like less and grep detect this quite easily (and quickly). You can have a look at their source code.

  • 2021-02-07 14:42

    file is still the command you want. Any file that is text (according to its heuristics) will include the word "text" in the output of file; anything that is binary will not include the word "text".

    If you don't agree with the heuristics that file uses to determine text vs. not-text, then the question needs to be better specified, since text vs. non-text is an inherently vague question. For example, file does not identify a PGP public key block in ASCII as "text", but you might (since it is composed only of printable characters, even though it is not human-readable).
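    A minimal sketch of that check: file -b (brief mode) omits the leading file name so only the description is matched, and a case pattern looks for the substring "text". The function name is hypothetical, and the result is only as good as the heuristics in the installed magic database.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds if file's brief description mentions "text"
# (e.g. "ASCII text"); fails for descriptions such as "data" or "empty".
is_text_per_file() {
  case $(file -b -- "$1") in
    *text*) return 0 ;;
    *)      return 1 ;;
  esac
}
```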

  • 2021-02-07 14:47

    These days the term "text file" is ambiguous, because a text file can be encoded in ASCII, ISO-8859-*, UTF-8, UTF-16, UTF-32 and so on.

    See here for how Subversion does it.
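    To make the encoding point concrete, here is one possible strict definition, an assumption rather than what Subversion does: a file counts as text if it is valid UTF-8 and contains no NUL bytes. It assumes an iconv that rejects invalid UTF-8 input (glibc's does). By this definition the UTF-16LE encoding of "hi" (bytes 68 00 69 00) is rejected for its zero bytes, and Latin-1 "café" (ending in the single byte 0xE9) is rejected as invalid UTF-8, even though both are text in their own encodings.

```shell
#!/usr/bin/env bash
# Hypothetical helper: "text" here means valid UTF-8 with no NUL bytes.
is_utf8_text() {
  # iconv exits non-zero when the input is not valid UTF-8
  iconv -f UTF-8 -t UTF-8 < "$1" > /dev/null 2>&1 || return 1
  # tr deletes NUL bytes; cmp -s succeeds only if nothing was deleted
  tr -d '\000' < "$1" | cmp -s - "$1"
}
```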
