How does Perl know a file is binary?

名媛妹妹 2020-12-18 19:58

I know you can use the file test operator -B to test if a file is binary, but how does Perl implement this internally?

2 Answers
  • 2020-12-18 20:04

    From perldoc -f -B:

    The -T and -B switches work as follows. The first block or so of the file is examined for odd characters such as strange control codes or characters with the high bit set. If too many strange characters (>30%) are found, it’s a -B file; otherwise it’s a -T file. Also, any file containing null in the first block is considered a binary file. If -T or -B is used on a filehandle, the current IO buffer is examined rather than the first block. Both -T and -B return true on a null file, or a file at EOF when testing a filehandle. Because you have to read a file to do the -T test, on most occasions you want to use a -f against the file first, as in "next unless -f $file && -T $file".
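
The rule quoted above can be sketched in a few lines. This is a rough approximation written in Python for illustration, not Perl's actual source: the 512-byte block size, the set of control characters treated as ordinary text, and the names `looks_binary` and `looks_text` are all assumptions; only the ">30%" threshold, the null-byte rule, and the empty-file rule come from the documentation.

```python
# Approximation of the -T / -B heuristic as described in the quoted perldoc text.
# BLOCK_SIZE and TEXT_CONTROLS are assumptions, not values taken from the docs.
BLOCK_SIZE = 512
ODD_THRESHOLD = 0.30  # "too many strange characters (>30%)"
# Control bytes assumed to be normal in text: \b \t \n \f \r ESC
TEXT_CONTROLS = frozenset(b"\b\t\n\f\r\x1b")


def count_odd(block: bytes) -> int:
    """Count high-bit bytes and control bytes outside TEXT_CONTROLS."""
    return sum(
        1 for b in block
        if b > 0x7F or (b < 0x20 and b not in TEXT_CONTROLS)
    )


def looks_binary(block: bytes) -> bool:
    """Rough stand-in for -B applied to the first block of a file."""
    if not block:
        return True   # a null (empty) file passes both tests
    if 0 in block:
        return True   # any null byte means binary
    return count_odd(block) > ODD_THRESHOLD * len(block)


def looks_text(block: bytes) -> bool:
    """Rough stand-in for -T applied to the first block of a file."""
    if not block:
        return True   # a null (empty) file passes both tests
    return not looks_binary(block)


def first_block(path: str) -> bytes:
    """Read only the start of the file, as the heuristic does."""
    with open(path, "rb") as fh:
        return fh.read(BLOCK_SIZE)
```

With these helpers, `looks_binary(first_block(path))` plays the role of `-B $file`. As in the Perl idiom `-f $file && -T $file`, it makes sense to confirm the path is a regular file before reading it.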
  • 2020-12-18 20:12

    According to Chapter 11 of the book Learning Perl:

    The answer is **Perl cheats**: it opens the file, looks at the first few thousand bytes, and makes an educated guess. If it sees a lot of null bytes, unusual control characters, and bytes with the high bit set, then that looks like a binary file. If there’s not much weird stuff, then it looks like text. It sometimes guesses wrong. If a text file has a lot of Swedish or French words (which may have characters represented with the high bit set, as some ISO-8859-something variant, or perhaps even a Unicode version), it may fool Perl into declaring it binary. So it’s not perfect, but if you need to separate your source code from compiled files, or HTML files from PNGs, these tests should do the trick.
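
To see how accented text can trip a guess of this kind, here is a small Python sketch. It assumes a simplified rule that looks only at the share of high-bit bytes, and it borrows the 30% cutoff from the perldoc text for -T and -B; the sample strings and the name `high_bit_fraction` are made up for illustration, and this is not Perl's real implementation.

```python
# Simplified illustration: fraction of bytes with the high bit set.
# THRESHOLD mirrors the ">30%" figure in perldoc; everything else is hypothetical.
THRESHOLD = 0.30


def high_bit_fraction(data: bytes) -> float:
    """Return the share of bytes above 0x7F (0.0 for empty input)."""
    if not data:
        return 0.0
    return sum(b > 0x7F for b in data) / len(data)


# Ordinary Swedish prose in ISO-8859-1: only a few accented letters.
prose = "Jag älskar räksmörgås".encode("latin-1")
# An accent-dense string in the same encoding.
dense = "åäö åäö åäö".encode("latin-1")

for label, data in (("prose", prose), ("dense", dense)):
    frac = high_bit_fraction(data)
    verdict = "binary" if frac > THRESHOLD else "text"
    print(f"{label}: {frac:.0%} high-bit bytes -> guessed {verdict}")
```

Under this simplified rule the ordinary sentence stays below the cutoff and the accent-dense string lands above it, so the misclassification depends on how densely the high-bit characters occur in the bytes examined.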