What is the hex code of the “EOF” character?

♀尐吖头ヾ submitted on 2019-11-28 21:04:59

There is no such thing as an EOF character. The operating system knows exactly how many bytes a file contains (this is stored alongside other metadata like permissions, creation date, and the name), and hence can tell a program that tries to read the eleventh byte of a ten-byte file: you've reached the end of the file, there are no more bytes to read.

In fact, the "EOF" value returned for example by C functions like getchar is explicitly an int value outside the range of a byte, so it cannot possibly be stored in a file!

Sometimes, certain file formats insist on adding NUL terminators (probably because that's how strings are usually stored in C), though usually these delimit multiple records in a single file, not the file as a whole. And such decoration usually disqualifies a file from being considered a "text file".

ASCII codes like ETX and NUL date back to the days of teletypewriters and friends. NUL is used in C for in-memory strings, but this has no bearing on file systems.

There was, a long long time ago, an End Of File marker, but it hasn't been used in files for many years.

You can demonstrate a distant echo of it on Windows using:

C:\>copy con junk.txt
Hello
Hello again
- Press <Ctrl> and <z>
C:\>dump junk.txt
junk.txt:
00000000  4865 6c6c 6f0d 0a48 656c 6c6f 2061 6761 Hello..Hello aga
00000010  696e 0d0a                               in..
C:\>

Note the use of Ctrl-Z as an end-of-file marker.

However, notice also that the Ctrl-Z does not appear in the file any more - it used to appear as a 0x1a but only on some operating systems and even then not consistently.

Use of ETX (0x03) stopped even before those dim and distant times.

There is no such thing as EOF. EOF is just a value returned by file reading functions to tell you the file pointer reached the end of the file.

kralyk

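The EOT byte (0x04) is used to this day by Unix terminals (ttys) to indicate end of input. You type it with Ctrl+D (i.e. ^D) to end input to a shell or any other program reading from stdin. In the terminal's default line-buffered mode the driver acts on the keystroke itself, so the program sees a read that returns no bytes rather than a 0x04 in its input.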

However, as others have pointed out, this is distinct from EOF, which is a condition rather than a piece of data per se.

There once were even different EOF characters for different operating systems. I haven't seen one in a long time. (Files were typically stored in blocks of 128 bytes, so a marker was needed to show where the text ended inside the last block.) They were a pain to code around, much as BOMs are nowadays.

Instead there is still an int read() that normally delivers a byte value, but delivers -1 at EOF.

The NUL character is a string terminator in C. In Java you can have a NUL character in the middle of a string. To cooperate with C, the modified UTF-8 that Java generates in places such as JNI and DataOutputStream.writeUTF uses a multi-byte encoding both for Unicode characters > 127 and for NUL, so the encoded bytes never contain a 0x00.

(Some of this is probably known already.)

In the 7-bit Wintel world it is 0x1A or chr(26).

It is still commonly found in older text files and archives and is still produced by some file transmission protocols. In particular text files downloaded from BBS systems were commonly terminated with this character.

There are other such sentinel values for older systems and, like the EOL variants (CR, LF, CR+LF), they need to be anticipated from time to time.

It can be a source of annoyance to see it still being used, on the same level as return(0) for instance.

You need the end-of-file character in certain situations, for example when sending a file to a printer from a Unix computer. Most Windows/DOS-oriented printers expect the end-of-file marker before they print the file held in their memory. If no end-of-file marker is sent, the printer just sits until it times out (normally 2 minutes) and then prints the file. If you use lpr to print from Unix, make sure to include the end-of-file marker. Windows/DOS attaches it automatically, and the printers are designed to wait for it.
