Historical reason behind different line ending at different platforms

谁说我不能喝 提交于 2019-11-27 04:35:55

DOS inherited CR-LF line endings (what you're calling \r\n, just making the ascii characters explicit) from CP/M. CP/M inherited it from the various DEC operating systems which influenced CP/M designer Gary Kildall.

CR-LF was used so that the teletype machines would return the print head to the left margin (CR = carriage return), and then move to the next line (LF = line feed).

The Unix guys handled that in the device driver, and when necessary translated LF to CR-LF on output to devices that needed it.

And as you guessed, Mac OS X now uses LF.

Really adding to @Mark Harrison...

The people who tell you that Unix is "just outputting the text the programmer specified" whereas DOS is broken are plain wrong. There are also claims that it's stupid for DOS to flag EOF when it sees an EOF character, raising the question of what exactly that EOF character is for.

There is no one true convention for text file line endings - only platform-specific conventions. After all, even CR-LF, CR and LF aren't the only line end conventions to ever be used, and ASCII was never even the one and only character set. The problem is the C standard library and runtime, which didn't abstract away this platform-dependent detail. Other third generation languages (such as Pascal and even Basic) managed it, at least to some degree. Because of this, when C compilers were written for other platforms, runtime library hacks were needed to achieve compatibility with existing source code and books.

In fact, it's Unix and Multics that originally needed string translation for console I/O, since users usually sat at an ASCII terminal that required CR LF line ends. This translation was done in a device driver, though - the goal was to abstract away the device-specifics, assuming that it was better to adopt one convention and stick to it for stored text files.

The C text I/O hack is similar in principle to what CygWin does now, hacking Linux runtimes to work as well as can be expected on Windows. There's a real history of hacking things about to turn them into Unix-alikes - but then there's also Wine, turning Linux into Windows. Oddly enough, you can read some misplaced line-end criticism of Windows in the CygWin FAQ (Internet Archive link added 2013 - the page no longer exists). Maybe it's just their sense of humour, since they are basically doing what they are criticising, but on a much grander scale ;-)

The C++ standard library (whatever platform its implemented on) avoids this issue using iostreams, which abstract away line ends. For output, that suits me fine. For input, I need more control, so I either interpret character-by-character or else use a scanner generator.

[EDIT It turns out that the struck-out claim above isn't true, and never was. The std::endl literally translates to a \n and a flush. The \n is exactly the same \n you get in C - it tends to get called "new line", but it's actually an ASCII line feed character, which then gets translated by the runtime if necessary. Funny how false assumptions can get so ingrained you never question them - basically, C++ had no choice to do what C did (other than adding more layers on top) for compatibility reasons, and that should always have been obvious.]

The biggest slice of blame from my POV is with C, but C isn't the only project to fail to anticipate its move to other platforms. Blaming Bill Gates is just nuts - all he did was buy and polish a variant of the then popular CP/M. Really, it's just history - the same reason why we don't know what character codes 128 to 255 refer to in most text files. Given the ease of coping with all three line end conventions, it's odd that some developers still insist on that "my platforms convention is the one true way, and I shall force it on you like it or not" attitude.

Also - will the Unicode line separator codepoint U+2028 replace all these conventions in future text files? ;-)

There's a rather lengthy article about line endings on wikipedia. The "History" section answers at least part of your question: http://en.wikipedia.org/wiki/Newline#History

It's interesting to note the CRLF is pretty much the internet standard. That is, pretty much every standard internet protocol that is line oriented uses CRLF. SMTP, POP, IMAP, NNTP, etc.. The body of email consists of lines terminated by CRLF.

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!