Why is it faster to read a file without line breaks?

借酒劲吻你 2021-02-19 15:47

In Python 3.6, it takes longer to read a file if there are line breaks. If I have two files, one with line breaks and one without line breaks (but otherwise they have the same

3 answers
  •  无人及你
    2021-02-19 16:32

    However, I would expect all characters to be treated the same.

    Well, they're not. Line breaks are special.

    Line breaks aren't always represented as \n. The reasons are a long story dating back to the early days of physical teleprinters, which I won't go into here, but where that story has ended up is that Windows uses \r\n, Unix uses \n, and classic Mac OS used to use \r.

    If you open a file in text mode, the line breaks used by the file will be translated to \n when you read them, and \n will be translated to your OS's line break convention when you write. In most programming languages, this is handled on the fly by OS-level code and is pretty cheap, but Python does things differently.
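    The write-side translation can be sketched as follows. This is only an illustration: write_text is a hypothetical helper, and it uses an in-memory io.BytesIO in place of a real file so the resulting bytes can be inspected directly.

```python
import io
import os

def write_text(text: str, newline=None) -> bytes:
    """Hypothetical helper: push text through a text layer and return the raw bytes."""
    raw = io.BytesIO()
    wrapper = io.TextIOWrapper(raw, encoding="ascii", newline=newline)
    wrapper.write(text)
    wrapper.flush()
    data = raw.getvalue()
    wrapper.detach()  # stop the wrapper from closing `raw` when it is collected
    return data

# Default (newline=None): every "\n" written becomes os.linesep.
default_bytes = write_text("a\nb\n")
# Explicit newline="\r\n": force Windows-style endings on any platform.
crlf_bytes = write_text("a\nb\n", newline="\r\n")

print(repr(os.linesep), default_bytes, crlf_bytes)
```

    The default result depends on the platform, which is why the explicit newline="\r\n" case is shown next to it.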

    Python has a feature called universal newlines, where it tries to handle all line break conventions, no matter what OS you're on. Even if a file contains a mix of \r, \n, and \r\n line breaks, Python will recognize all of them and translate them to \n. Universal newlines is on by default in Python 3 unless you configure a specific line ending convention with the newline argument to open.
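    A minimal sketch of that behaviour, assuming an in-memory buffer stands in for a file on disk. read_text is a hypothetical helper, not part of the standard library; it wraps bytes in io.TextIOWrapper, the same text layer that open uses in text mode.

```python
import io

def read_text(raw: bytes, newline=None) -> str:
    """Hypothetical helper: decode raw bytes through a text layer, as text-mode open() does."""
    wrapper = io.TextIOWrapper(io.BytesIO(raw), encoding="ascii", newline=newline)
    return wrapper.read()

# A file that mixes all three line break conventions.
mixed = b"one\rtwo\nthree\r\nfour"

# Default newline=None: universal newlines, everything translated to "\n".
translated = read_text(mixed)
# newline="": all conventions are still recognized, but nothing is translated.
untranslated = read_text(mixed, newline="")

print(repr(translated))    # 'one\ntwo\nthree\nfour'
print(repr(untranslated))  # 'one\rtwo\nthree\r\nfour'
```

    Passing newline="" is the usual way to see the file's original line endings while staying in text mode.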

    In universal newlines mode, the file implementation has to read the file in binary mode, check the contents for \r and \n characters, and construct a new string object with line endings translated if it finds \r or \r\n line endings. If it only finds \n endings, or if it finds no line endings at all, it doesn't need to perform the translation pass or construct a new string object.

    Constructing a new string and translating line endings takes time. When it reads the file with the tabs (the one without line breaks), Python doesn't have to perform the translation.
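    One rough way to check this on your own machine is sketched below. It is an assumption-laden illustration, not a rigorous benchmark: the data lives in memory rather than on disk, read_text and time_read are hypothetical helpers, and the payload sizes are arbitrary. The two payloads have the same length and differ only in whether each 79-character run ends in \r\n or in two tabs.

```python
import io
import timeit

def read_text(raw: bytes) -> str:
    # Default newline=None: universal newlines with translation to "\n".
    return io.TextIOWrapper(io.BytesIO(raw), encoding="ascii").read()

def time_read(raw: bytes, repeats: int = 20) -> float:
    """Return the best wall-clock time in seconds over `repeats` full reads."""
    return min(timeit.repeat(lambda: read_text(raw), number=1, repeat=repeats))

line = b"x" * 79
with_crlf = (line + b"\r\n") * 20_000  # Windows-style line endings, needs translation
with_tabs = (line + b"\t\t") * 20_000  # same size, no line breaks at all

print(f"CRLF: {time_read(with_crlf):.6f} s")
print(f"tabs: {time_read(with_tabs):.6f} s")
```

    Absolute numbers will vary by machine and Python version, so only the relative difference between the two lines is meaningful.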
