In Python 2.7 why are strings written faster in text mode than in binary mode?

Question

The following example script writes some strings to a file using either text mode ("w") or binary mode ("wb"):

import itertools as it
from string import ascii_lowercase
import time

characters = it.cycle(ascii_lowercase)
mode = 'w'
# mode = 'wb'  # using this mode takes longer to execute
t1 = time.clock()
with open('test.txt', mode) as fh:
    for __ in xrange(10**7):
        fh.write(''.join(it.islice(characters, 0, 50)))
t2 = time.clock()
print 'Mode: {}, time elapsed: {:.2f}'.format(mode, t2 - t1)

With Python 2, using "w" mode I found it executes in 24.89 +/- 0.02 s while using "wb" it takes 25.67 +/- 0.02 s to execute. These are the specific timings for three consecutive runs for each mode:

mode_w  = [24.91, 24.86, 24.91]
mode_wb = [25.68, 25.64, 25.69]

I'm surprised by these results, since Python 2 stores its strings as byte strings anyway, so neither "w" nor "wb" needs to perform any encoding work. Text mode, on the other hand, needs to perform additional work such as checking for line endings:

The default is to use text mode, which may convert '\n' characters to a platform-specific representation on writing and back on reading.

So, if anything, I'd expect text mode "w" to take longer than binary mode "wb". However, the opposite seems to be the case. Why is this?

Tested with CPython 2.7.12.

Answer 1:

Looking at the source code for file.write reveals the following difference between binary mode and text mode:

if (f->f_binary) {
    if (!PyArg_ParseTuple(args, "s*", &pbuf))
        return NULL;
    s = pbuf.buf;
    n = pbuf.len;
}
else {
    PyObject *text;
    if (!PyArg_ParseTuple(args, "O", &text))
        return NULL;

    if (PyString_Check(text)) {
        s = PyString_AS_STRING(text);
        n = PyString_GET_SIZE(text);
    }

Here f->f_binary is set when the mode passed to open includes "b". In this case Python fills an auxiliary buffer structure (pbuf, a Py_buffer) from the string object and then reads the data pointer s and the length n from that buffer. I suppose this is for compatibility (generality) with other objects that support the buffer interface.
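The generality mentioned above can be illustrated with a minimal sketch: in binary mode, write() accepts not only plain byte strings but also other objects exposing the buffer interface, such as bytearray. The helper name write_chunks_binary is made up for this illustration, and the temporary file is created only for the demo.

```python
import os
import tempfile


def write_chunks_binary(chunks):
    """Write each chunk with its own write() call in 'wb' mode and
    return the resulting file contents as a byte string.

    Hypothetical helper for illustration only.
    """
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        with open(path, 'wb') as fh:
            for chunk in chunks:
                # Binary mode goes through the buffer interface, so a
                # bytearray is accepted just like a plain byte string.
                fh.write(chunk)
        with open(path, 'rb') as fh:
            return fh.read()
    finally:
        os.remove(path)


contents = write_chunks_binary([b'abc', bytearray(b'def')])
print(contents == b'abcdef')
```

Every call pays for this flexibility, even when the argument is an ordinary string.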

The call PyArg_ParseTuple(args, "s*", &pbuf) is what creates the corresponding buffer, and this operation takes additional compute time. In text mode, by contrast, Python simply parses the argument as an object ("O") at almost no cost. Retrieving the data and length via

s = PyString_AS_STRING(text);
n = PyString_GET_SIZE(text);

is also performed when the buffer is created, so binary mode does the same work as text mode plus the buffer setup.
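One way to check this explanation is to time only the write() calls, with the string built once up front so that the cost of ''.join(...) in the question's script does not blur the comparison. The following is a rough sketch, not a rigorous benchmark: time_writes and its parameters are hypothetical, and the comparison is only meaningful under CPython 2.7. The decode step exists solely so the snippet also runs on Python 3, where the file implementation is different.

```python
import os
import tempfile
import timeit


def time_writes(mode, n_writes=100000, chunk_size=50):
    """Return the seconds spent in n_writes calls to fh.write().

    Hypothetical helper: the chunk is precomputed so that only the
    write calls themselves are timed.
    """
    chunk = b'x' * chunk_size
    if 'b' not in mode and not isinstance(chunk, str):
        # Only reached on Python 3, where text mode requires str.
        chunk = chunk.decode('ascii')
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        with open(path, mode) as fh:
            write = fh.write  # avoid repeated attribute lookups
            start = timeit.default_timer()
            for _ in range(n_writes):
                write(chunk)
            elapsed = timeit.default_timer() - start
    finally:
        os.remove(path)
    return elapsed


for mode in ('w', 'wb'):
    print('Mode: {}, time elapsed: {:.4f}'.format(mode, time_writes(mode)))
```

Individual runs are noisy, so several repetitions per mode would be needed before drawing any conclusion.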

This means that when working in binary mode there's an additional overhead associated with creating an auxiliary buffer object from the string object. For that reason the execution time is longer when working in binary mode.
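To put the size of this overhead in perspective, the timings reported in the question can be converted into an extra cost per write() call. This is plain arithmetic on the question's numbers (three runs per mode, 10**7 writes per run); the helper names are made up for the illustration.

```python
# Timings (in seconds) and call count taken from the question.
mode_w = [24.91, 24.86, 24.91]
mode_wb = [25.68, 25.64, 25.69]
n_writes = 10**7


def mean(values):
    """Arithmetic mean; float() keeps the division exact on Python 2."""
    return sum(values) / float(len(values))


def per_call_overhead_ns(slow, fast, calls):
    """Extra time per call, in nanoseconds, of the slower mode."""
    return (mean(slow) - mean(fast)) / calls * 1e9


overhead = per_call_overhead_ns(mode_wb, mode_w, n_writes)
relative = (mean(mode_wb) - mean(mode_w)) / mean(mode_w) * 100
print('extra time per write: {:.0f} ns ({:.1f}% slower)'.format(overhead, relative))
```

This works out to roughly 78 ns per call, or about 3% of the total run time, which is a plausible magnitude for a small fixed cost paid on every call.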

Source: https://stackoverflow.com/questions/62161186/in-python-2-7-why-are-strings-written-faster-in-text-mode-than-in-binary-mode

Tags

python

python-2.7

file-io

cpython