What is the difference between:
with open("file.txt", "r") as f:
    data = list(f)
Or:
with open("file.txt", "r") as
TL;DR
Considering you need a list to manipulate the lines afterwards, your three proposed solutions are all syntactically valid. None is better (or more Pythonic) than the others, especially since they are all recommended by the official Python documentation. So choose the one you find most readable and use it consistently throughout your code. If performance is a deciding factor, see my timeit analysis below.
Here is the timeit analysis (10000 loops, ~20 lines in test.txt):
import timeit

def foo():
    with open("test.txt", "r") as f:
        data = list(f)

def foo1():
    with open("test.txt", "r") as f:
        data = f.read().splitlines(True)

def foo2():
    with open("test.txt", "r") as f:
        data = f.readlines()

print(timeit.timeit(stmt=foo, number=10000))
print(timeit.timeit(stmt=foo1, number=10000))
print(timeit.timeit(stmt=foo2, number=10000))
>>>> 1.6370758459997887
>>>> 1.410844805999659
>>>> 1.8176437409965729
I tried it with various loop counts and file sizes, and f.read().splitlines(True) always seems to perform a bit better than the other two.
Now, syntactically speaking, all of your examples seem to be valid. Refer to this documentation for more information.
According to it, if your goal is to read lines from a file, you can loop over the file object:
for line in f:
    ...
The documentation states that this is memory efficient, fast, and leads to simple code. It would be another good alternative in your case if you don't need to manipulate the lines in a list.
EDIT
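Note that the True you pass to splitlines is its keepends argument, which defaults to False. With True, each line keeps its line ending, which matches what list(f) and f.readlines() return. If you omit it, the line endings are stripped, so only drop it if you want the lines without their trailing newlines.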
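To see what the keepends argument of str.splitlines changes, here is a small sketch. It uses an in-memory io.StringIO with made-up sample text as a stand-in for the real file, which is an assumption for illustration only:

```python
import io

sample = "alpha\nbeta\ngamma\n"  # made-up content standing in for file.txt

with io.StringIO(sample) as f:
    from_list = list(f)

kept = sample.splitlines(True)   # keepends=True: line endings are preserved
stripped = sample.splitlines()   # keepends defaults to False: endings removed

print(from_list)
print(kept)
print(stripped)
```

Only the keepends=True form gives the same list as iterating over the file.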
My personal recommendation
I don't want to make this answer too opinion-based, but I think it is worth saying that performance should not be your deciding factor until it is actually a problem for you, especially since all three forms are allowed and recommended in the official Python documentation I linked.
So, my advice is: first, pick the most logical one for your particular case; then, among the remaining candidates, choose the one you find most readable and use it consistently throughout your code.
Explicit is better than implicit, so I prefer:
with open("file.txt", "r") as f:
    data = f.readlines()
But, when it is possible, the most Pythonic approach is to use the file iterator directly, without loading all the content into memory, e.g.:
with open("file.txt", "r") as f:
    for line in f:
        my_function(line)
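As a sketch of that streaming style, the following finds the longest line without ever building a list. The sample text and the io.StringIO stand-in for a real file are assumptions made to keep the example self-contained:

```python
import io

# Stand-in for open("file.txt", "r"); any text file object iterates the same way.
f = io.StringIO("short\na much longer line\nmid-sized\n")

with f:
    # max() consumes the file one line at a time; only the current line
    # and the best candidate so far are kept in memory.
    longest = max(f, key=len)

print(repr(longest))
```

The same pattern works with sum(), any(), or a generator expression whenever the result can be computed in a single pass.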
All three of your options produce the same end result, but nonetheless, one of them is definitely worse than the other two: doing f.read().splitlines(True).
The reason this is the worst option is that it requires the most memory. f.read() reads the file content into memory as a single (maybe huge) string object, then calling .splitlines(True) on that additionally creates the list of the individual lines, and only after that is the string object containing the file's entire content released and its memory freed. So, at the moment of peak memory use - just before the memory for the big string is freed - this approach requires enough memory to store the entire content of the file twice: once as a string, and once as a list of strings.
By contrast, doing list(f) or f.readlines() will read a line from disk, add it to the result list, then read the next line, and so on. So the whole file content is never duplicated in memory, and the peak memory use will thus be about half that of the .splitlines(True) approach. These approaches are thus superior to using .read() and .splitlines(True).
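One way to check this claim on your own machine is to measure peak allocations with the standard tracemalloc module. This is a rough sketch: the temporary file, its size, and the peak_bytes helper are all made up for the illustration, and the exact numbers depend on the Python version and on how the I/O layer buffers data, so no particular result is promised here.

```python
import os
import tempfile
import tracemalloc


def peak_bytes(reader, path):
    """Return (peak traced bytes, number of lines) for one reading strategy."""
    tracemalloc.start()
    try:
        with open(path, "r") as f:
            lines = reader(f)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak, len(lines)


# Hypothetical test file: 50,000 short lines written to a temporary location.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("some sample text on a line\n" * 50_000)
    path = tmp.name

readers = {
    "list(f)": list,
    "f.readlines()": lambda f: f.readlines(),
    "f.read().splitlines(True)": lambda f: f.read().splitlines(True),
}
results = {name: peak_bytes(reader, path) for name, reader in readers.items()}
os.remove(path)

for name, (peak, count) in results.items():
    print(f"{name:28} peak={peak:>12,} bytes  lines={count}")
```

tracemalloc only sees allocations made through Python's allocator, so treat the figures as a comparison between the three strategies rather than as the process's true memory footprint.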
As for list(f) vs f.readlines(), there's no concrete advantage to either of them over the other; the choice between them is a matter of style and taste.
In all three cases, you're using a context manager to read a file. The f it gives you is a file object.
File Object
An object exposing a file-oriented API (with methods such as read() or write()). Depending on the way it was created, a file object can mediate access to a real on-disk file or to another type of storage or communication device (for example standard input/output, in-memory buffers, sockets, pipes, etc.). File objects are also called file-like objects or streams. The canonical way to create a file object is by using the open() function. https://docs.python.org/3/glossary.html#term-file-object
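A small sketch of both points, assuming a throwaway temporary file (the path and content are hypothetical): the with block closes the file object on exit, and an in-memory io.StringIO exposes the same file-oriented API.

```python
import io
import os
import tempfile

# Hypothetical scratch file so the example does not depend on file.txt.
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w") as out:
    out.write("one\ntwo\n")

with open(path, "r") as f:
    on_disk = f.readlines()
    open_inside = not f.closed  # still open inside the block

closed_after = f.closed  # the context manager closed it on exit
os.remove(path)

# An in-memory buffer is also a file object and supports the same calls.
in_memory = io.StringIO("one\ntwo\n").readlines()

print(on_disk == in_memory, open_inside, closed_after)
```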
list
with open("file.txt", "r") as f:
    data = list(f)
This works because a file object is iterable: iterating over it yields one line at a time. Converting it to a list works roughly like this:
[line for line in f]  # stops when the file iterator raises StopIteration
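Spelled out by hand, and assuming an io.StringIO with made-up text in place of the real file, that conversion looks roughly like the loop below. It is a sketch of the iterator protocol, not of how list is implemented internally:

```python
import io

f = io.StringIO("first\nsecond\nthird\n")  # stand-in for an open text file

manual = []
it = iter(f)  # a file object is its own iterator
while True:
    try:
        manual.append(next(it))  # each call returns the next line
    except StopIteration:        # raised once the end of the file is reached
        break

f.seek(0)           # rewind so the built-in sees the same content
built_in = list(f)

print(manual == built_in)
```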
readlines method
with open("file.txt", "r") as f:
    data = f.readlines()
The method readlines() reads until EOF using readline() and returns a list containing the lines.
Difference from list:
You can cap roughly how much is read with fileObject.readlines(sizehint). Note that sizehint (named hint in Python 3) is a size, not a number of lines:
If the optional sizehint argument is present, instead of reading up to EOF, whole lines totalling approximately sizehint bytes (possibly after rounding up to an internal buffer size) are read.
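A sketch of the size hint in action, using io.StringIO and made-up four-character lines as a stand-in for a real file. In Python 3 the argument is called hint and is passed positionally:

```python
import io

f = io.StringIO("aaa\nbbb\nccc\n")  # three lines of 4 characters each

# After the first line the running total is 4, which does not exceed 5,
# so reading continues; after the second line it is 8, so reading stops.
first_batch = f.readlines(5)

# A later call without a hint picks up where the previous one stopped.
rest = f.readlines()

print(first_batch, rest)
```

Whole lines are always returned, so the hint is a threshold for when to stop rather than an exact limit.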
read
When should I ever use file.read() or file.readlines()?
They all achieve the same goal of returning a list of strings, but through different approaches. f.readlines() is the most Pythonic.
with open("file.txt", "r") as f:
    data = list(f)
f here is a file-like object; list iterates over it, and iterating over a file yields its lines.
with open("file.txt", "r") as f:
    data = f.read().splitlines(True)
f.read() returns the whole file as one string, which you then split on line boundaries, returning a list of strings.
with open("file.txt", "r") as f:
    data = f.readlines()
f.readlines() does the same as above: it reads the entire file and splits it into lines.
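As a quick check of that equivalence, here is a sketch that runs all three approaches over the same in-memory text (io.StringIO and the sample strings are stand-ins chosen for the example). It also shows one edge case worth knowing: str.splitlines treats a few extra characters, such as form feed, as line boundaries, whereas file iteration only breaks on newlines.

```python
import io


def three_ways(text):
    """Return the line lists produced by the three approaches for `text`."""
    via_list = list(io.StringIO(text))
    via_readlines = io.StringIO(text).readlines()
    via_split = io.StringIO(text).read().splitlines(True)
    return via_list, via_readlines, via_split


# Ordinary text: all three agree.
a, b, c = three_ways("first\nsecond\nthird\n")
print(a == b == c)

# Text containing a form feed (\x0c): splitlines breaks on it, the others do not.
d, e, g = three_ways("page one\x0cpage two\n")
print(d == e, d == g)
```

For typical text files the three lists are identical; the difference only shows up when the content includes one of those extra separator characters.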