I want to skip the first 17 lines while reading a text file.
Let's say the file looks like:
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
good stuff
I just want the good stuff. What I'm doing is a lot more complicated, but this is the part I'm having trouble with.
Use a slice, like below:
with open('yourfile.txt') as f:
lines_after_17 = f.readlines()[17:]
If the file is too big to load in memory:
with open('yourfile.txt') as f:
for _ in range(17):
next(f)
for line in f:
# do stuff
Use itertools.islice
, starting at index 17. It will automatically skip the 17 first lines.
import itertools
with open('file.txt') as f:
for line in itertools.islice(f, 17, None): # start=17, stop=None
# process lines
for line in dropwhile(isBadLine, lines):
# process as you see fit
Full demo:
from itertools import *
def isBadLine(line):
return line=='0'
with open(...) as f:
for line in dropwhile(isBadLine, f):
# process as you see fit
Advantages: This is easily extensible to cases where your prefix lines are more complicated than "0" (but not interdependent).
This solution helped me to skip the number of lines specified by the linetostart
variable.
You get the index (int) and the line (string) if you want to keep track of those too.
In your case, you substitute linetostart with 18, or assign 18 to linetostart variable.
f = open("file.txt", 'r')
for i, line in enumerate(f, linetostart):
#Your code
Here is a method to get lines between two line numbers in a file:
import sys
def file_line(name,start=1,end=sys.maxint):
lc=0
with open(s) as f:
for line in f:
lc+=1
if lc>=start and lc<=end:
yield line
s='/usr/share/dict/words'
l1=list(file_line(s,235880))
l2=list(file_line(s,1,10))
print l1
print l2
Output:
['Zyrian\n', 'Zyryan\n', 'zythem\n', 'Zythia\n', 'zythum\n', 'Zyzomys\n', 'Zyzzogeton\n']
['A\n', 'a\n', 'aa\n', 'aal\n', 'aalii\n', 'aam\n', 'Aani\n', 'aardvark\n', 'aardwolf\n', 'Aaron\n']
Just call it with one parameter to get from line n -> EOF
If it's a table.
pd.read_table("path/to/file", sep="\t", index_col=0, skiprows=17)
If you don't want to read the whole file into memory at once, you can use a few tricks:
With next(iterator)
you can advance to the next line:
with open("filename.txt") as f:
next(f)
next(f)
next(f)
for line in f:
print(f)
Of course, this is slighly ugly, so itertools has a better way of doing this:
from itertools import islice
with open("filename.txt") as f:
# start at line 17 and never stop (None), until the end
for line in islice(f, 17, None):
print(f)
Here are the timeit results for the top 2 answers. Note that "file.txt" is a text file containing 100,000+ lines of random string with a file size of 1MB+.
Using itertools:
import itertools
from timeit import timeit
timeit("""with open("file.txt", "r") as fo:
for line in itertools.islice(fo, 90000, None):
line.strip()""", number=100)
>>> 1.604976346003241
Using two for loops:
from timeit import timeit
timeit("""with open("file.txt", "r") as fo:
for i in range(90000):
next(fo)
for j in fo:
j.strip()""", number=100)
>>> 2.427317383000627
clearly the itertools method is more efficient when dealing with large files.
You can use a List-Comprehension to make it a one-liner:
[fl.readline() for i in xrange(17)]
More about list comprehension in PEP 202 and in the Python documentation.
来源:https://stackoverflow.com/questions/9578580/skip-first-couple-of-lines-while-reading-lines-in-python-file