Skip first couple of lines while reading lines in Python file

I want to skip the first 17 lines while reading a text file.

Let's say the file looks like:

0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
good stuff

I just want the good stuff. What I'm doing is a lot more complicated, but this is the part I'm having trouble with.

wim

Use a slice, like below:

with open('yourfile.txt') as f:
    lines_after_17 = f.readlines()[17:]

If the file is too big to load in memory:

with open('yourfile.txt') as f:
    for _ in range(17):
        next(f)
    for line in f:
        # do stuff

Ismail Badawi

Use itertools.islice, starting at index 17. It will automatically skip the 17 first lines.

import itertools
with open('file.txt') as f:
    for line in itertools.islice(f, 17, None):  # start=17, stop=None
        # process lines

for line in dropwhile(isBadLine, lines):
    # process as you see fit

Full demo:

from itertools import *

def isBadLine(line):
    return line=='0'

with open(...) as f:
    for line in dropwhile(isBadLine, f):
        # process as you see fit

Advantages: This is easily extensible to cases where your prefix lines are more complicated than "0" (but not interdependent).

Wilder

This solution helped me to skip the number of lines specified by the linetostart variable. You get the index (int) and the line (string) if you want to keep track of those too. In your case, you substitute linetostart with 18, or assign 18 to linetostart variable.

f = open("file.txt", 'r')
for i, line in enumerate(f, linetostart):
    #Your code

Here is a method to get lines between two line numbers in a file:

import sys

def file_line(name,start=1,end=sys.maxint):
    lc=0
    with open(s) as f:
        for line in f:
            lc+=1
            if lc>=start and lc<=end:
                yield line


s='/usr/share/dict/words'
l1=list(file_line(s,235880))
l2=list(file_line(s,1,10))
print l1
print l2

Output:

['Zyrian\n', 'Zyryan\n', 'zythem\n', 'Zythia\n', 'zythum\n', 'Zyzomys\n', 'Zyzzogeton\n']
['A\n', 'a\n', 'aa\n', 'aal\n', 'aalii\n', 'aam\n', 'Aani\n', 'aardvark\n', 'aardwolf\n', 'Aaron\n']

Just call it with one parameter to get from line n -> EOF

If it's a table.

pd.read_table("path/to/file", sep="\t", index_col=0, skiprows=17)

If you don't want to read the whole file into memory at once, you can use a few tricks:

With next(iterator) you can advance to the next line:

with open("filename.txt") as f:
     next(f)
     next(f)
     next(f)
     for line in f:
         print(f)

Of course, this is slighly ugly, so itertools has a better way of doing this:

from itertools import islice

with open("filename.txt") as f:
    # start at line 17 and never stop (None), until the end
    for line in islice(f, 17, None):
         print(f)

Here are the timeit results for the top 2 answers. Note that "file.txt" is a text file containing 100,000+ lines of random string with a file size of 1MB+.

Using itertools:

import itertools
from timeit import timeit

timeit("""with open("file.txt", "r") as fo:
    for line in itertools.islice(fo, 90000, None):
        line.strip()""", number=100)

>>> 1.604976346003241

Using two for loops:

from timeit import timeit

timeit("""with open("file.txt", "r") as fo:
    for i in range(90000):
        next(fo)
    for j in fo:
        j.strip()""", number=100)

>>> 2.427317383000627

clearly the itertools method is more efficient when dealing with large files.

You can use a List-Comprehension to make it a one-liner: