Fastest Way to Delete a Line from a Large File in Python

轮回少年 2020-11-30 03:58

I am working with a very large (~11GB) text file on a Linux system. I am running it through a program which is checking the file for errors. Once an error is found, I need

9 answers
  • 2020-11-30 04:33
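    import os

    def removeLine(filename, lineno):
        # "in" is a reserved word in Python, so the file objects need other names.
        with open(filename) as src, open(filename + ".new", "w") as dst:
            # Copy every line except the one numbered lineno (1-based).
            for i, line in enumerate(src, 1):
                if i != lineno:
                    dst.write(line)
        # Replace the original only after both files have been closed.
        os.rename(filename + ".new", filename)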
    
  • 2020-11-30 04:37

    As far as I know, you can't open a text file in Python and simply remove a line. You have to create a new file and copy everything except that line into it. If you know the line number, you can do something like this:

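    linenumtoremove = 10  # example value: the 1-based number of the line to drop

    # The with block closes both files even if an error occurs.
    with open('in.txt') as f, open('out.txt', 'w') as fo:
        for ind, line in enumerate(f, 1):
            if ind != linenumtoremove:
                fo.write(line)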
    

    You could, of course, check the contents of each line instead to decide whether to keep it. If you have a whole list of lines to remove or change, I also recommend making all of those changes in a single pass through the file.
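
    The single-pass idea can be sketched roughly as follows. This is only an illustration: the function name `remove_lines` and the `.new` suffix for the temporary file are assumptions, not something from the question.

```python
import os


def remove_lines(filename, line_numbers):
    """Rewrite filename without the given 1-based line numbers, in one pass."""
    unwanted = set(line_numbers)    # set lookups are cheap for each line
    tmp_name = filename + ".new"    # assumed temp name, same directory as the original
    # Binary mode avoids decoding 11 GB of text just to copy it.
    with open(filename, "rb") as src, open(tmp_name, "wb") as dst:
        for number, line in enumerate(src, 1):
            if number not in unwanted:
                dst.write(line)
    # Swap the new file in only after the copy has finished.
    os.replace(tmp_name, filename)
```

    Calling `remove_lines("in.txt", [3, 17, 4000])` would then drop all three lines while reading the file only once. Note that it needs enough free disk space for a second copy of the file.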

  • 2020-11-30 04:41

    If the lines are variable length, I don't believe there is a better algorithm than reading the file line by line and writing out every line except the one(s) you do not want.

    You can identify these lines by testing each one against some criterion, or by keeping a running count of the lines read and skipping the write for the line(s) you do not want.

    If the lines are fixed length and you want to delete specific line numbers, then you may be able to use seek to move the file pointer... I doubt you're that lucky though.
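
    For the fixed-length case, a rough sketch might look like this. It assumes every line is exactly `record_len` bytes including its newline; `remove_fixed_length_line`, `record_len` and the 1 MiB block size are made-up names and values for illustration. The start of the line is computed directly instead of being searched for, but the rest of the file still has to be shifted back.

```python
import os

CHUNK = 1024 * 1024  # shift data in 1 MiB blocks (an arbitrary choice)


def remove_fixed_length_line(filename, lineno, record_len):
    """Delete 1-based line lineno from a file of record_len-byte lines."""
    with open(filename, "r+b") as f:
        write_pos = (lineno - 1) * record_len   # where the unwanted line starts
        read_pos = write_pos + record_len       # where the following line starts
        size = f.seek(0, os.SEEK_END)
        if lineno < 1 or read_pos > size:
            raise ValueError("line number is outside the file")
        # Move everything after the unwanted line back by one record.
        while True:
            f.seek(read_pos)
            block = f.read(CHUNK)
            if not block:
                break
            f.seek(write_pos)
            f.write(block)
            read_pos += len(block)
            write_pos += len(block)
        # Cut off the now-duplicated tail.
        f.truncate(write_pos)
```

    This avoids scanning for newlines and needs no second copy of the file, but deleting a line near the start still rewrites almost the whole file.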

  • 2020-11-30 04:42

    Modify the file in place: the offending line is overwritten with spaces, so the remainder of the file does not need to be shuffled around on disk. You can also "fix" the line in place, provided the fix is no longer than the line you are replacing.

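    import os
    from mmap import mmap

    def removeLine(filename, lineno):
        f = os.open(filename, os.O_RDWR)
        m = mmap(f, 0)
        p = 0
        # Skip past the first lineno-1 newlines to reach the start of the line.
        for i in range(lineno - 1):
            p = m.find(b'\n', p) + 1
        q = m.find(b'\n', p)
        if q == -1:  # the last line has no trailing newline
            q = len(m)
        # An mmap holds bytes in Python 3, so search for and assign bytes.
        m[p:q] = b' ' * (q - p)
        m.close()
        os.close(f)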
    

    If the other program can be changed to output the file offset instead of the line number, you can assign the offset to p directly and do without the for loop.
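
    As a sketch of that offset-based variant: the function name `blank_line_at_offset` is hypothetical, and it assumes the checking program reports the byte offset at which the bad line starts.

```python
import mmap


def blank_line_at_offset(filename, offset):
    """Overwrite the line starting at byte offset with spaces, keeping its newline."""
    with open(filename, "r+b") as f, mmap.mmap(f.fileno(), 0) as m:
        end = m.find(b"\n", offset)
        if end == -1:       # last line without a trailing newline
            end = len(m)
        # The replacement has the same length, so nothing else in the file moves.
        m[offset:end] = b" " * (end - offset)
        m.flush()
```

    Because no newline scan from the start of the file is needed, the cost depends only on the length of the line being blanked, not on the size of the file.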

  • 2020-11-30 04:43

    @OP, if you can use awk, e.g. assuming the line number is 10:

    $ awk 'NR!=10' file > newfile
    
  • 2020-11-30 04:49

    You can have two file objects for the same file at the same time (one for reading, one for writing):

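    def removeLine(filename, lineno):
        # lineno is 1-based
        fro = open(filename, "rb")

        # skip the lines before the one we want to remove
        current_line = 1
        while current_line < lineno:
            fro.readline()
            current_line += 1

        # the writer starts where the unwanted line begins
        seekpoint = fro.tell()
        frw = open(filename, "r+b")
        frw.seek(seekpoint, 0)

        # read the line we want to discard
        fro.readline()

        # now move the rest of the lines in the file
        # one line back
        chars = fro.readline()
        while chars:
            frw.write(chars)  # write(), because chars is a single bytes object
            chars = fro.readline()

        fro.close()
        # cut the file off at the writer's position to drop the leftover tail
        frw.truncate()
        frw.close()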
    