Fastest Way to Delete a Line from a Large File in Python

轮回少年 2020-11-30 03:58

I am working with a very large (~11GB) text file on a Linux system. I am running it through a program that checks the file for errors. Once an error is found, I need to delete the offending line from the file as quickly as possible.

9 Answers
  • 2020-11-30 04:51

    Here are two alternatives, depending on how the line to delete is identified (by line number or by a search string):

    Line number

    def removeLine2(filename, lineNumber):
        # Rewrite the file in place, dropping the line at the given 0-based index.
        with open(filename, 'r+') as outputFile, open(filename, 'r') as inputFile:
            # Advance the reader to the start of the line to delete.
            currentLineNumber = 0
            while currentLineNumber < lineNumber:
                inputFile.readline()
                currentLineNumber += 1

            # Position the writer where the doomed line begins.
            seekPosition = inputFile.tell()
            outputFile.seek(seekPosition, 0)

            # Skip the line to delete, then copy everything after it back over it.
            inputFile.readline()
            currentLine = inputFile.readline()
            while currentLine:
                outputFile.write(currentLine)
                currentLine = inputFile.readline()

            # Cut off the now-duplicated tail of the file.
            outputFile.truncate()

    String

    def removeLine(filename, key):
        # Rewrite the file in place, dropping the first line whose content starts
        # with the key wrapped in double quotes, e.g. "someKey": ...
        with open(filename, 'r+') as outputFile, open(filename, 'r') as inputFile:
            # Scan for the line to delete, remembering where it starts.
            seekPosition = 0
            currentLine = inputFile.readline()
            while currentLine and not currentLine.strip().startswith('"%s"' % key):
                seekPosition = inputFile.tell()
                currentLine = inputFile.readline()

            # Position the writer where the matched line begins (if the key was
            # never found, this is end of file and the file is left unchanged).
            outputFile.seek(seekPosition, 0)

            # Copy everything after the matched line back over it.
            currentLine = inputFile.readline()
            while currentLine:
                outputFile.write(currentLine)
                currentLine = inputFile.readline()

            # Cut off the now-duplicated tail of the file.
            outputFile.truncate()
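
    For illustration, a hypothetical call of the two helpers; the file name and the arguments below are placeholders, not part of the original answer:

    removeLine2('data.txt', 4)         # drop the fifth line (0-based index 4)
    removeLine('data.txt', 'someKey')  # drop the first line starting with "someKey" (with the quotes)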
    
  • 2020-11-30 04:56

    Update: a solution using sed, as requested by the poster in a comment.

    For example, to delete the second line of the file:

    sed '2d' input.txt
    

    Use the -i switch to edit the file in place. Warning: this is a destructive operation. Read the command's documentation for how to have it create a backup automatically.
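
    With GNU sed, for example, running "sed -i.bak '2d' input.txt" edits input.txt in place and keeps a copy of the original as input.txt.bak; the .bak suffix here is just an example.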

  • 2020-11-30 04:57

    I think a somewhat similar, if not identical, question has been asked here before. Reading (and writing) line by line is slow, but you can read a bigger chunk into memory at once, go through it line by line while skipping the lines you don't want, and then write the result as a single chunk to a new file. Repeat until done, and finally replace the original file with the new file.

    The thing to watch out for is that when you read a chunk, the last line you get may be incomplete; you need to hold on to it and prepend it to the next chunk you read.
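
    A minimal sketch of that approach, assuming the line to drop is identified by its 0-based index; the function name, the temporary-file naming, and the 1 MiB buffer size are illustrative choices, not from the original answer:

    import os

    def remove_line_chunked(path, line_to_skip, chunk_size=1024 * 1024):
        # Stream the file into a temporary copy in large chunks, dropping the
        # line with the given 0-based index, then swap the copy into place.
        tmp_path = path + '.tmp'
        leftover = ''
        line_no = 0
        with open(path, 'r') as infile, open(tmp_path, 'w') as outfile:
            while True:
                chunk = infile.read(chunk_size)
                if not chunk:
                    break
                lines = (leftover + chunk).split('\n')
                leftover = lines.pop()  # the last piece may be a partial line
                for line in lines:
                    if line_no != line_to_skip:
                        outfile.write(line + '\n')
                    line_no += 1
            if leftover:  # the file did not end with a newline
                if line_no != line_to_skip:
                    outfile.write(leftover)
        os.replace(tmp_path, path)  # replace the original with the new file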
