I am working with a very large (~11GB) text file on a Linux system. I am running it through a program which is checking the file for errors. Once an error is found, I need
I will provide two alternatives based on the look-up factor (line number or a search string):
def removeLine2(filename, lineNumber):
with open(filename, 'r+') as outputFile:
with open(filename, 'r') as inputFile:
currentLineNumber = 0
while currentLineNumber < lineNumber:
inputFile.readline()
currentLineNumber += 1
seekPosition = inputFile.tell()
outputFile.seek(seekPosition, 0)
inputFile.readline()
currentLine = inputFile.readline()
while currentLine:
outputFile.writelines(currentLine)
currentLine = inputFile.readline()
outputFile.truncate()
def removeLine(filename, key):
with open(filename, 'r+') as outputFile:
with open(filename, 'r') as inputFile:
seekPosition = 0
currentLine = inputFile.readline()
while not currentLine.strip().startswith('"%s"' % key):
seekPosition = inputFile.tell()
currentLine = inputFile.readline()
outputFile.seek(seekPosition, 0)
currentLine = inputFile.readline()
while currentLine:
outputFile.writelines(currentLine)
currentLine = inputFile.readline()
outputFile.truncate()
Update: solution using sed as requested by poster in comment.
To delete for example the second line of file:
sed '2d' input.txt
Use the -i
switch to edit in place. Warning: this is a destructive operation. Read the help for this command for information on how to make a backup automatically.
I think there was a somewhat similar if not exactly the same type of question asked here. Reading (and writing) line by line is slow, but you can read a bigger chunk into memory at once, go through that line by line skipping lines you don't want, then writing this as a single chunk to a new file. Repeat until done. Finally replace the original file with the new file.
The thing to watch out for is when you read in a chunk, you need to deal with the last, potentially partial line you read, and prepend that into the next chunk you read.