Remove lines that contain a certain string

予麋鹿 2020-11-30 01:00

I'm trying to read text from a text file line by line and delete the lines that contain a specific string (in this case 'bad' or 'naughty'). The code I wrote goes like this:

9 answers
  • 2020-11-30 01:10
    to_skip = {"bad", "naughty"}

    with open("testin", "r") as handle, open("testout", "w") as out_handle:
        for line in handle:
            # split() with no argument also strips the trailing newline,
            # so a bad word at the end of a line is still caught.
            if to_skip.intersection(line.split()):
                continue
            out_handle.write(line)
    
  • 2020-11-30 01:11

    Use the python-textops package:

    from textops import *
    
    'oldfile.txt' | cat() | grepv('bad') | tofile('newfile.txt')
    
  • 2020-11-30 01:12

    Rather than replacing text in the line, simply don't write the unwanted lines to the new file.

    for line in infile:
        if 'bad' not in line and 'naughty' not in line:
            newopen.write(line)
    
  • 2020-11-30 01:12
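    bad_words = ['doc:', 'strickland:']
    
    with open('linetest.txt') as oldfile, open('linetestnew.txt', 'w') as newfile:
        for line in oldfile:
            # Skip blank lines and lines containing any of the bad words.
            if line.strip() and not any(bad_word in line for bad_word in bad_words):
                newfile.write(line)
    

    '\n' is the escape sequence for a newline. It must not be listed in bad_words, because every line read from the file ends with it and would be dropped; blank lines are skipped with line.strip() instead.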
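
As a quick check of this kind of filter without touching the disk, here is a minimal sketch on in-memory lines. The helper name keep_line and the sample lines are made up for illustration; they are not part of the answer above. It also shows what happens if "\n" is given as one of the words.

```python
def keep_line(line, bad_words):
    """Return True if the line is non-blank and contains none of bad_words."""
    return bool(line.strip()) and not any(word in line for word in bad_words)

# Hypothetical sample data, shaped like lines read from a file.
sample = ["doc: hello\n", "marty: hi\n", "\n", "strickland: slacker\n"]

kept = [line for line in sample if keep_line(line, ["doc:", "strickland:"])]
print(kept)

# Every line here ends with "\n", so listing it as a word removes everything.
kept_none = [line for line in sample if keep_line(line, ["doc:", "\n"])]
print(kept_none)
```

Note that this matches substrings, so a word list containing 'bad' would also drop a line containing 'badly'.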

  • 2020-11-30 01:13

    The else is only connected to the last if. You want elif:

    if 'bad' in line:
        pass
    elif 'naughty' in line:
        pass
    else:
        newopen.write(line)
    

    Also note that I removed the line substitution, as you don't write those lines anyway.
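
To see the difference between the two forms, here is a small sketch on in-memory lines. The function names and sample lines are invented for illustration, and the first function is only a guess at the general shape of the problem (two separate if statements followed by an else), since the question's code is not shown.

```python
def filter_with_two_ifs(lines):
    """Two separate ifs: the else pairs only with the second test."""
    out = []
    for line in lines:
        if 'bad' in line:
            pass
        if 'naughty' in line:
            pass
        else:
            out.append(line)  # also reached for lines containing 'bad'
    return out

def filter_with_elif(lines):
    """if/elif/else: the else runs only when neither word is present."""
    out = []
    for line in lines:
        if 'bad' in line:
            pass
        elif 'naughty' in line:
            pass
        else:
            out.append(line)
    return out

sample = ["good dog\n", "bad dog\n", "naughty cat\n"]
print(filter_with_two_ifs(sample))  # the 'bad' line slips through
print(filter_with_elif(sample))
```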

  • 2020-11-30 01:27

    A regex was a little quicker than the accepted answer on my 23 MB test file, but there isn't much in it.

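    import re
    
    bad_words = ['bad', 'naughty']
    
    # Match any whole line containing one of the words, including its newline.
    # (?:...) is a non-capturing group; re.escape keeps the words literal.
    regex = f"^.*(?:{'|'.join(map(re.escape, bad_words))}).*\n"
    subst = ""
    
    with open('oldfile.txt') as oldfile:
        lines = oldfile.read()
    
    # flags must be passed by keyword: the fourth positional argument is count.
    result = re.sub(regex, subst, lines, flags=re.MULTILINE)
    
    with open('newfile.txt', 'w') as newfile:
        newfile.write(result)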
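
The same kind of substitution can be tried on an in-memory string. This is only a sketch with invented sample text. It assumes the words should be matched literally (hence re.escape), passes flags by keyword because the fourth positional argument of re.sub is count, and makes the trailing newline optional so that a final line without one is still removed.

```python
import re

bad_words = ['bad', 'naughty']

# (?:...) is a non-capturing group; \n? also covers a last line with no newline.
pattern = re.compile(
    r"^.*(?:" + "|".join(map(re.escape, bad_words)) + r").*\n?",
    flags=re.MULTILINE,
)

# Hypothetical sample text standing in for the file contents.
text = "good dog\nbad dog\nnaughty cat\nnice bird"
result = pattern.sub("", text)
print(repr(result))
```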
    
    
