Remove lines that contain a certain string

予麋鹿 2020-11-30 01:00

I'm trying to read a text file line by line and delete the lines that contain specific strings (in this case 'bad' and 'naughty'). The code I wrote goes like this:

9 answers
  • 2020-11-30 01:31

    Today I needed to do something similar, so I wrote up a gist based on some research I did. I hope someone finds it useful!

    import os
    
    # Clear the terminal ('cls' on Windows, 'clear' elsewhere).
    os.system('cls' if os.name == 'nt' else 'clear')
    
    oldfile = input('{*} Enter the file (with extension) you would like to strip domains from: ')
    newfile = input('{*} Enter the name of the file (with extension) you would like me to save: ')
    
    emailDomains = ['windstream.net', 'mail.com', 'google.com', 'web.de', 'email', 'yandex.ru', 'ymail', 'mail.eu', 'mail.bg', 'comcast.net', 'yahoo', 'Yahoo', 'gmail', 'Gmail', 'GMAIL', 'hotmail', 'comcast', 'bellsouth.net', 'verizon.net', 'att.net', 'roadrunner.com', 'charter.net', 'mail.ru', '@live', 'icloud', '@aol', 'facebook', 'outlook', 'myspace', 'rocketmail']
    
    print("\n[*] This script will remove records that contain the following strings: \n\n", emailDomains)
    
    # input() waits for Enter, not for an arbitrary key.
    input("\n[!] Press Enter to start...\n")
    
    linecounter = 0
    
    # Copy every line that contains none of the listed strings.
    with open(oldfile) as oFile, open(newfile, 'w') as nFile:
        for line in oFile:
            if not any(domain in line for domain in emailDomains):
                nFile.write(line)
                linecounter += 1
                print('[*] - {%s} Writing verified record to %s ---{ %s' % (linecounter, newfile, line))
    
    print('[*] === COMPLETE === [*]')
    print('[*] %s was saved' % newfile)
    print('[*] There are %s records in your saved file.' % linecounter)
    

    Link to Gist: emailStripper.py

    Best, Az
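The list above repeats entries such as 'yahoo'/'Yahoo' and 'gmail'/'Gmail'/'GMAIL' because the `in` test is case-sensitive. One way to avoid that, shown here only as a minimal sketch, is to lowercase both sides before comparing. The function name `strip_records` and the sample records are made up for illustration and are not part of the gist.

```python
def strip_records(lines, blocked):
    """Yield the lines that contain none of the blocked substrings, ignoring case."""
    lowered = [b.lower() for b in blocked]  # normalise the block list once
    for line in lines:
        haystack = line.lower()
        if not any(b in haystack for b in lowered):
            yield line  # the original line is passed through unchanged


# Hypothetical sample data; a real run would iterate over an open file instead.
records = [
    "alice@GMAIL.com\n",
    "bob@example.org\n",
    "carol@Yahoo.co.uk\n",
]
kept = list(strip_records(records, ["gmail", "yahoo"]))
print(kept)
```

Because the function accepts any iterable of lines, it could be fed an open file object in place of the list.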

  • 2020-11-30 01:33

    I have used this to remove unwanted words from text files:

    bad_words = ['abc', 'def', 'ghi', 'jkl']
    
    with open('List of words.txt') as badfile, open('Clean list of words.txt', 'w') as cleanfile:
        for line in badfile:
            # Assume the line is clean until a bad word is found in it.
            clean = True
            for word in bad_words:
                if word in line:
                    clean = False
                    break  # one match is enough to reject the line
            if clean:
                cleanfile.write(line)
    

    Or to do the same for all files in a directory:

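    import os
    
    bad_words = ['abc', 'def', 'ghi', 'jkl']
    
    for root, dirs, files in os.walk(".", topdown = True):
        for file in files:
            # Only process .txt files, and skip output left by an earlier run.
            if file.endswith('.txt') and not file.startswith('clean '):
                # os.walk yields bare file names, so join them with root
                # to open files that live in subdirectories.
                source = os.path.join(root, file)
                target = os.path.join(root, 'clean ' + file)
                with open(source) as filename, open(target, 'w') as cleanfile:
                    for line in filename:
                        clean = True
                        for word in bad_words:
                            if word in line:
                                clean = False
                                break
                        if clean:
                            cleanfile.write(line)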
    

    I'm sure there must be a more elegant way to do it, but this did what I wanted it to.
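For what it's worth, the directory walk can be written more compactly with `pathlib` and `any`. This is just a sketch under a few assumptions: the function name `clean_tree` and the `prefix` parameter are invented here, each cleaned copy is written next to its source file, and files that already carry the prefix are skipped so a second run does not clean its own output.

```python
from pathlib import Path


def clean_tree(top, bad_words, prefix="clean "):
    """Write a filtered copy of every .txt file under `top`; return the new paths."""
    written = []
    # sorted() materialises the listing first, so files created below
    # are not picked up while we are still iterating.
    for path in sorted(Path(top).rglob("*.txt")):
        if path.name.startswith(prefix):
            continue  # output of an earlier run
        target = path.with_name(prefix + path.name)
        with path.open() as src, target.open("w") as dst:
            for line in src:
                if not any(word in line for word in bad_words):
                    dst.write(line)
        written.append(target)
    return written
```

Calling `clean_tree(".", ["abc", "def", "ghi", "jkl"])` would mirror the loop above for the current directory.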

  • 2020-11-30 01:35

    You can make your code simpler and more readable like this:

    bad_words = ['bad', 'naughty']
    
    with open('oldfile.txt') as oldfile, open('newfile.txt', 'w') as newfile:
        for line in oldfile:
            if not any(bad_word in line for bad_word in bad_words):
                newfile.write(line)
    

    This uses a context manager (the with statement) and the built-in any().
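One caveat with `bad_word in line` is that it matches substrings, so 'bad' also removes lines containing 'badge' or 'badminton'. If only whole words should count, a regular expression with word boundaries is one option. The sketch below assumes a non-empty word list; `make_filter` and the sample lines are illustrative names, not part of the answer above.

```python
import re


def make_filter(bad_words):
    """Return a predicate that is True for lines containing no bad word as a whole word."""
    # re.escape keeps characters such as '.' literal; \b anchors on word boundaries.
    # Assumes bad_words is non-empty.
    pattern = re.compile(r"\b(?:" + "|".join(re.escape(w) for w in bad_words) + r")\b")
    return lambda line: pattern.search(line) is None


keep = make_filter(["bad", "naughty"])
lines = ["a bad day\n", "my badge number\n", "naughty cat\n", "all good\n"]
kept = [line for line in lines if keep(line)]
print(kept)
```

The predicate drops into the loop above as `if keep(line): newfile.write(line)`.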
