Python search a file for text using input from another file

梦如初夏 2021-01-20 11:24

I'm new to Python and programming, and I need some help with a Python script. There are two files, each containing email addresses (more than 5,000 lines). The input file contains em…

5 Answers
  • 2021-01-20 11:41

    I think your issue stems from the following:

    name = fd.readline()
    if name[1:-1] in names:
    

    name[1:-1] slices each email address so that you skip both the first and the last character. Skipping the last character (the newline '\n') would be fine on its own, but when you load the name database in the "dfile" like this

    with open(inputfile, 'r') as f:
        names = f.readlines()
    

    you are keeping the newlines, because readlines() does not strip them. So don't slice the names in the "ifile" at all, i.e.

    if name in names:
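
    A small sketch of both points, with hypothetical addresses and a plain list standing in for what readlines() returns. The last two checks go slightly beyond the answer: if a file's last line has no trailing newline, the raw comparison misses it, while stripping both sides still matches.

```python
# Hypothetical sample lines, as readlines() would return them (newlines kept).
names = ["alice@example.com\n", "bob@example.com\n"]
line = "alice@example.com\n"

# [1:-1] removes the first character as well as the newline, so the lookup fails.
sliced = line[1:-1]
print(repr(sliced))     # 'lice@example.com'
print(sliced in names)  # False

# Comparing the raw line works here because both sides keep their "\n".
print(line in names)    # True

# A last line without a trailing newline does not match the raw entries...
last_line = "bob@example.com"
print(last_line in names)  # False

# ...but stripping both sides makes the comparison robust.
stripped_names = {n.strip() for n in names}
print(last_line.strip() in stripped_names)  # True
```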
    
  • 2021-01-20 11:53

    Here's what I would do:

    names = []
    with open(inputfile) as f:
        for line in f:
            names.append(line.rstrip("\n"))  # drop the trailing newline
    
    myEmails = set(names)  # a set makes each membership test fast
    
    # read the file being searched and write every matching line to emails.txt
    with open(outputfile) as fd, open("emails.txt", "w") as output:
        for line in fd:
            c = line.rstrip("\n")
            if c in myEmails:
                print(c)  # for console
                output.write(c + "\n")  # for writing to file
    
  • Maybe I'm missing something, but why not use a pair of sets?

    #!/usr/local/cpython-3.3/bin/python
    
    data_filename = 'dfile1.txt'
    input_filename = 'ifile1.txt'
    
    with open(input_filename, 'r') as input_file:
        input_addresses = set(email_address.rstrip() for email_address in input_file.readlines())
    
    with open(data_filename, 'r') as data_file:
        data_addresses = set(email_address.rstrip() for email_address in data_file.readlines())
    
    print(input_addresses.intersection(data_addresses))
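
    The same pair-of-sets idea can be wrapped in a small function and tried on in-memory lists. A minimal sketch; the function name common_addresses and the sample addresses are hypothetical. The & operator is the operator form of set.intersection(), and sorted() is used only because sets have no fixed order.

```python
def common_addresses(input_lines, data_lines):
    """Return the addresses that appear in both iterables of lines."""
    # rstrip() removes the trailing newline (and any other trailing whitespace).
    input_addresses = {line.rstrip() for line in input_lines}
    data_addresses = {line.rstrip() for line in data_lines}
    # `&` is equivalent to input_addresses.intersection(data_addresses).
    return input_addresses & data_addresses


# Hypothetical sample data standing in for the two files.
ifile_lines = ["a@example.com\n", "b@example.com\n", "c@example.com"]
dfile_lines = ["c@example.com\n", "a@example.com\n", "z@example.com\n"]

matches = common_addresses(ifile_lines, dfile_lines)
print(sorted(matches))  # ['a@example.com', 'c@example.com']
```

    Open file objects can be passed in place of the lists, since iterating a file also yields its lines.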
    
  • 2021-01-20 11:55

    mitan8 explains the problem you have, but this is what I would do instead:

    with open(inputfile, "r") as f:
        names = set(i.strip() for i in f)
    
    with open(datafile, "r") as f:
        for name in f:
            if name.strip() in names:
                print(name.strip())
    

    This avoids reading the larger datafile into memory: only the input addresses are kept in a set, and the datafile is read one line at a time.

    If you want to write to an output file, you could do this for the second with statement:

    with open(datafile, "r") as i, open(outputfile, "w") as o:
        for name in i:
            if name.strip() in names:
                o.write(name)
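
    The streaming approach can be sketched as a generator, so that only the lookup set is held in memory while the data lines are consumed one at a time. A minimal sketch, assuming io.StringIO objects with hypothetical contents in place of the real files; the name stream_matches is also hypothetical. Unlike a set intersection, this keeps the datafile's order and any repeated lines.

```python
import io


def stream_matches(lookup_lines, data_lines):
    """Yield each data line whose stripped text is in the lookup set.

    Only the lookup set is held in memory; data_lines is consumed one
    line at a time, so it can be an open file of any size.
    """
    names = {line.strip() for line in lookup_lines}
    for line in data_lines:
        if line.strip() in names:
            yield line


# io.StringIO stands in for the real files (hypothetical contents).
ifile = io.StringIO("a@example.com\nc@example.com\n")
dfile = io.StringIO("a@example.com\nb@example.com\nc@example.com\n")
out = io.StringIO()

out.writelines(stream_matches(ifile, dfile))
print(out.getvalue(), end="")
```

    With real files, the open file objects from the with statement above can be passed in place of the StringIO objects.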
    
  • 2021-01-20 12:06

    I think you can remove name = fd.readline(), since the for loop already gives you the line. Each pass of the for loop reads one line, and readline() then reads another one on top of that, so only every second line is actually checked. Also, I think name[1:-1] should just be name, since you don't want to strip the first and last characters when searching. Finally, with closes the files it opened automatically, so you don't need to close them yourself.
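
    A minimal sketch of the skipped-line effect, with io.StringIO standing in for the data file and hypothetical addresses: the for loop consumes one line per pass and fd.readline() consumes the next one, so the first and third lines are never checked and the last check sees an empty string.

```python
import io

# io.StringIO stands in for a real file; the addresses are hypothetical.
fd = io.StringIO("a@example.com\nb@example.com\nc@example.com\n")

checked = []
for line in fd:           # the loop already reads one line per pass...
    name = fd.readline()  # ...so this reads (and consumes) the next line
    checked.append(name)

print(checked)  # ['b@example.com\n', '']
```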

    PS: How I'd do it:

    with open("dfile1") as dfile, open("ifile") as ifile:
        lines = "\n".join(set(dfile.read().splitlines()) & set(ifile.read().splitlines()))
    print(lines)
    with open("ofile", "w") as ofile:
        ofile.write(lines)
    

    In the above solution, I'm taking the intersection (the elements present in both sets) of the lines of the two files to find the common lines.
