How do I search for a pattern within a text file using Python combining regex & string/file operations and store instances of the pattern?

无人共我 2020-12-23 13:34

So essentially I'm looking specifically for a 4-digit code enclosed in angle brackets within a text file. I know that I need to open the text file and then parse it line by line...

2 Answers
  • 2020-12-23 14:12
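    import re

    # Raw string so that \d reaches the regex engine unchanged
    pattern = re.compile(r"<(\d{4,5})>")

    with open('test.txt') as f:
        for i, line in enumerate(f):
            for match in pattern.finditer(line):
                # group(1) is the captured number without the angle brackets
                print('Found on line %s: %s' % (i + 1, match.group(1)))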
    

    A couple of notes about the regex:

    • You don't need the ? at the end or the outer (...) if you only want the number itself rather than the number together with its angle brackets
    • It matches either 4 or 5 digits between the angle brackets
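
    Both notes can be checked with a short sketch. The sample string and variable names below are made up for illustration; the optional pattern is the `(<(\d{4,5})>)?` form that appears elsewhere on this page:

```python
import re

sample = "a <1234> b <56789> c <12> d"  # made-up text for illustration

# Exactly four digits, which is what the question asks for
four = re.findall(r"<(\d{4})>", sample)

# Four or five digits, as in the snippet above
four_or_five = re.findall(r"<(\d{4,5})>", sample)

# Outer group plus trailing ?: the whole pattern becomes optional, so it
# also "matches" the empty string at positions where no code is present
optional = re.findall(r"(<(\d{4,5})>)?", sample)
non_empty = [m for m in optional if m[0]]

print(four)
print(four_or_five)
print(len(optional), non_empty)
```

    The optional form returns many empty tuples alongside the real hits, which is why dropping the trailing ? is usually the simpler choice.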

    Update: It's important to understand that the match and capture in a regex can be quite different. The regex in my snippet above matches the pattern with angle brackets, but I ask to capture only the internal number, without the angle brackets.
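
    To make the match/capture distinction concrete, here is a minimal sketch on a made-up sample line (the string and variable names are assumptions for illustration only):

```python
import re

# Made-up sample line for illustration
m = re.search(r"<(\d{4,5})>", "code <1234> here")

whole = m.group(0)   # the full match, angle brackets included
number = m.group(1)  # only the captured digits inside the brackets

print(whole)
print(number)
```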

    More about regex in Python can be found in the Regular Expression HOWTO.

  • 2020-12-23 14:18

    Doing it in one bulk read:

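    import re

    # filename is assumed to hold the path of the text file
    with open(filename, 'r') as textfile:
        filetext = textfile.read()
    # No trailing ?: an optional pattern would also match the empty string
    matches = re.findall(r"<(\d{4,5})>", filetext)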
    

    Line by line:

    import re

    matches = []
    reg = re.compile(r"<(\d{4,5})>")
    with open(filename, 'r') as textfile:
        for line in textfile:
            # findall returns the captured numbers found on this line
            matches += reg.findall(line)
    

    But again, the matches this returns are not useful for much beyond counting unless you add an offset counter:

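    import re

    matches = []
    offset = 0  # character offset of the current line's start within the file
    reg = re.compile(r"<(\d{4,5})>")
    with open(filename, 'r') as textfile:
        for line in textfile:
            # Store this line's matches together with the line's starting offset
            matches.append((reg.findall(line), offset))
            offset += len(line)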
    

    But it still just makes more sense to read the whole file in at once.
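
    As a rough sketch of that whole-file approach, `re.finditer` yields match objects whose `start()` gives the offset directly, so no manual counter is needed. The helper name `find_codes`, the temporary demo file, and its contents are all hypothetical, chosen only to make the example self-contained:

```python
import os
import re
import tempfile

def find_codes(path):
    """Return (code, offset, line_number) for each <dddd> or <ddddd> in the file."""
    with open(path, 'r') as f:
        text = f.read()
    results = []
    for m in re.finditer(r"<(\d{4,5})>", text):
        # Line number = number of newlines before the match, plus one
        line_no = text.count("\n", 0, m.start()) + 1
        results.append((m.group(1), m.start(), line_no))
    return results

# Demo on a throwaway file with made-up contents
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("first <1234>\nnothing here\n<56789> and <0042>\n")
    demo_path = tmp.name

found = find_codes(demo_path)
os.remove(demo_path)
print(found)
# → [('1234', 6, 1), ('56789', 26, 3), ('0042', 38, 3)]
```

    Offsets here are character positions in the decoded text, and each stored tuple keeps the code together with where it was found.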
