Python regex for repeating string

南方客 2021-01-07 04:31

I want to verify and then parse this string:

string = "start: c12354, c3456, 34526; other stuff that I don't care about"
//Note that som         


        
4 answers
  • 2021-01-07 04:59

    This can be done (pretty elegantly) with a tool like Pyparsing:

    from pyparsing import Group, Literal, OneOrMore, Optional, ParseException, Word
    import string
    
    code = Group(Optional(Literal("c"), default='') + Word(string.digits) + Optional(Literal(","), default=''))
    parser = Literal("start:") + OneOrMore(code) + Literal(";")
    # Read lines from file:
    with open('lines.txt', 'r') as f:
        for line in f:
            try:
                result = parser.parseString(line)
                codes = [c[1] for c in result[1:-1]]
                # Do something with teh codez...
            except ParseException:
                # Oh noes: string doesn't match!
                continue
    

    Cleaner than a regular expression, returns a list of codes (no need to string.split), and ignores any extra characters in the line, just like your example.

  • 2021-01-07 05:00

    In Python, this isn’t possible with a single regular expression: each capture of a group overrides the last capture of that same group (in .NET, this would actually be possible since the engine distinguishes between captures and groups).

    The easiest solution is to first extract the part between start: and ;, then use a regular expression that returns all matches rather than a single match, via re.findall('c?[0-9]+', text).
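
    To illustrate both points, here is a minimal sketch. The sample string is the one from the question; the variable names and the first pattern are assumptions made for the demonstration. It shows that a repeated capture group keeps only its final iteration, and that the two-step approach recovers every code.

```python
import re

text = "start: c12354, c3456, 34526; other stuff that I don't care about"

# A capture group inside a repetition is overwritten on each pass,
# so only the final code survives in the match object.
repeated = re.match(r"start:\s*(?:(c?\d+),?\s*)+;", text)
last_only = repeated.groups()

# Two-step approach: isolate the text between "start:" and ";",
# then let findall collect every code in that span.
body = re.match(r"start:([^;]*);", text).group(1)
codes = re.findall(r"c?[0-9]+", body)

print(last_only)  # ('34526',)
print(codes)      # ['c12354', 'c3456', '34526']
```

    Note that this findall pattern keeps the optional "c" prefix as part of each match.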

  • 2021-01-07 05:08
    import re
    
    sstr = re.compile(r'start:([^;]*);')
    slst = re.compile(r'(?:c?)(\d+)')
    
    mystr = "start: c12354, c3456, 34526; other stuff that I don't care about"
    match = re.match(sstr, mystr)
    if match:
        # group(1) is only the text between "start:" and ";"
        res = re.findall(slst, match.group(1))
    

    results in

    ['12354', '3456', '34526']
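
    The first pattern accepts any text between start: and ;, so it extracts but does not really verify. If verification matters, a stricter pattern is one option. This is only a sketch: the grammar (optional "c", digits, comma separators) is inferred from the single example in the question, and parse_codes is a hypothetical helper name.

```python
import re

# Assumed grammar: "start:", one or more codes (optional "c" plus digits)
# separated by commas, then ";". Anything after the ";" is ignored.
STRICT = re.compile(r"start:\s*(c?\d+(?:\s*,\s*c?\d+)*)\s*;")

def parse_codes(line):
    """Return the digit strings from a valid line, or None if it does not match."""
    m = STRICT.match(line)
    if m is None:
        return None
    # The list has already been validated, so pulling out digit runs is safe.
    return re.findall(r"\d+", m.group(1))

print(parse_codes("start: c12354, c3456, 34526; other stuff"))
print(parse_codes("start: c12, oops;"))
```

    The first call prints ['12354', '3456', '34526'] and the second prints None, because "oops" is not a valid code.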
    
  • 2021-01-07 05:23

    You could use the standard string tools, which are pretty much always more readable.

    s = "start: c12354, c3456, 34526;"

    s.startswith("start:") # True if s starts with this string

    s.endswith(";") # True if s ends with this string

    s[6:-1].strip().split(', ') # list of tokens separated by ", "; strip() drops the space left after "start:"
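
    The string in the question has trailing text after the ;, so endswith(";") would not hold for it. A possible way to combine the steps above and handle that case is sketched below; parse_line and its parameters are hypothetical names, not part of the original answer.

```python
def parse_line(line, prefix="start:", terminator=";"):
    """Return the tokens between prefix and the first terminator, or None."""
    if not line.startswith(prefix):
        return None
    # partition splits at the first terminator; sep is "" when none is found.
    body, sep, _rest = line[len(prefix):].partition(terminator)
    if not sep:
        return None
    # Splitting on "," and stripping tolerates uneven spacing around tokens.
    return [token.strip() for token in body.split(",")]

print(parse_line("start: c12354, c3456, 34526; other stuff that I don't care about"))
```

    This prints ['c12354', 'c3456', '34526']. It does not check that each token looks like a code; that check would need to be added separately.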
