Using Python to parse a colon (:) delimited string into an object

臣服心动 2021-01-20 16:29

I have a string that is the return value of a REST API (http://requesttracker.wikia.com/wiki/REST); it uses colon-separated key/value pairs.

id: 123414
nam         


        
4 answers
  •  借酒劲吻你
    2021-01-20 17:03

    Given how little the question tells us, we have to guess what the real problem is. I can't believe you have never heard of the string methods, so I assume you simply don't see how to apply them in this case.

    There is certainly a way to get what you want with string methods, and I have an idea for that, but I prefer to go straight to regexes, on the assumption that the difficulty is capturing a value after the colon that contains newlines.

    import re
    
    regx = re.compile ('(^[^:]+):((?:[^:]+\r?\n)*[^:]+)$',re.MULTILINE)
    
    coloned = '''id: 123414
    name: Peter
    message: bla bla
    bla bla
    the end: of the text'''
    
    print(regx.findall(coloned))
    

    gives

    [('id', ' 123414'), ('name', ' Peter'), ('message', ' bla bla\nbla bla'), ('the end', ' of the text')]
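
For comparison, the string-method route mentioned above can be sketched as follows. This is only an illustration, not the answerer's code: the function name `parse_colon_pairs` is hypothetical, and it assumes that any line without a colon continues the previous value. Unlike the regex, it strips the whitespace around each value.

```python
def parse_colon_pairs(text):
    """Parse 'key: value' lines into (key, value) tuples.

    Assumption: a line without a colon is a continuation of the
    previous value; such lines before the first key are ignored.
    """
    pairs = []
    for line in text.splitlines():
        # partition splits on the FIRST colon only
        key, sep, value = line.partition(':')
        if sep:
            pairs.append([key, value.strip()])
        elif pairs:
            # no colon: append the line to the previous value
            pairs[-1][1] += '\n' + line
    return [tuple(p) for p in pairs]


coloned = '''id: 123414
name: Peter
message: bla bla
bla bla
the end: of the text'''

print(parse_colon_pairs(coloned))
```

Because `partition` splits on the first colon only, a value that itself contains a colon stays intact.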
    

    .

    EDIT

    So there was no difficulty in this "problem"

    import re
    
    regx = re.compile ('^([^:\n]+): *(.*?) *$',re.MULTILINE)
    
    ch = ('RT/3.8.8 200 Ok\n'                                    '\n'
          'id: ticket/46863\n'      'Queue: customer-test\n'
          'Owner: Nobo:dy\n'        'Creator: young.park\n'
          'Subject: testing\n'      'Status: new\n'
          'Priority: 0\n'           'InitialPriority: 0\n'
          'FinalPriority: 0\n'      'Requestors: superuser@meme.com\n'
          'Cc:\nAdminCc:\n'         'Created: Mon Apr 25 15:50:27 2011\n'
          'Starts: Not set\n'       'Started: Not set\n'
          'Due: Not set\n'          'Resolved: Not set\n'
          'Told: Not set\n'         'LastUpdated: Mon Apr 25 15:50:28 2011\n'
          'TimeEstimated: 0\n'      'TimeWorked: 0\n'
          'TimeLeft: 0\n'           'CF.{Severity}: \n'           '\n')
    
    # Python 3 print calls; the key order of the printed dicts depends on the Python version
    print(dict(regx.findall(ch)))
    print()
    
    s = 'id: 1234\nname: Peter\nmessage: foo bar zot\nmsg2: tee:hee\n'
    print(dict(regx.findall(s)))
    

    result

    {'Due': 'Not set', 'Priority': '0', 'id': 'ticket/46863', 'Told': 'Not set', 'Status': 'new', 'Started': 'Not set', 'Requestors': 'superuser@meme.com', 'FinalPriority': '0', 'Resolved': 'Not set', 'Created': 'Mon Apr 25 15:50:27 2011', 'AdminCc': '', 'Starts': 'Not set', 'Queue': 'customer-test', 'TimeWorked': '0', 'TimeLeft': '0', 'Creator': 'young.park', 'Cc': '', 'LastUpdated': 'Mon Apr 25 15:50:28 2011', 'CF.{Severity}': '', 'Owner': 'Nobo:dy', 'TimeEstimated': '0', 'InitialPriority': '0', 'Subject': 'testing'}
    
    {'message': 'foo bar zot', 'msg2': 'tee:hee', 'id': '1234', 'name': 'Peter'}
    

    .

    John Machin, I didn't muck about with this new regex: it took me one minute to write, and it would not have taken much longer the first time if we had not had to beg for the basic information needed to answer.

    Three remarks:

    • If the input ever changes and an extra empty line appears anywhere among the others, your solution will crash, while my regex solution will keep working. Your solution needs an added if ':' in line check.

    • I compared the execution times:

      my regex solution: 0.000152533352703 seconds; yours: 0.000225727012791 seconds (+48%)

    With if ':' in line added, yours is slightly slower: 0.000246958761519 seconds (+62%)

    Speed isn't important here, but for other applications it is good to know that regexes are very fast (100 times faster than lxml, and 1000 times faster than BeautifulSoup)

    • You are a specialist in the CSV format. A solution using StringIO and the csv module's functions would also be possible.
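
The two alternatives raised in these remarks can be sketched as follows. The split-based answer being discussed is not reproduced in this thread, so `split_parse` is only an assumed reconstruction of that style with the `if ':' in line` guard added. `csv_parse` is likewise a hypothetical illustration of the StringIO/csv idea. Both function names and the shortened sample are made up for this example.

```python
import csv
import io


def split_parse(text):
    """Line-splitting approach with the "if ':' in line" guard."""
    result = {}
    for line in text.splitlines():
        if ':' in line:  # skips empty lines and the status line
            key, value = line.split(':', 1)  # split on the first colon only
            result[key] = value.strip()
    return result


def csv_parse(text):
    """csv-module approach: ':' as delimiter, quoting disabled."""
    result = {}
    reader = csv.reader(io.StringIO(text), delimiter=':',
                        quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) < 2:  # empty line or line without a colon
            continue
        # re-join the fields so colons inside the value survive
        result[row[0]] = ':'.join(row[1:]).strip()
    return result


# Shortened sample in the shape of the RT response shown above
sample = ('RT/3.8.8 200 Ok\n'
          '\n'
          'id: ticket/46863\n'
          'Owner: Nobo:dy\n'
          'Cc:\n'
          'Created: Mon Apr 25 15:50:27 2011\n'
          '\n'
          'CF.{Severity}: \n')

print(split_parse(sample))
print(csv_parse(sample))
```

Both functions keep colons that occur inside a value, such as the timestamp in Created. Either one can be timed against the regex with timeit.timeit to repeat the comparison above on your own machine.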
