I have a string that is the return value of a REST API (http://requesttracker.wikia.com/wiki/REST) and uses colon-separated key/value pairs.
id: 123414
nam
Your question is unclear, so we are left to guess what the real problem is. I can't believe you have never heard of string methods, so I assume you simply don't see how to use them in this case.
There is certainly a way to get what you want with string methods, and I have an idea how, but I prefer to go straight to regex, on the assumption that the difficulty is capturing a value after the colon when that value contains newlines:
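import re
# Key: colon-free text at the start of a line.
# Value: colon-free text after the colon, possibly spanning several lines.
regx = re.compile(r'(^[^:]+):((?:[^:]+\r?\n)*[^:]+)$', re.MULTILINE)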
coloned = '''id: 123414
name: Peter
message: bla bla
bla bla
the end: of the text'''
print(regx.findall(coloned))
gives
[('id', ' 123414'), ('name', ' Peter'), ('message', ' bla bla\nbla bla'), ('the end', ' of the text')]
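For comparison, here is a minimal sketch of the string-method route mentioned above. It is written for Python 3, the name parse_pairs is made up for illustration, and it assumes that any line without a colon continues the previous value. Unlike the regex, it strips the whitespace around keys and values.

```python
def parse_pairs(text):
    """Parse 'key: value' lines; colon-free lines continue the previous value."""
    pairs = []
    for line in text.splitlines():
        key, sep, value = line.partition(':')
        if sep:
            # A colon was found: start a new pair.
            pairs.append([key.strip(), value.strip()])
        elif pairs:
            # No colon: treat the line as a continuation of the last value.
            pairs[-1][1] += '\n' + line
    return [tuple(p) for p in pairs]


coloned = '''id: 123414
name: Peter
message: bla bla
bla bla
the end: of the text'''

print(parse_pairs(coloned))
```

Like the regex, this sketch cannot tell a continuation line that happens to contain a colon from a new key.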
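So there was no real difficulty in this "problem". With the actual response, where each pair sits on a single line, a simpler regex does the job: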
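import re
# Key: colon-free text at the start of a line.
# Value: the rest of the line without surrounding spaces (it may contain colons).
regx = re.compile(r'^([^:\n]+): *(.*?) *$', re.MULTILINE)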
ch = ('RT/3.8.8 200 Ok\n' '\n'
'id: ticket/46863\n' 'Queue: customer-test\n'
'Owner: Nobo:dy\n' 'Creator: young.park\n'
'Subject: testing\n' 'Status: new\n'
'Priority: 0\n' 'InitialPriority: 0\n'
'FinalPriority: 0\n' 'Requestors: superuser@meme.com\n'
'Cc:\nAdminCc:\n' 'Created: Mon Apr 25 15:50:27 2011\n'
'Starts: Not set\n' 'Started: Not set\n'
'Due: Not set\n' 'Resolved: Not set\n'
'Told: Not set\n' 'LastUpdated: Mon Apr 25 15:50:28 2011\n'
'TimeEstimated: 0\n' 'TimeWorked: 0\n'
'TimeLeft: 0\n' 'CF.{Severity}: \n' '\n')
# Key order in the printed dicts depends on the Python version.
print(dict(regx.findall(ch)))
print('')
s = 'id: 1234\nname: Peter\nmessage: foo bar zot\nmsg2: tee:hee\n'
print(dict(regx.findall(s)))
result
{'Due': 'Not set', 'Priority': '0', 'id': 'ticket/46863', 'Told': 'Not set', 'Status': 'new', 'Started': 'Not set', 'Requestors': 'superuser@meme.com', 'FinalPriority': '0', 'Resolved': 'Not set', 'Created': 'Mon Apr 25 15:50:27 2011', 'AdminCc': '', 'Starts': 'Not set', 'Queue': 'customer-test', 'TimeWorked': '0', 'TimeLeft': '0', 'Creator': 'young.park', 'Cc': '', 'LastUpdated': 'Mon Apr 25 15:50:28 2011', 'CF.{Severity}': '', 'Owner': 'Nobo:dy', 'TimeEstimated': '0', 'InitialPriority': '0', 'Subject': 'testing'}
{'message': 'foo bar zot', 'msg2': 'tee:hee', 'id': '1234', 'name': 'Peter'}
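As a possible way to package this, the sketch below wraps the same pattern in a function that splits off the status line before parsing the fields. It targets Python 3. The names FIELD_RE and parse_rt_response are hypothetical, and the code assumes the first line of the response is always the status line, as in the sample above.

```python
import re

# Same pattern as above: key up to the first colon, value trimmed of spaces.
FIELD_RE = re.compile(r'^([^:\n]+): *(.*?) *$', re.MULTILINE)


def parse_rt_response(text):
    """Return (status_line, fields) for a response shaped like the sample."""
    # Assumption: everything before the first newline is the status line.
    status_line, _, body = text.partition('\n')
    return status_line.strip(), dict(FIELD_RE.findall(body))


sample = ('RT/3.8.8 200 Ok\n'
          '\n'
          'id: ticket/46863\n'
          'Owner: Nobo:dy\n'
          'Cc:\n'
          'CF.{Severity}: \n'
          '\n')

status, fields = parse_rt_response(sample)
print(status)
print(fields)
```

Splitting off the first line means a status line that happened to contain a colon would not be mistaken for a field.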
John Machin, I didn't muck about with this new regex: it took me one minute to rewrite, and it wouldn't have taken much longer the first time if we hadn't had to beg for the basic information needed to answer.
Three remarks:
If the input ever changes and an extra empty line appears anywhere among the others, your solution will crash, while my regex solution will keep working. Your solution needs to be completed with if ':' in line.
I compared the execution times: my regex solution takes 0.000152533352703 seconds, yours 0.000225727012791 seconds (+48%). With if ':' in line added, yours is slightly slower still: 0.000246958761519 seconds (+62%).
Speed isn't important here, but in other applications it is good to know that regexes are very fast (100 times faster than lxml, and 1000 times faster than BeautifulSoup).
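The first two remarks can be tried out with a sketch like the one below (Python 3). The line-based solution being discussed is not reproduced in this thread, so parse_lines is only a hypothetical stand-in for it, with the if ':' in line guard made optional. The timings it prints will vary by machine and Python version and are not meant to reproduce the figures above.

```python
import re
import timeit

regx = re.compile(r'^([^:\n]+): *(.*?) *$', re.MULTILINE)


def parse_lines(text, guard=True):
    """Hypothetical line-based parser: split each line on its first colon."""
    pairs = []
    for line in text.splitlines():
        if guard and ':' not in line:
            continue  # skip blank or colon-free lines
        key, value = line.split(':', 1)  # raises ValueError without a colon
        pairs.append((key.strip(), value.strip()))
    return dict(pairs)


s = 'id: 1234\n\nname: Peter\nmsg2: tee:hee\n'  # note the empty second line

# With the guard, both approaches produce the same dict.
print(parse_lines(s) == dict(regx.findall(s)))

# Without the guard, the empty line breaks the line-based version.
try:
    parse_lines(s, guard=False)
except ValueError as exc:
    print('unguarded version fails:', exc)

# Rough per-call timing in seconds.
n = 10000
print('regex:', timeit.timeit(lambda: dict(regx.findall(s)), number=n) / n)
print('lines:', timeit.timeit(lambda: parse_lines(s), number=n) / n)
```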