I have a string which is the return value of a REST API (http://requesttracker.wikia.com/wiki/REST) and uses colon-separated key/value pairs.
id: 123414
nam
You really need to say which REST api and provide a documentation reference.
Superficially, it doesn't look too hard:
# Look Ma, no imports!
>>> s = 'id: 1234\nname: Peter\nmessage: foo bar zot\nmsg2: tee:hee\n'
>>> dict(map(str.strip, line.split(':', 1)) for line in s.splitlines())
{'message': 'foo bar zot', 'msg2': 'tee:hee', 'id': '1234', 'name': 'Peter'}
But: (1) the documentation should point you at a parser; (2) nothing is ever as easy as it seems from one simple example (see tee:hee above). If you decide to roll your own, you should break the above one-liner up into multiple steps so that you can do some error checking (e.g. that line.split(':', 1) returns exactly 2 pieces).
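A minimal sketch of that multi-step version, written in Python 3 syntax (the snippets in this thread are Python 2). The function name parse_pairs and the choice to raise ValueError are illustrative assumptions, not anything prescribed by the API:

```python
def parse_pairs(text):
    """Parse 'key: value' lines into a dict, rejecting malformed lines."""
    result = {}
    for lineno, line in enumerate(text.splitlines(), start=1):
        # Split on the first colon only, so values may contain colons.
        pieces = line.split(':', 1)
        if len(pieces) != 2:
            raise ValueError('line %d has no colon: %r' % (lineno, line))
        key, value = (piece.strip() for piece in pieces)
        if key in result:
            raise ValueError('line %d repeats key %r' % (lineno, key))
        result[key] = value
    return result


s = 'id: 1234\nname: Peter\nmessage: foo bar zot\nmsg2: tee:hee\n'
parsed = parse_pairs(s)
```

Whether a repeated key should be an error is a judgment call; drop that check if the real format allows repeats.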
Update after api reference was given:
At first glance, the website gives an enormous number of examples without actually stating what the format is. I suggest that you give it more than a glance; if that fails, ask the author/maintainer.
Update 2 after actual example input given, and after comment "I just tried this and got crashed":
The code supplied was in response to the first (ambiguous) example input, in which all lines except the last contained a colon. It was accompanied by a suggestion that it should be done in pieces instead of as a one-liner, with special mention of checking the result of split(':', 1). What code did you use? What exactly does "got crashed" mean? Have you tried to work out for yourself what your problem was, and fix it?
What data did you feed it? Your long-awaited actual sample has colon-separated key:value lines preceded by a heading line and an empty line and followed by an empty line. These can be blissfully ignored by a trivial adjustment to the one-liner:
>>> print dict(map(str.strip, line.split(':', 1)) for line in s.splitlines()[2:-1])
{'Status': 'new', 'Resolved': 'Not set', 'CF.{Severity}': '',
'TimeLeft': '0', 'Creator': 'young.park', 'Cc': '', 'Starts': 'Not set',
'Created': 'Mon Apr 25 15:50:27 2011', 'Due': 'Not set',
'LastUpdated': 'Mon Apr 25 15:50:28 2011', 'Started': 'Not set',
'Priority': '0', 'Requestors': 'superuser@meme.com',
'AdminCc': '', 'Owner': 'Nobody', 'Told': 'Not set',
'TimeEstimated': '0', 'InitialPriority': '0', 'FinalPriority': '0',
'TimeWorked': '0', 'Subject': 'testing'}
>>>
Note 1: above output edited manually to avoid horizontal scrolling.
Note 2: Includes the Created and LastUpdated entries (-:whose values contain colons:-)
If you don't believe in blissfully ignoring things, you can do the splitlines first, and assert that the first line contains something like the expected heading, and that the second and last lines are empty.
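A rough sketch of that stricter variant, in Python 3 syntax. The function name parse_rt_response is hypothetical, and the assumption that the heading starts with 'RT/' is taken only from the single sample in this thread, not from the API documentation:

```python
def parse_rt_response(text):
    """Check the heading and the blank lines, then parse the key/value lines."""
    lines = text.splitlines()
    if len(lines) < 3:
        raise ValueError('response too short: %r' % text)
    # Assumption: the heading looks like 'RT/3.8.8 200 Ok'.
    if not lines[0].startswith('RT/'):
        raise ValueError('unexpected heading: %r' % lines[0])
    if lines[1] != '' or lines[-1] != '':
        raise ValueError('expected empty second and last lines')
    result = {}
    for line in lines[2:-1]:
        pieces = line.split(':', 1)
        if len(pieces) != 2:
            raise ValueError('no colon in line: %r' % line)
        result[pieces[0].strip()] = pieces[1].strip()
    return result


sample = ('RT/3.8.8 200 Ok\n'
          '\n'
          'id: ticket/46863\n'
          'Queue: customer-test\n'
          'Cc:\n'
          'Created: Mon Apr 25 15:50:27 2011\n'
          'CF.{Severity}: \n'
          '\n')
ticket = parse_rt_response(sample)
```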
The examples look like customized HTTP messages (but they are not; that would be too simple). You could use rfc822.Message to parse them:
import rfc822
from cStringIO import StringIO
# skip status line; read headers
m = rfc822.Message(StringIO(raw_text[raw_text.index('\n\n')+2:]))
Now you have access to individual headers:
>>> m.getheader('queue')
'customer-test'
>>> m.getrawheader('queue')
' customer-test\n'
>>> m.getheader('created')
'Mon Apr 25 15:50:27 2011'
>>> m.getdate('created')
(2011, 4, 25, 15, 50, 27, 0, 1, 0)
All headers:
>>> from pprint import pprint
>>> pprint(dict(m.items()))
{'admincc': '',
'cc': '',
'cf.{severity}': '',
'created': 'Mon Apr 25 15:50:27 2011',
'creator': 'young.park',
'due': 'Not set',
'finalpriority': '0',
'id': 'ticket/46863',
'initialpriority': '0',
'lastupdated': 'Mon Apr 25 15:50:28 2011',
'owner': 'Nobody',
'priority': '0',
'queue': 'customer-test',
'requestors': 'superuser@meme.com',
'resolved': 'Not set',
'started': 'Not set',
'starts': 'Not set',
'status': 'new',
'subject': 'testing',
'timeestimated': '0',
'timeleft': '0',
'timeworked': '0',
'told': 'Not set'}
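The rfc822 and cStringIO modules exist only in Python 2. On Python 3, a rough equivalent can be sketched with the standard email package; the shortened raw_text below is an assumption standing in for the full response:

```python
from email import message_from_string

# Shortened stand-in for the real response text.
raw_text = ('RT/3.8.8 200 Ok\n'
            '\n'
            'id: ticket/46863\n'
            'Queue: customer-test\n'
            'Cc:\n'
            'Created: Mon Apr 25 15:50:27 2011\n'
            'CF.{Severity}: \n'
            '\n')

# Skip the status line and the blank line after it; the rest parses as headers.
_, _, header_block = raw_text.partition('\n\n')
m = message_from_string(header_block)

queue = m['queue']          # header lookup is case-insensitive
headers = dict(m.items())   # keys keep the case used in the input
```

Unlike rfc822.Message, the email package does not lowercase the keys returned by items().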
Given how little the question says, we have to guess what the crucial problem is. I can't believe you have never heard of string methods, so I assume you don't know how to use them in this case.
There is certainly a way to get what you want with string methods, and I have an idea how, but I prefer to go straight to regexes, on the assumption that the difficulty is capturing a value after a colon when that value contains newlines:
import re
regx = re.compile ('(^[^:]+):((?:[^:]+\r?\n)*[^:]+)$',re.MULTILINE)
coloned = '''id: 123414
name: Peter
message: bla bla
bla bla
the end: of the text'''
print regx.findall(coloned)
gives
[('id', ' 123414'), ('name', ' Peter'), ('message', ' bla bla\nbla bla'), ('the end', ' of the text')]
So there was no difficulty in this "problem".
import re
regx = re.compile ('^([^:\n]+): *(.*?) *$',re.MULTILINE)
ch = ('RT/3.8.8 200 Ok\n' '\n'
'id: ticket/46863\n' 'Queue: customer-test\n'
'Owner: Nobo:dy\n' 'Creator: young.park\n'
'Subject: testing\n' 'Status: new\n'
'Priority: 0\n' 'InitialPriority: 0\n'
'FinalPriority: 0\n' 'Requestors: superuser@meme.com\n'
'Cc:\nAdminCc:\n' 'Created: Mon Apr 25 15:50:27 2011\n'
'Starts: Not set\n' 'Started: Not set\n'
'Due: Not set\n' 'Resolved: Not set\n'
'Told: Not set\n' 'LastUpdated: Mon Apr 25 15:50:28 2011\n'
'TimeEstimated: 0\n' 'TimeWorked: 0\n'
'TimeLeft: 0\n' 'CF.{Severity}: \n' '\n')
print dict(regx.findall(ch))
print
s = 'id: 1234\nname: Peter\nmessage: foo bar zot\nmsg2: tee:hee\n'
print dict(regx.findall(s))
result
{'Due': 'Not set', 'Priority': '0', 'id': 'ticket/46863', 'Told': 'Not set', 'Status': 'new', 'Started': 'Not set', 'Requestors': 'superuser@meme.com', 'FinalPriority': '0', 'Resolved': 'Not set', 'Created': 'Mon Apr 25 15:50:27 2011', 'AdminCc': '', 'Starts': 'Not set', 'Queue': 'customer-test', 'TimeWorked': '0', 'TimeLeft': '0', 'Creator': 'young.park', 'Cc': '', 'LastUpdated': 'Mon Apr 25 15:50:28 2011', 'CF.{Severity}': '', 'Owner': 'Nobo:dy', 'TimeEstimated': '0', 'InitialPriority': '0', 'Subject': 'testing'}
{'message': 'foo bar zot', 'msg2': 'tee:hee', 'id': '1234', 'name': 'Peter'}
John Machin, I didn't muck about with this new regex: it took me one minute to rewrite, and it wouldn't have taken much longer the first time if we hadn't had to beg for the basic information needed to answer.
Three remarks:
If the input ever changes and an extra empty line appears anywhere among the others, your solution will crash, while my regex solution will keep working. Your solution needs to be completed with if ':' in line.
I compared the execution times:
my regex solution 0.000152533352703 seconds, yours 0.000225727012791 seconds (+48 %).
With if ':' in line added, yours is slightly slower: 0.000246958761519 seconds (+62 %).
Speed isn't important here, but in other applications it is good to know that regexes are very fast (100 times faster than lxml, and 1000 times faster than BeautifulSoup).
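For reference, the if ':' in line guard from the first remark might look like this when added to the split-based one-liner (Python 3 syntax; the name parse_lenient and the shortened sample are illustrative assumptions):

```python
def parse_lenient(text):
    """Split-based parsing that silently skips lines without a colon."""
    return dict(
        map(str.strip, line.split(':', 1))
        for line in text.splitlines()
        if ':' in line
    )


# Shortened sample with an extra empty line in the middle.
sample = ('RT/3.8.8 200 Ok\n'
          '\n'
          'id: ticket/46863\n'
          '\n'
          'Owner: Nobo:dy\n'
          'Cc:\n'
          '\n')
ticket = parse_lenient(sample)
```

The heading line is dropped here only because it happens to contain no colon; a heading with a colon in it would be picked up as a key/value pair.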
That looks like YAML. Have you tried PyYAML?
>>> import yaml
>>> s = """id: 123414
... name: Peter
... message: bla bla
... bla bla"""
>>> yaml.safe_load(s)
{'message': 'bla bla bla bla', 'id': 123414, 'name': 'Peter'}