I would like to parse the parameter keywords and values from URIs/URLs in a text file. Parameters without values should also be included. Python is fine, but I am open to suggestions.
I would use a regular expression like this (code first, then the explanation):
import re

# s is the text read from the file
pairs = re.findall(r'(\w+)=(.*?)(?:\n|&)', s, re.S)
for k, v in pairs:
    print('{0} = {1}'.format(k, v))
The re.findall line is where the action happens. The regular expression finds all occurrences of a word followed by an equals sign and then a string terminated by either an & or a newline character. The returned pairs is a list of tuples, where each tuple contains the word (the keyword) and the value. I didn't capture the = sign; instead I print it in the loop.
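To make the result concrete, here is a minimal sketch that runs the same pattern on a made-up sample string (the URLs and the name `sample` are assumptions, not taken from the question). Every line ends with a newline so that the last pair on each line has a terminator to match.

```python
import re

# Hypothetical sample text; every line ends with "\n".
sample = (
    "http://example.com/page?name=alice&age=30\n"
    "http://example.com/other?q=regex&lang=py\n"
)

# Same pattern as in the answer: keyword, "=", lazy value, then "&" or newline.
pairs = re.findall(r'(\w+)=(.*?)(?:\n|&)', sample, re.S)
for k, v in pairs:
    print('{0} = {1}'.format(k, v))
```

Note that a pair at the very end of the text is only found if a newline or & follows it.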
Explaining the regex:
\w+ - one or more word characters. The parentheses around it capture it, so that value is returned as part of the result.
= - the equals sign that must follow the word.
.*? - zero or more characters matched in a non-greedy manner, that is, only up to the first newline or & sign, which is what \n|& designates.
(?:...) - a non-capturing group, which means the \n or & must be present but is not captured.
Since we capture two things in the regex, the keyword and everything after the = sign up to the terminator, a list of 2-tuples is returned.
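The effect of the non-capturing group can be seen by comparing it with an ordinary group. This is a small sketch on a made-up string (the names `text`, `with_terminator` and `without_terminator` are hypothetical): with plain parentheses around the terminator, `re.findall` returns 3-tuples that also contain the & or the newline.

```python
import re

text = "a=1&b=2\n"  # hypothetical input

# Ordinary group: the terminator becomes a third captured item.
with_terminator = re.findall(r'(\w+)=(.*?)(\n|&)', text, re.S)

# Non-capturing group: only the keyword and the value are returned.
without_terminator = re.findall(r'(\w+)=(.*?)(?:\n|&)', text, re.S)

print(with_terminator)     # [('a', '1', '&'), ('b', '2', '\n')]
print(without_terminator)  # [('a', '1'), ('b', '2')]
```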
The re.S flag tells the regex engine to let the match-all code . also match the newline character, that is, to allow a match to span multiple lines (which is not the default behavior). With this particular pattern the lazy .*? already stops at the first newline, so the flag is optional here.
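The pattern above only matches keywords that are followed by an equals sign, so parameters without values are skipped. One possible alternative, sketched here under the assumption that the file holds one URL per line, is to use the standard library's `urllib.parse` instead of a regex. The helper name `query_pairs` and the sample URLs are hypothetical. Unlike the regex, `parse_qsl` also decodes percent-encoded characters.

```python
from urllib.parse import urlsplit, parse_qsl


def query_pairs(text):
    """Return (keyword, value) pairs for every URL in text.

    Assumes one URL per line; parameters without a value get ''.
    """
    pairs = []
    for line in text.splitlines():
        # urlsplit isolates the part after "?" (and before any "#").
        query = urlsplit(line.strip()).query
        # keep_blank_values=True keeps "flag" and "y=" instead of dropping them.
        pairs.extend(parse_qsl(query, keep_blank_values=True))
    return pairs


# Hypothetical sample; note there is no trailing newline after the last URL.
sample = "http://example.com/a?x=1&flag&y=\nhttp://example.com/b?q=two%20words"
for k, v in query_pairs(sample):
    print('{0} = {1}'.format(k, v))
```

Here `flag` and `y` are both reported with an empty string as their value, and the last pair is found even though the text does not end with a newline.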