Question
I'm trying to parse messages that I receive from a GSM modem in Python.
I have a lot of messages that I need to parse. I receive new messages every couple of hours or so.
Here's an example of the data that I receive after reading from the modem, using a serial object, into a list x.
AT+CMGL="ALL"
+CMGL: 1,"REC READ","+918884100421","","13/04/05,08:24:36+22"
here's message one
+CMGL: 2,"REC READ","+918884100421","","13/04/05,09:40:38+22"
here's message two
+CMGL: 3,"REC READ","+918884100421","","13/04/05,09:41:04+22"
here's message three
+CMGL: 4,"REC READ","+918884100421","","13/04/05,10:04:18+22"
here's message four
+CMGL: 5,"REC READ","+918884100421","","13/04/05,10:04:32+22"
here's message five
.
.
.
.
.
There are a lot more messages; I've just listed five here.
My main intention is to extract the content of the message, for example "here's message one" and so on for every message that I receive.
Here's the code that I'm using right now.
def reading():
    print "Reading all the messages stored on SIM card"
    phone.write(b'AT+CMGL="ALL"\r')
    sleeps()
    x=phone.read(10000)
    sleeps()
    print x
    print "Now parsing the message!"
    k="".join(x)
    parse(k)
    k=""
def parse(k):
    m = re.search("\+CMGL: (\d+),""(.+)"",""(.+)"",(.*),""(.+)""\r\n(.+)\r\n",k)
    print "6="
    print m.group(6)
phone is the serial object that I'm using to read from the GSM modem.
Here m.group(6) captures the content of the first message, "here's message one".
How can I get it to match the content of all the messages, not just the first one?
I tried setting the multiline flag but that didn't work. Neither did using re.findall() instead of re.search().
Also the match object returned by re.search isn't iterable.
Please help.
Answer 1:
Since I don't have your data, I made a sample.
'\xef\xbb\xbfAT+CMGL="ALL"\n\n+CMGL: 1,"REC READ","+918884100421","","13/04/05,08:24:36+22"\nhere\'s message one \n\n+CMGL: 2,"REC READ","+918884100421","","13/04/05,09:40:38+22"\nhere\'s message two\n\n+CMGL: 3,"REC READ","+918884100421","","13/04/05,09:41:04+22"\nhere\'s message three\n\n+CMGL: 4,"REC READ","+918884100421","","13/04/05,10:04:18+22"\nhere\'s message four\n\n+CMGL: 5,"REC READ","+918884100421","","13/04/05,10:04:32+22"\nhere\'s message five\n'
This comes from your question, using ''.join(). I then used your regex pattern, only replacing the \r\n with \n because my sample uses \n, and I get the result. I don't know why findall doesn't work for you.
def parse(x):
    res = []
    match = re.finditer("\+CMGL: (\d+),""(.+)"",""(.+)"",(.*),""(.+)""\n(.+)\n", x)
    for each in match:
        res.append(each.group(6))
    return res
The result I get is ["here's message one ", "here's message two", "here's message three", "here's message four", "here's message five"]. finditer returns an iterator, and findall also works OK.
def parse(x):
    res = []
    match = re.findall("\+CMGL: (\d+),""(.+)"",""(.+)"",(.*),""(.+)""\n(.+)\n", x)
    for each in match:
        res.append(each[5])
    return res
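One detail about the pattern from the question is worth noting: inside an ordinary double-quoted Python string, "" ends one literal and starts the next, and adjacent literals are concatenated. The compiled pattern therefore contains no quote characters at all and only works because the greedy (.+) groups happen to line up. The following is a minimal sketch of a variant that really matches the quotes, written as a raw single-quoted string and accepting both \r\n and \n line endings. The names CMGL_RE and parse_messages and the shortened sample are illustrative, not part of the original code.

```python
import re

# Raw, single-quoted pattern so the double quotes are part of the regex.
# \r?\n accepts modem-style CRLF as well as plain LF line endings.
CMGL_RE = re.compile(
    r'\+CMGL: (\d+),"([^"]*)","([^"]*)","([^"]*)","([^"]*)"\r?\n([^\r\n]*)'
)

def parse_messages(raw):
    """Return the body text of every +CMGL entry found in raw."""
    return [m.group(6) for m in CMGL_RE.finditer(raw)]

# Shortened sample in the format shown in the question (assumed CRLF endings).
sample = (
    'AT+CMGL="ALL"\r\n'
    '+CMGL: 1,"REC READ","+918884100421","","13/04/05,08:24:36+22"\r\n'
    "here's message one\r\n"
    '+CMGL: 2,"REC READ","+918884100421","","13/04/05,09:40:38+22"\r\n'
    "here's message two\r\n"
)
messages = parse_messages(sample)
```

This variant assumes every field is quoted exactly as in the question's data and that each message body fits on one line.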
Answer 2:
Using a regexp for this is not a very robust solution because it will not handle variations in the behaviour of different phones. In your example the format of the response is
+CMGL: 1,"REC READ","+918884100421","","13/04/05,08:24:36+22"
but other phones will give responses like
+CMGL: 1,"REC READ","+31612123738",,"08/12/22,11:37:52+04"
Notice the difference in the fourth parameter: "" versus nothing.
According to 3GPP TS 27.005, the syntax for the response in text mode is
+CMGL: <index>,<stat>,<oa/da>,[<alpha>],[<scts>][,<tooa/toda>,<length>]<CR><LF>
<data><CR><LF>
and <alpha> is indeed optional. Yes, it is probably possible to write a regexp that takes this into account, but then you sort of wander into "now you have two problems" land.
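To illustrate what such a regexp might look like, here is a hedged sketch that accepts both header variants shown above, plus the optional trailing <tooa/toda>,<length> pair from the syntax. The names HEADER_RE and parse_header are made up for this example, and the pattern covers only what the syntax line above describes, not every quirk a real modem may show.

```python
import re

# One header line: <index>,<stat>,<oa/da>,[<alpha>],[<scts>][,<tooa/toda>,<length>]
HEADER_RE = re.compile(
    r'^\+CMGL: (\d+),"([^"]*)","([^"]*)",(?:"([^"]*)")?,(?:"([^"]*)")?'
    r'(?:,(\d+),(\d+))?\s*$'
)

def parse_header(line):
    """Return the header fields as a dict, or None if the line does not match."""
    m = HEADER_RE.match(line)
    if m is None:
        return None
    index, stat, address, alpha, scts, tooa, length = m.groups()
    return {
        "index": int(index),
        "stat": stat,
        "address": address,
        "alpha": alpha,    # None when the parameter is absent, '' when it is ""
        "scts": scts,
        "tooa": tooa,      # None unless the optional tail is present
        "length": length,
    }

with_alpha = parse_header(
    '+CMGL: 1,"REC READ","+918884100421","","13/04/05,08:24:36+22"')
without_alpha = parse_header(
    '+CMGL: 1,"REC READ","+31612123738",,"08/12/22,11:37:52+04"')
```

The pattern is already hard to read for a single response type, which is the maintenance cost this answer is warning about.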
What I recommend is to switch to proper parsing of the response, that is: start on the very first character and advance in chunks depending on the expected parameter format (and presence). See this answer for a quick and dirty way to just extract the phone number. It is not as robust as the algorithm I describe below (for instance, comma + 2 is assuming too much).
The absolute correct algorithm for parsing responses is:
Match the prefix at the start of the line (e.g. +CMGL:). Then start parsing, differentiating the following tokens:
- white-space ' ' or '\t'
- comma ','
- double-quote '"'
- carriage-return '\r'
- line-feed '\n'
- any-non-white-space-non-comma-non-double-quote-non-cr-non-lf-character
For each parameter, start by ignoring any leading white space.
If getting a comma, the parameter was not present, advance to parsing next parameter.
If getting a carriage return, the next character should be a line feed and the end of the line is reached.
If getting a non-white-space-non... character, this is the start of a numerical parameter. Collect all following non-white-space-non... characters for this parameter. After this the only legal characters should be zero or more white space followed by either a comma or a carriage return.
If getting a double quote character, advance to the next double quote character; that is the end of the string (this is safe and correct because even if the string contains double quote characters, they are escaped, but not as \"). After this the only legal characters should be zero or more white space followed by either a comma or a carriage return.
The above might seem a bit overwhelming at first, but it is really not that complicated when you start dealing with it.
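The steps above can be sketched roughly as follows. This is an illustration under simplifying assumptions rather than a complete implementation: the function name parse_at_params is invented here, the trailing \r\n is stripped up front instead of being checked character by character, and every parameter is returned as a string, with None for an absent one.

```python
def parse_at_params(line, prefix="+CMGL:"):
    """Split one response line into its parameters; None marks an absent one."""
    if not line.startswith(prefix):
        raise ValueError("unexpected prefix: %r" % line)
    # Simplification: drop the line ending instead of validating CR then LF.
    text = line[len(prefix):].rstrip("\r\n")
    params = []
    i, n = 0, len(text)
    while True:
        # Ignore leading white space before the parameter.
        while i < n and text[i] in " \t":
            i += 1
        if i < n and text[i] == '"':
            # String parameter: it runs up to the next double quote.
            end = text.find('"', i + 1)
            if end == -1:
                raise ValueError("unterminated string parameter")
            params.append(text[i + 1:end])
            i = end + 1
        else:
            # Unquoted (numerical) parameter, or nothing at all if absent.
            start = i
            while i < n and text[i] not in ' \t,"':
                i += 1
            params.append(text[start:i] if i > start else None)
        # Only white space may follow before the comma or the end of line.
        while i < n and text[i] in " \t":
            i += 1
        if i == n:
            return params
        if text[i] != ",":
            raise ValueError("unexpected character %r at offset %d" % (text[i], i))
        i += 1  # skip the comma and parse the next parameter

fields = parse_at_params(
    '+CMGL: 1,"REC READ","+31612123738",,"08/12/22,11:37:52+04"\r\n')
```

Because the quoted timestamp is consumed as one string, the comma inside it does not split the parameter, and the missing <alpha> shows up as None rather than shifting the later fields.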
Answer 3:
If the message is always on a new line:
(?:[\n\r]+|^)\+CMGL.*?[\n\r]+(.*?)(?=[\n\r]+|$)
Group 1 contains your required message
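As a hedged usage sketch in Python, the pattern can be passed to re.findall, which returns just group 1 for each match when the pattern has a single capturing group. The names BODY_RE and message_bodies and the shortened sample are assumptions made for this example.

```python
import re

# The pattern from this answer, unchanged; group 1 is the line after each header.
BODY_RE = re.compile(r'(?:[\n\r]+|^)\+CMGL.*?[\n\r]+(.*?)(?=[\n\r]+|$)')

def message_bodies(raw):
    """Return group 1 (the message text) for every +CMGL entry in raw."""
    return BODY_RE.findall(raw)

# Shortened sample in the format shown in the question (assumed CRLF endings).
sample = (
    'AT+CMGL="ALL"\r\n'
    '+CMGL: 1,"REC READ","+918884100421","","13/04/05,08:24:36+22"\r\n'
    "here's message one\r\n"
    '+CMGL: 2,"REC READ","+918884100421","","13/04/05,09:40:38+22"\r\n'
    "here's message two\r\n"
)
bodies = message_bodies(sample)
```

Note that the lookahead stops at the first line break, so only the first line of a multi-line message would be captured.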
Source: https://stackoverflow.com/questions/15848770/parsing-message-parameters-received-by-a-gsm-modem-in-python