问题
I'm looking for an equivalent to sscanf()
in Python. I want to parse /proc/net/*
files, in C I could do something like this:
int matches = sscanf(
buffer,
"%*d: %64[0-9A-Fa-f]:%X %64[0-9A-Fa-f]:%X %*X %*X:%*X %*X:%*X %*X %*d %*d %ld %*512s\n",
local_addr, &local_port, rem_addr, &rem_port, &inode);
I thought at first to use str.split
, however it doesn't split on the given characters, but the sep
string as a whole:
>>> lines = open("/proc/net/dev").readlines()
>>> for l in lines[2:]:
>>> cols = l.split(string.whitespace + ":")
>>> print len(cols)
1
Which should be returning 17, as explained above.
Is there a Python equivalent to sscanf
(not RE), or a string splitting function in the standard library that splits on any of a range of characters that I'm not aware of?
回答1:
Python doesn't have an sscanf
equivalent built-in, and most of the time it actually makes a whole lot more sense to parse the input by working with the string directly, using regexps, or using a parsing tool.
Probably mostly useful for translating C, people have implemented sscanf
, such as in this module: http://hkn.eecs.berkeley.edu/~dyoo/python/scanf/
In this particular case if you just want to split the data based on multiple split characters, re.split
is really the right tool.
回答2:
There is also the parse module.
parse()
is designed to be the opposite of format()
(the newer string formatting function in Python 2.6 and higher).
>>> from parse import parse
>>> parse('{} fish', '1')
>>> parse('{} fish', '1 fish')
<Result ('1',) {}>
>>> parse('{} fish', '2 fish')
<Result ('2',) {}>
>>> parse('{} fish', 'red fish')
<Result ('red',) {}>
>>> parse('{} fish', 'blue fish')
<Result ('blue',) {}>
回答3:
When I'm in a C mood, I usually use zip and list comprehensions for scanf-like behavior. Like this:
input = '1 3.0 false hello'
(a, b, c, d) = [t(s) for t,s in zip((int,float,strtobool,str),input.split())]
print (a, b, c, d)
Note that for more complex format strings, you do need to use regular expressions:
import re
input = '1:3.0 false,hello'
(a, b, c, d) = [t(s) for t,s in zip((int,float,strtobool,str),re.search('^(\d+):([\d.]+) (\w+),(\w+)$',input).groups())]
print (a, b, c, d)
Note also that you need conversion functions for all types you want to convert. For example, above I used something like:
strtobool = lambda s: {'true': True, 'false': False}[s]
回答4:
You can split on a range of characters using the re
module.
>>> import re
>>> r = re.compile('[ \t\n\r:]+')
>>> r.split("abc:def ghi")
['abc', 'def', 'ghi']
回答5:
You can parse with module re
using named groups. It won't parse the substrings to their actual datatypes (e.g. int
) but it's very convenient when parsing strings.
Given this sample line from /proc/net/tcp
:
line=" 0: 00000000:0203 00000000:0000 0A 00000000:00000000 00:00000000 00000000 0 0 335 1 c1674320 300 0 0 0"
An example mimicking your sscanf example with the variable could be:
import re
hex_digit_pattern = r"[\dA-Fa-f]"
pat = r"\d+: " + \
r"(?P<local_addr>HEX+):(?P<local_port>HEX+) " + \
r"(?P<rem_addr>HEX+):(?P<rem_port>HEX+) " + \
r"HEX+ HEX+:HEX+ HEX+:HEX+ HEX+ +\d+ +\d+ " + \
r"(?P<inode>\d+)"
pat = pat.replace("HEX", hex_digit_pattern)
values = re.search(pat, line).groupdict()
import pprint; pprint values
# prints:
# {'inode': '335',
# 'local_addr': '00000000',
# 'local_port': '0203',
# 'rem_addr': '00000000',
# 'rem_port': '0000'}
回答6:
There is an ActiveState recipe which implements a basic scanf http://code.activestate.com/recipes/502213-simple-scanf-implementation/
回答7:
Update: The Python documentation for its regex module, re
, includes a section on simulating scanf, which I found more useful than any of the answers above.
https://docs.python.org/2/library/re.html#simulating-scanf
回答8:
you can turn the ":" to space, and do the split.eg
>>> f=open("/proc/net/dev")
>>> for line in f:
... line=line.replace(":"," ").split()
... print len(line)
no regex needed (for this case)
回答9:
Upvoted orip's answer. I think it is sound advice to use re module. The Kodos application is helpful when approaching a complex regexp task with Python.
http://kodos.sourceforge.net/home.html
回答10:
If the separators are ':', you can split on ':', and then use x.strip() on the strings to get rid of any leading or trailing whitespace. int() will ignore the spaces.
回答11:
There is a Python 2 implementation by odiak.
来源:https://stackoverflow.com/questions/2175080/sscanf-in-python