I often have to write code in other languages that interact with C structs. Most typically this involves writing Python code with the struct or ctypes modules.
So I
Regular expressions would work great about 90% of the time and then cause endless headaches for the remaining 10%.
The headaches happen in the cases where the C code contains syntax that you didn't think of when writing your regular expressions. Then you go back and realise that C can't really be parsed by regular expressions, and life becomes not fun.
Try turning it around: define your own simple format, which allows less tricks than C does, and generate both the C header file and the Python interface code from your file:
define socketopts
int16 port
int32 ipv4address
int32 flags
Then you can easily write some Python to convert this to:
typedef struct {
short port;
int ipv4address;
int flags;
} socketopts;
and also to emit a Python class which uses struct
to pack/unpack three values (possibly two of them big-endian and the other native-endian, up to you).
One my friend for this tasks done C-parser which he use with cog.
If you compile your C code with debugging (-g
), pahole (git) can give you the exact structure layouts being used.
$ pahole /bin/dd … struct option { const char * name; /* 0 8 */ int has_arg; /* 8 4 */ /* XXX 4 bytes hole, try to pack */ int * flag; /* 16 8 */ int val; /* 24 4 */ /* size: 32, cachelines: 1, members: 4 */ /* sum members: 24, holes: 1, sum holes: 4 */ /* padding: 4 */ /* last cacheline: 32 bytes */ }; …
This should be quite a lot nicer to parse than straight C.
Have you looked at Swig?
I have quite successfully used GCCXML on fairly large projects. You get an XML representation of the C code (including structures) which you can post-process with some simple Python.
ctypes-codegen or ctypeslib (same thing, I think) will generate ctypes Structure
definitions (also other things, I believe, but I only tried structs) by parsing header files using GCCXML. It's no longer supported, but will likely work in some cases.