I'm somewhere on the learning curve when it comes to regular expressions, and I need to use them to automatically modify function prototypes in a bunch of C headers. Does
Let's say you have the whole C file read into $buffer.
* First, create a regexp that replaces all comments with an equal number of spaces and linefeeds, so that row and column positions won't change.
* Create a regexp that can handle a parenthesised string.
* Then a regexp like this finds functions: (static|)\s+(\w+)\s*$parenthezized_regexp+*{
This regexp does not handle functions whose definitions use preprocessor directives.
If you go for lex/yacc, you have to combine the ANSI C and preprocessor grammars to handle preprocessor directives inside function definitions.
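The first step above (blanking comments without moving anything else) could be sketched like this in Python. This is only an illustration under a few assumptions: the name `blank_comments` is made up for the example, string and character literals are skipped so that comment markers inside them survive, and corner cases such as line continuations inside `//` comments are ignored.

```python
import re

# Matches string/char literals (kept as-is) or comments (blanked out).
_TOKEN = re.compile(
    r"""
    "(?:\\.|[^"\\])*"      # string literal
  | '(?:\\.|[^'\\])*'      # character literal
  | /\*.*?\*/              # block comment
  | //[^\n]*               # line comment
    """,
    re.VERBOSE | re.DOTALL,
)


def blank_comments(source):
    """Replace every comment character with a space, keeping newlines,
    so that row and column positions of the remaining code do not change."""
    def repl(match):
        text = match.group(0)
        if text.startswith("/"):              # a comment, not a literal
            return re.sub(r"[^\n]", " ", text)
        return text                           # literals are left untouched
    return _TOKEN.sub(repl, source)
```

Because the output has the same length and the same line breaks as the input, any offset found in the blanked text can be used directly against the original buffer.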
To do this properly, you'll need to parse according to the C language grammar. But if this is for the C language only and for header files only, perhaps you can take some shortcuts and get by without full blown BNF.
^
\s*
(?:(unsigned|signed)\s+)? # optional sign qualifier
(void|int|char|short|long|float|double) # return type
\s+
(\w+) # function name
\s*
\(
[^)]* # args - total cop out
\)
\s*
;
This is by no means correct, and needs work. But it could represent a starting point, if you're willing to put in some effort and improve it. It can be broken by function declarations that span lines, function pointer arguments, macros, and probably many other things.
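For anyone who wants to try the pattern, here is a rough Python sketch that compiles it in verbose mode. The helper name `find_prototypes` is invented for the example, and the sign qualifier and its trailing whitespace are grouped as one optional unit so that a declaration starting in column 0 can match. All the limitations listed above still apply.

```python
import re

PROTO = re.compile(
    r"""
    ^
    \s*
    (?:(unsigned|signed)\s+)?                   # optional sign qualifier
    (void|int|char|short|long|float|double)     # return type
    \s+
    (\w+)                                       # function name
    \s*
    \(
    [^)]*                                       # args - total cop out
    \)
    \s*
    ;
    """,
    re.VERBOSE | re.MULTILINE,
)


def find_prototypes(header_text):
    """Return (return_type, function_name) pairs for simple prototypes."""
    found = []
    for sign, base_type, name in PROTO.findall(header_text):
        return_type = " ".join(part for part in (sign, base_type) if part)
        found.append((return_type, name))
    return found
```

Prototypes that return pointers, structs, or typedef'd types are silently skipped, which is one of the first things you would want to extend.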
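Note that a BNF grammar can be converted to a regex only if the engine supports recursive patterns (as Perl and PCRE do), because C declarations nest. It will be a big, complex regex, but it's doable.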
You could implement a parser using the ANSI C yacc/lex grammar.
For a one-off exercise, you'd probably do best by starting simple and looking at the code you have to scan. Pick the three worst headers, and generate a regex or series of regexes that do the job. You have to decide whether and how you are going to deal with comments that contain function declarations (and, indeed, with function declarations that contain comments). Dealing with:
extern void (*function(int, void (*)(int)))(int);
(which could be the Standard C function signal()) is tough in a regex because of the nested parentheses. If you don't have any such function prototypes, time spent working out how to deal with them is time wasted. Similar comments apply to pointers to multi-dimensional arrays. The chances are that you have stylistic conventions to simplify your life. If you don't use C99 (C++-style) comments, you don't need to code around them. You probably don't put multiple declarations in a single line, either with or without a common type, so you don't have to deal with that.
extern int func1(int), func2(double); double func3(int); // Nasty!
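If declarations with nested parentheses (like the signal()-style one above) do turn up in your headers, a small depth counter is often easier than a regex. The Python sketch below is only an illustration: `matching_paren` is a made-up helper, and it assumes comments and string literals have already been removed from the text.

```python
def matching_paren(text, open_index):
    """Return the index of the ')' that closes the '(' at open_index.

    Assumes comments and string literals were stripped beforehand,
    so every parenthesis in the text is real syntax.
    """
    if text[open_index] != "(":
        raise ValueError("open_index does not point at '('")
    depth = 0
    for i in range(open_index, len(text)):
        if text[i] == "(":
            depth += 1
        elif text[i] == ")":
            depth -= 1
            if depth == 0:
                return i
    raise ValueError("unbalanced parentheses")


decl = "extern void (*function(int, void (*)(int)))(int);"
start = decl.index("(")
end = matching_paren(decl, start)
declarator = decl[start:end + 1]   # the whole nested declarator
```

Here `declarator` covers the nested part in one piece, and whatever follows it is the outer parameter list, which a single `[^)]*` character class cannot separate.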
A one-liner regex sounds very hard. I personally use a Perl script for that. It's fairly easy. The basic approach is:
1. Call your favorite C preprocessor to eliminate comments and expand macros (so it's easier).
2. Count '{' and '}' symbols. For functions in plain C they have a predictable behavior that will allow you to detect function names.
3. Look up the function names in the original source (before preprocessing) to get the signature that still has the typedefs.
It is an inefficient approach, but it works quite well for me. Step 1 is not really necessary, but it will make your life easier.
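The brace-counting step can be sketched roughly as follows. This is Python rather than the Perl script mentioned above (which is not shown here), the name `function_names` is invented for the example, and it assumes the input has already been through the preprocessor so there are no comments, macros, or braces inside string literals to confuse the counter.

```python
import re

# An identifier followed by a parenthesised parameter list (one level of
# nesting allowed) at the very end of the text that precedes a '{'.
_HEAD = re.compile(r"(\w+)\s*\((?:[^()]|\([^()]*\))*\)\s*$")


def function_names(preprocessed):
    """Return names of functions defined at brace depth 0."""
    names = []
    depth = 0
    start = 0                      # start of the current top-level chunk
    for i, ch in enumerate(preprocessed):
        if ch == "{":
            if depth == 0:
                # Text between the last top-level ';' or '}' and this '{'.
                match = _HEAD.search(preprocessed[start:i])
                if match:
                    names.append(match.group(1))
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                start = i + 1
        elif ch == ";" and depth == 0:
            start = i + 1
    return names
```

Struct bodies and array initializers also open a brace at depth 0, but the text before them does not end in a parameter list, so they are skipped. The names found this way can then be searched for in the original source as described in step 3.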
Here's a regular expression that's a good starting point for finding C function names:
^\s*(?:(?:inline|static)\s+){0,2}(?!else|typedef|return)\w+\s+\*?\s*(\w+)\s*\([^)]+\)\s*;?
And these are some test cases to validate the expression:
// good cases
static BCB_T *UsbpBufCtrlRemoveBack (BCB_Q_T *pBufCtrl);
inline static AT91_REG *UDP_EpIER (UDP_ENDPOINT_T *pEndpnt);
int UsbpEnablePort (USBP_CTRL_T *pCtrl)
bool_t IsHostConnected(void)
inline AT91_REG *UDP_EpCSR (UDP_ENDPOINT_T *pEndpnt)
// shouldn't match
typedef void (*pfXferCB)(void *pEndpnt, uint16_t Status);
else if (bIsNulCnt && bIsBusyCnt)
return UsbpDump(Buffer, BufSize, Option);
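A quick way to run these test cases is a small Python check like the one below. It is a sketch, not part of the original tooling: `function_name` is a hypothetical helper, and the pattern is the expression above with the argument list matched by `[^)]+` (any run of characters other than a closing parenthesis).

```python
import re

FUNC_RE = re.compile(
    r"^\s*(?:(?:inline|static)\s+){0,2}(?!else|typedef|return)"
    r"\w+\s+\*?\s*(\w+)\s*\([^)]+\)\s*;?"
)


def function_name(line):
    """Return the captured function name, or None if the line does not match."""
    match = FUNC_RE.match(line)
    return match.group(1) if match else None


GOOD_CASES = [
    "static BCB_T *UsbpBufCtrlRemoveBack (BCB_Q_T *pBufCtrl);",
    "inline static AT91_REG *UDP_EpIER (UDP_ENDPOINT_T *pEndpnt);",
    "int UsbpEnablePort (USBP_CTRL_T *pCtrl)",
    "bool_t IsHostConnected(void)",
    "inline AT91_REG *UDP_EpCSR (UDP_ENDPOINT_T *pEndpnt)",
]
BAD_CASES = [
    "typedef void (*pfXferCB)(void *pEndpnt, uint16_t Status);",
    "else if (bIsNulCnt && bIsBusyCnt)",
    "return UsbpDump(Buffer, BufSize, Option);",
]

if __name__ == "__main__":
    for line in GOOD_CASES + BAD_CASES:
        print(function_name(line), "<-", line)
```

Keeping the cases in lists makes it easy to add the awkward declarations from your own headers as you find them.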
Finally, here's a simple TCL script to read through a file and extract all the function prototypes and function names.
set fh [open "usbp.c" r]
set contents [read $fh]
close $fh
set fileLines [split $contents \n]
set lineNum 0
set funcCount 0
set funcRegexp {^\s*(?:(?:inline|static)\s+){0,2}(?!else|typedef|return)\w+\s+\*?\s*(\w+)\s*\([^)]+\)\s*;?}
foreach line $fileLines {
    incr lineNum
    if {[regexp $funcRegexp $line -> funcName]} {
        puts "line:$lineNum, $funcName"
        incr funcCount
    }; #end if
}; #end foreach
puts "$funcCount functions found."