I'm trying to make "a script" (essentially an awk command) to extract the function prototypes from the C code in a .c file so I can automatically generate a .h header. I'm new wit…
The regexp you're trying to write would be:
$ awk '/^[[:alpha:]_][[:alnum:]_]*\**[[:space:]]+[[:alpha:]_][[:alnum:]_]*[[:space:]]*\([^)]*\)/' file
dictent_t* dictentcreate(const char * key, const char * val)
dict_t* dictcreate()
void dictdestroy(*dict_t d)
void dictdump(dict_t *d)
int dictlook(dict_t *d, const char * key)
int dictget(char* s, dict_t *d, const char *key)
dict_t* dictadd(dict_t* d, const char * key, const char * val)
dict_t dictup(dict_t d, const char * key, const char *newval)
dict_t* dictrm(dict_t* d, const char * key)
which, written without character classes and therefore making assumptions about your locale, would be:
$ awk '/^[a-zA-Z_][a-zA-Z0-9_]*\**[ \t]+[a-zA-Z_][a-zA-Z0-9_]*[ \t]*\([^)]*\)/' file
dictent_t* dictentcreate(const char * key, const char * val)
dict_t* dictcreate()
void dictdestroy(*dict_t d)
void dictdump(dict_t *d)
int dictlook(dict_t *d, const char * key)
int dictget(char* s, dict_t *d, const char *key)
dict_t* dictadd(dict_t* d, const char * key, const char * val)
dict_t dictup(dict_t d, const char * key, const char *newval)
dict_t* dictrm(dict_t* d, const char * key)
but note that it can be fooled by a comment inside the parameter list, e.g.:
int foo(int x /* always > 0 (I hope) */)
where [^)]*\) stops at the first ), the one inside the comment, instead of at the end of the parameter list.
When providing sample input/output you should always include some text that you think will be hard for a script to NOT select, because it "looks" a lot like the text you do want to select but is in the wrong context for your needs.
Note that C symbols cannot start with a digit, so the regexp to match one is not [[:alnum:]_]+ but is instead [[:alpha:]_][[:alnum:]_]*.
Also, functions can, and often do, return pointers to pointers to pointers, and the * can be next to the function name instead of the function return type, so you REALLY should be using a regexp like this (untested since you didn't provide input in a format that this would match) if your function declarations can be in any of the normal formats:
awk '/^[[:alpha:]_][[:alnum:]_]*(([[:space:]]*\*)+[[:space:]]*|[[:space:]]+)[[:alpha:]_][[:alnum:]_]*[[:space:]]*\([^)]*\)/' file
Of course, that won't match declarations that span multiple lines; that is a whole other can of worms.
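One cheap way to check a regexp like that is to feed it hand-written declarations in each layout. The sketch below (the sample() helper and all names in it are made up) compares the simpler regexp from the top of this answer with a generalized form that puts (([[:space:]]*\*)+[[:space:]]*|[[:space:]]+) between the return type and the function name, so spaces may appear before each *:

```shell
# sample: hypothetical declarations in several common layouts
sample() {
    printf '%s\n' \
        'int foo(void)' \
        'char **bar(int n)' \
        'char** baz(int n)' \
        'node_t * * qux(node_t *n)' \
        'int *quux (void)' \
        '    return foo(x);'
}

# Simple form: any stars must be glued to the return type
simple=$(sample | awk '/^[[:alpha:]_][[:alnum:]_]*\**[[:space:]]+[[:alpha:]_][[:alnum:]_]*[[:space:]]*\([^)]*\)/')

# Generalized form: between type and name, either one or more
# (optional spaces + "*") followed by optional spaces, or just spaces
general=$(sample | awk '/^[[:alpha:]_][[:alnum:]_]*(([[:space:]]*\*)+[[:space:]]*|[[:space:]]+)[[:alpha:]_][[:alnum:]_]*[[:space:]]*\([^)]*\)/')

echo "simple form:";      printf '%s\n' "$simple"
echo "generalized form:"; printf '%s\n' "$general"
```

Only the generalized form picks up char **bar, node_t * * qux and int *quux; neither form selects the indented return statement.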
In general you can't parse C without a C parser, but if you want something cheap and cheerful then at least run a C beautifier on the code first to get all the various possible layouts into one consistent format (google "C beautifier"), and you also need to strip out the comments (see for example https://stackoverflow.com/a/13062682/1745001).
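For the comment part, a much cruder alternative to a real comment stripper is to delete /* ... */ comments with gsub() before matching. This is only a sketch under narrow assumptions: it handles comments that open and close on the same line, but not multi-line comments, // comments, or /* appearing inside string literals, and the sample lines are made up:

```shell
# sample: hypothetical input with comments, one containing parentheses
sample() {
    printf '%s\n' \
        'int foo(int x /* always > 0 (I hope) */)' \
        'dict_t* dictcreate() /* no args */'
}

protos=$(sample | awk '
{
    # The comment regexp is a string (a dynamic regexp) so its / characters
    # need no escaping; [*] avoids backslash-escape processing in the string.
    # Each same-line comment is replaced by one space, as a C compiler does.
    gsub("/[*]([^*]|[*]+[^*/])*[*]+/", " ")
}
match($0, /^[[:alpha:]_][[:alnum:]_]*\**[[:space:]]+[[:alpha:]_][[:alnum:]_]*[[:space:]]*\([^)]*\)/) {
    print substr($0, RSTART, RLENGTH) ";"
}')
printf '%s\n' "$protos"
```

Because the comment is gone before match() runs, [^)]* now reaches the real closing parenthesis of foo's parameter list.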
Given your new requirements and your new sample input/output, this is what you are asking for:
$ awk 'match($0,/^[[:alpha:]_][[:alnum:]_]*\**[[:space:]]+[[:alpha:]_][[:alnum:]_]*[[:space:]]*\([^)]*\)/) { print substr($0,RSTART,RLENGTH) ";" }' file
dict_t dictup(dict_t d, const char * key, const char * newval);
dict_t* dictrm(dict_t* d, const char * key);
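If the goal is to produce the .h file itself, a sketch along these lines wraps that same awk command in a shell function. The name mkheader and the NAME_H include-guard convention are hypothetical choices, not part of the question:

```shell
# mkheader: hypothetical helper. Prints an include guard derived from the
# file name, then every one-line prototype the awk command finds, with ";".
mkheader() {
    src=$1
    # e.g. dict.c -> DICT_H (non-alphanumerics become "_")
    guard=$(basename "$src" .c | tr '[:lower:]' '[:upper:]' | sed 's/[^[:alnum:]]/_/g')_H
    printf '#ifndef %s\n#define %s\n\n' "$guard" "$guard"
    awk 'match($0,/^[[:alpha:]_][[:alnum:]_]*\**[[:space:]]+[[:alpha:]_][[:alnum:]_]*[[:space:]]*\([^)]*\)/) { print substr($0,RSTART,RLENGTH) ";" }' "$src"
    printf '\n#endif /* %s */\n' "$guard"
}

# Usage (dict.c stands in for your own source file):
#   mkheader dict.c > dict.h
```

It inherits every limitation of the regexp: only one-line declarations with a single-word return type are found (so static int f(...) is skipped), and anything that does match, including main(), ends up in the header, so check the result.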
but again, this is by no means robust given the possible layouts of C code in general. You need a C parser, a C beautifier, and/or a specialized tool (e.g. google cscope) to do this job robustly.