How to get function declarations or definitions using regex

野的像风 2020-12-16 06:43

I want to get only function prototypes like

int my_func(char, int, float)
void my_func1(void)
my_func2()

from C files using regex and Python.

7 answers
  • 2020-12-16 07:04

    I think this one should do the job:

    r"^\s*[A-Za-z_]\w*\s*.*\s*[A-Za-z_]\w*\s*\(.*\)\s*$"
    

    which expands to:

    start of string:
            ^
    any amount of whitespace (including none):
            \s*
    return type:
      - starts with a letter or _:
            [A-Za-z_]
      - continues with any letters, digits or _:
            \w*
    any amount of whitespace:
            \s*
    any number of any characters
      (to allow pointers, arrays and so on;
      could be replaced with more detailed checking):
            .*
    any amount of whitespace:
            \s*
    function name:
      - starts with a letter or _:
            [A-Za-z_]
      - continues with any letters, digits or _:
            \w*
    any amount of whitespace:
            \s*
    open argument list:
            \(
    arguments (allow none):
            .*
    close argument list:
            \)
    any amount of whitespace:
            \s*
    end of string:
            $
    

    It won't correctly match every possible declaration, but it should work in most cases. If you want it to be more accurate, just let me know.

    EDIT: Disclaimer - I'm quite new to both Python and Regex, so please be indulgent ;)
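
For reference, a minimal sketch of how a pattern like the one above could be applied from Python, checking one line at a time so that a match cannot spill across lines. The identifier part is written as `[A-Za-z_]\w*` here, and `PROTO_RE`, `find_prototype_lines` and `SAMPLE` are names made up for this illustration.

```python
import re

# Assumption: identifiers are spelled [A-Za-z_]\w* (letter or underscore,
# then letters, digits or underscores).
PROTO_RE = re.compile(r"^\s*[A-Za-z_]\w*\s*.*\s*[A-Za-z_]\w*\s*\(.*\)\s*$")


def find_prototype_lines(source):
    """Return the stripped lines of `source` that the pattern accepts."""
    return [line.strip() for line in source.splitlines() if PROTO_RE.match(line)]


# Hypothetical sample: the three prototypes from the question plus two
# lines that should be rejected.
SAMPLE = """\
int my_func(char, int, float)
void my_func1(void)
my_func2()
int counter = 0;
    return my_func2();
"""

if __name__ == "__main__":
    for line in find_prototype_lines(SAMPLE):
        print(line)
```

Because the pattern is loose, a line such as `if (x)` is accepted too, so some keyword filtering is probably still needed on real code.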

  • 2020-12-16 07:07

    This is a convenient script I wrote for such tasks, but it won't give the function return types. It only extracts function names and argument lists.

    # Extract routine signatures from a C++ module
    import re
    
    def loadtxt(filename):
        "Load a text file into a string. File exceptions are allowed to propagate."
        with open(filename) as f:
            return f.read()
    
    # regex: group 1 = whole match, group 2 = name, group 3 = arguments
    rproc = r"((?<=[\s:~])(\w+)\s*\(([\w\s,<>\[\].=&':/*]*?)\)\s*(const)?\s*(?={))"
    code = loadtxt('your file name here')
    # control-flow keywords that look like calls followed by '{'
    cppwords = ['if', 'while', 'do', 'for', 'switch']
    procs = [(m.group(2), m.group(3)) for m in re.finditer(rproc, code)
             if m.group(2) not in cppwords]
    
    for name, args in procs:
        print(name + '(' + args + ')')
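
A small sketch of how this pattern behaves on an inline sample, assuming Python 3. `RPROC` is the same regex as in the script; `signatures`, `KEYWORDS` and `SAMPLE` are hypothetical names used only here. Because of the `(?={)` lookahead, only definitions followed by `{` are matched, so a bare prototype ending in `;` is skipped.

```python
import re

# Same pattern as in the script: group 2 = name, group 3 = arguments.
RPROC = r"((?<=[\s:~])(\w+)\s*\(([\w\s,<>\[\].=&':/*]*?)\)\s*(const)?\s*(?={))"
KEYWORDS = {"if", "while", "do", "for", "switch"}


def signatures(code):
    """Return (name, arguments) pairs for everything the pattern matches."""
    return [(m.group(2), m.group(3))
            for m in re.finditer(RPROC, code)
            if m.group(2) not in KEYWORDS]


# Hypothetical sample: two definitions, one control-flow statement and
# one prototype without a body.
SAMPLE = """
int add(int a, int b) {
    if (a > b) { return a; }
    return a + b;
}

void Widget::draw(void) const {
}

int declared_only(int x);
"""

if __name__ == "__main__":
    for name, args in signatures(SAMPLE):
        print(name + "(" + args + ")")
```

For header files that contain only prototypes, the lookahead would have to be changed to accept `;`, which is left out of this sketch.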
    
  • 2020-12-16 07:09

    The regular expression below also handles destructor definitions and const member functions:

    ^\s*\~{0,1}[\w_][\w\d_]*\s*.*\s*[\w_][\w\d_]*\s*\(.*\)\s*(const){0,1}$
    
  • 2020-12-16 07:14

    I think regex isn't the best solution in your case. There are many traps, such as comments and text inside string literals, but if your function prototypes share a common style:

    type fun_name(args);
    

    then \w+ \w+\(.*\); should work in most cases:

    mn> egrep "\w+ \w+\(.*\);" *.h
    md5.h:extern bool md5_hash(const void *buff, size_t len, char *hexsum);
    md5file.h:int check_md5files(const char *filewithsums, const char *filemd5sum);
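
One way to dodge the comment and string traps mentioned above is to blank them out before matching. This is only a sketch under assumptions: it ignores the preprocessor and corner cases such as line splices, and `strip_comments_and_strings`, `find_prototypes` and `SAMPLE` are hypothetical names.

```python
import re

# Comments and literals are matched in one pass so that a "//" inside a
# string is not mistaken for a comment.
TOKEN_RE = re.compile(r"""
      //[^\n]*               # line comment
    | /\*.*?\*/              # block comment
    | "(?:\\.|[^"\\])*"      # string literal
    | '(?:\\.|[^'\\])*'      # character literal
    """, re.DOTALL | re.VERBOSE)

# The simple "type name(args);" pattern from the answer.
PROTO_RE = re.compile(r"\w+ \w+\(.*\);")


def _replace(match):
    token = match.group(0)
    if token.startswith("/"):
        return " "           # a comment becomes a single space
    return token[0] * 2      # "..." becomes "" and '...' becomes ''


def strip_comments_and_strings(code):
    """Return `code` with comments removed and literals emptied."""
    return TOKEN_RE.sub(_replace, code)


def find_prototypes(code):
    """Return stripped lines that look like prototypes after cleaning."""
    cleaned = strip_comments_and_strings(code)
    return [line.strip() for line in cleaned.splitlines() if PROTO_RE.search(line)]


# Hypothetical sample with prototypes hidden in comments and a string.
SAMPLE = """\
/* int fake(void); */
extern bool md5_hash(const void *buff, size_t len, char *hexsum);
// int also_fake(int x);
const char *msg = "int in_string(void);";
int check_md5files(const char *filewithsums, const char *filemd5sum);
"""

if __name__ == "__main__":
    for line in find_prototypes(SAMPLE):
        print(line)
```

Only the two real prototypes survive; the ones inside the comments and the string literal are gone before the pattern is applied.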
    
  • 2020-12-16 07:14

    There are LOTS of pitfalls in trying to "parse" C code (or at least extract some information from it) with just regular expressions. I would definitely borrow a C grammar for your favourite parser generator (say Bison, or whatever alternative there is for Python; C grammars are available as examples everywhere) and add the actions to the corresponding rules.

    Also, do not forget to run the C preprocessor on the file before parsing.

  • 2020-12-16 07:19

    I built on Nick Dandoulakis's answer for a similar use case. I wanted to find the definition of the socket function in glibc. The script finds a bunch of functions with "socket" in the name, but socket itself was not found, which highlights what many others have said: there are probably better ways to extract this information, such as tools provided by compilers.

    # find_functions.py
    #
    # Extract routine signatures from a C++ module
    import re
    import sys
    
    def loadtxt(filename):
        # Load a text file into a string. File exceptions are not handled.
        with open(filename) as f:
            return f.read()
    
    # regex: group 1 = whole match, group 2 = name, group 3 = arguments
    rproc = r"((?<=[\s:~])(\w+)\s*\(([\w\s,<>\[\].=&':/*]*?)\)\s*(const)?\s*(?={))"
    path = sys.argv[1]
    code = loadtxt(path)
    
    # control-flow keywords that look like calls followed by '{'
    cppwords = ['if', 'while', 'do', 'for', 'switch']
    procs = [m.group(1) for m in re.finditer(rproc, code)
             if m.group(2) not in cppwords]
    
    for proc in procs:
        print(path + ": " + proc)
    

    Then

    $ cd glibc
    $ find . -name "*.c" -print0 | xargs -0 -n 1 python find_functions.py | grep ':.*socket'
    ./hurd/hurdsock.c: _hurd_socket_server (int domain, int dead)
    ./manual/examples/mkfsock.c: make_named_socket (const char *filename)
    ./manual/examples/mkisock.c: make_socket (uint16_t port)
    ./nscd/connections.c: close_sockets (void)
    ./nscd/nscd.c: nscd_open_socket (void)
    ./nscd/nscd_helper.c: wait_on_socket (int sock, long int usectmo)
    ./nscd/nscd_helper.c: open_socket (request_type type, const char *key, size_t keylen)
    ./nscd/nscd_helper.c: __nscd_open_socket (const char *key, size_t keylen, request_type type,
    ./socket/socket.c: __socket (int domain, int type, int protocol)
    ./socket/socketpair.c: socketpair (int domain, int type, int protocol, int fds[2])
    ./sunrpc/key_call.c: key_call_socket (u_long proc, xdrproc_t xdr_arg, char *arg,
    ./sunrpc/pm_getport.c: __get_socket (struct sockaddr_in *saddr)
    ./sysdeps/mach/hurd/socket.c: __socket (int domain, int type, int protocol)
    ./sysdeps/mach/hurd/socketpair.c: __socketpair (int domain, int type, int protocol, int fds[2])
    ./sysdeps/unix/sysv/linux/socket.c: __socket (int fd, int type, int domain)
    ./sysdeps/unix/sysv/linux/socketpair.c: __socketpair (int domain, int type, int protocol, int sv[2])
    

    In my case, this and this might help me, except it seems like I will need to read assembly code to reuse the strategy described there.
