Say I have the following trivial C header file:
// foo1.h
typedef int foo;
typedef struct {
foo a;
char const* b;
} bar;
bar baz(foo*, bar*, ...);
Perhaps the less elegant solution, but staying with the idea of a doThings
function that forces the compiler to emit IR because the definitions are used:
The two problems you identify with this approach are that it requires modifying the header, and that it requires a deeper understanding of the types involved in order to generate "uses" to put in the function. Both of these can be overcome relatively simply:
Instead of compiling the header directly, #include it (or, more likely, a preprocessed version of it, or multiple headers) from a .c file that contains all the "uses" code. Straightforward enough:
// foo.c
#include "foo.h"
void doThings(void) {
...
}
You don't need detailed type information to generate specific usages of the names, matching up struct instantiations to parameters and all that complexity as you have in the "uses" code above. You don't actually need to gather the function signatures yourself.
All you need is the list of the names themselves and to keep track of whether they're for a function or for an object type. You can then redefine your "uses" function to look like this:
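void * doThings(void) {
    /* uniform placeholder types: functions are cast to vfun, objects are reached through a void pointer */
    typedef void * (*vfun)(void);
    typedef union v { void * o; vfun f; } v;
    /* the return value only exists so the uses are not discarded; never dereference it */
    return (v[]) {
        (v){ .o = &(bar){0} },   /* object type: instantiate it */
        (v){ .f = (vfun)baz },   /* function: take its address instead of calling it */
    };
}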
This greatly simplifies the necessary "uses" of a name to either casting it to a uniform function type (and taking its pointer rather than calling it), or wrapping it in &( and ){0} (instantiating it regardless of what it is). This means you don't need to store actual type information at all, only the kind of context from which you extracted the name in the header.
(Obviously, give the dummy function and the placeholder types long, unique names so they don't clash with the code you actually want to keep.)
This simplifies the parsing step tremendously since you only have to recognise the context of a struct/union or function declaration, without actually needing to do very much with the surrounding information.
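To make the idea concrete, here is a minimal sketch of what a generated .c file might look like for the header from the question. Several things in it are my own assumptions rather than part of the approach itself: the gen_uses_ prefix is a hypothetical naming scheme, the header's declarations are pasted inline only so the sketch is self-contained (a real file would #include the header), and baz gets a stub definition only so the file also links on its own. It is also a variant of the function above: it copies the uses into a caller-supplied buffer instead of returning a pointer to a compound literal, whose lifetime ends when the function returns.

```c
#include <stddef.h>

/* Normally: #include "foo.h". Pasted inline here only to keep the sketch self-contained. */
typedef int foo;
typedef struct {
    foo a;
    char const *b;
} bar;
bar baz(foo *, bar *, ...);

/* Stub definition, present only so this sketch links stand-alone (assumption). */
bar baz(foo *x, bar *y, ...) { (void)x; return *y; }

/* Placeholder types with a hypothetical unique prefix to avoid name clashes. */
typedef void *(*gen_uses_vfun)(void);
typedef union gen_uses_v { void *o; gen_uses_vfun f; } gen_uses_v;

/* Builds one "use" per name, copies up to cap of them into out[],
   and returns how many uses were generated. */
size_t gen_uses_doThings(gen_uses_v *out, size_t cap) {
    gen_uses_v uses[] = {
        { .o = &(bar){0} },           /* object type: instantiate it */
        { .f = (gen_uses_vfun)baz },  /* function: take its address  */
    };
    size_t n = sizeof uses / sizeof uses[0];
    for (size_t i = 0; i < n && i < cap; i++)
        out[i] = uses[i];
    return n;
}
```

Only the function entries of out[] are meaningful to the caller; the object pointers refer to temporaries that no longer exist after the call, which is fine when the goal is merely to make the compiler emit the definitions.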
A simple but hackish starting point (which I would probably use because I have low standards :D ) might be:
1. Find the #include directives that take an angle-bracketed argument (i.e. an installed header you don't want to also generate declarations for).
2. Preprocess the header with the include path pointed at a directory of dummy stand-ins for those headers (clang -E -I local-dummy-includes/ -D"__attribute__(...)=" foo.h > temp/foo_pp.h or something similar).
3. Scan the result for struct or union followed by a name, } followed by a name, or name (, and use this ridiculously simplified non-parse to build the list of uses in the dummy function, and emit the code for the .c file.

It won't catch every possibility; but with a bit of tweaking and extension, it probably will actually deal with a large subset of realistic header code. You could replace this with a dedicated simplified parser (one built to only look at the patterns of the contexts you need) at a later stage.
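As a rough illustration of the pattern scan described above, the sketch below walks over already-preprocessed header text and writes one initializer line per recognised name. Everything here is an assumption made for illustration: the function names (read_ident, skip_ws, emit_use, scan_header) are hypothetical, the emitted lines reuse the v and vfun placeholder names from the earlier function, and the matching is deliberately naive. It will, for instance, also fire on keyword ( sequences such as sizeof (, on } else in inline function bodies, and on forward-declared struct tags, so real input would need filtering on top.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Copies the identifier starting at p into name[]; returns its length (0 if none). */
static size_t read_ident(const char *p, char *name, size_t cap) {
    size_t n = 0;
    if (!(isalpha((unsigned char)*p) || *p == '_')) return 0;
    while (isalnum((unsigned char)p[n]) || p[n] == '_') {
        if (n + 1 < cap) name[n] = p[n];
        n++;
    }
    name[n < cap ? n : cap - 1] = '\0';
    return n;
}

static const char *skip_ws(const char *p) {
    while (isspace((unsigned char)*p)) p++;
    return p;
}

/* Appends one "use" line to out[] and returns the new length. */
static size_t emit_use(char *out, size_t cap, size_t used, int is_func, const char *name) {
    int w;
    if (used >= cap) return used;
    if (is_func) w = snprintf(out + used, cap - used, "(v){ .f = (vfun)%s },\n", name);
    else         w = snprintf(out + used, cap - used, "(v){ .o = &(%s){0} },\n", name);
    return w > 0 ? used + (size_t)w : used;
}

/* Scans src for "struct/union NAME", "} NAME" and "NAME (" and emits a use for each. */
size_t scan_header(const char *src, char *out, size_t cap) {
    size_t used = 0;
    char name[64], next[64], type[136];
    const char *p = src;
    if (cap) out[0] = '\0';
    while (*p) {
        if (*p == '}') {                          /* "} NAME": a typedef'd struct or union */
            const char *q = skip_ws(p + 1);
            size_t n = read_ident(q, name, sizeof name);
            if (n) used = emit_use(out, cap, used, 0, name);
            p = q + n;
            continue;
        }
        if (isdigit((unsigned char)*p)) {         /* skip numeric literals such as 0x1F */
            while (isalnum((unsigned char)*p) || *p == '_') p++;
            continue;
        }
        size_t n = read_ident(p, name, sizeof name);
        if (n == 0) { p++; continue; }
        const char *q = skip_ws(p + n);
        if (strcmp(name, "struct") == 0 || strcmp(name, "union") == 0) {
            size_t m = read_ident(q, next, sizeof next);
            if (m) {                              /* "struct NAME" or "union NAME" */
                snprintf(type, sizeof type, "%s %s", name, next);
                used = emit_use(out, cap, used, 0, type);
                p = q + m;
                continue;
            }
        } else if (*q == '(') {                   /* "NAME (": treated as a function */
            used = emit_use(out, cap, used, 1, name);
        }
        p += n;
    }
    return used;
}
```

Fed the text of the header from the question, this produces one object entry for bar and one function entry for baz, which is exactly the pair of lines in the body of the dummy function above.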