I'm looking for a way to extract all preprocessor symbols used in my code.
As an example, if my code looks like this:
#ifdef FOO
#endif
#if ( BAR == 1 &
You can get halfway there with a preprocessor library such as Boost.Wave. It can act as the lexer, so you wouldn't have to write that part yourself. You would still have to supply a grammar for the directives you care about (define, ifdef, ifndef, if, elif).
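To give an idea of what "the bit you cared about" amounts to, here is a minimal sketch in plain standard C++. It does not use Boost.Wave at all: a regular expression stands in for the lexer and grammar, and the function name collect_conditional_symbols is made up for this illustration. It only looks at #if, #ifdef, #ifndef and #elif lines, and it assumes no comments, no backslash line continuations, and Unix line endings.

```cpp
#include <cctype>
#include <regex>
#include <set>
#include <sstream>
#include <string>

// Hypothetical helper: collect the identifiers that appear in conditional
// directives (#if, #ifdef, #ifndef, #elif). Simplified on purpose: comments,
// line continuations and #include'd files are not handled.
std::set<std::string> collect_conditional_symbols(const std::string& source) {
    static const std::regex directive(R"(^\s*#\s*(ifdef|ifndef|if|elif)\b(.*)$)");
    static const std::regex word(R"([A-Za-z0-9_]+)");

    std::set<std::string> symbols;
    std::istringstream lines(source);
    std::string line;
    while (std::getline(lines, line)) {
        std::smatch m;
        if (!std::regex_match(line, m, directive)) continue;  // not a conditional
        const std::string expr = m[2].str();                  // text after the directive name
        for (std::sregex_iterator it(expr.begin(), expr.end(), word), end; it != end; ++it) {
            const std::string name = it->str();
            // Skip numbers (1, 0x1F, 1L) and the `defined` operator itself.
            if (std::isdigit(static_cast<unsigned char>(name[0]))) continue;
            if (name != "defined") symbols.insert(name);
        }
    }
    return symbols;
}
```

For an input made of the lines `#ifdef FOO`, `#endif` and `#if ( BAR == 1 ) && defined(BAZ)`, this collects FOO, BAR and BAZ, and ignores identifiers in ordinary code lines. A real lexer such as the one in Boost.Wave would replace the two regular expressions and take care of the cases skipped here.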
That's quite simple. You just have to parse the source code exactly the way a conforming pre-processor would, with support for the correct C or C++ version. OK, I'm joking. If you support only the latest version, your code is likely to produce correct results on older versions too, but even that should be checked thoroughly.
More seriously now: since you can ask the pre-processor for the list of all defined symbols, you can simply tokenize the source and pick out every token from that list that does not immediately follow an initial #define or #undef. This part should be reasonably feasible with lex+yacc.
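The tokenize-and-filter idea can be sketched without lex+yacc in a few lines of standard C++. This is only an illustration under stated assumptions: the function name used_macros is invented here, the set of known macros is assumed to have been obtained separately (for example from the output of `gcc -dM -E`), and comments and string literals are not recognized, so names inside them would be counted as well.

```cpp
#include <cctype>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical helper: return the known macros that are used somewhere other
// than directly after #define or #undef. `known` is assumed to come from a
// preprocessor dump of defined symbols.
std::set<std::string> used_macros(const std::string& src,
                                  const std::set<std::string>& known) {
    // Very rough tokenizer: keep words (identifiers and numbers) and '#'.
    std::vector<std::string> tokens;
    for (std::size_t i = 0; i < src.size();) {
        const unsigned char c = static_cast<unsigned char>(src[i]);
        if (std::isalnum(c) || c == '_') {
            const std::size_t start = i;
            while (i < src.size() &&
                   (std::isalnum(static_cast<unsigned char>(src[i])) || src[i] == '_')) {
                ++i;
            }
            tokens.push_back(src.substr(start, i - start));
        } else {
            if (c == '#') tokens.push_back("#");
            ++i;
        }
    }

    std::set<std::string> used;
    for (std::size_t i = 0; i < tokens.size(); ++i) {
        if (tokens[i] == "#" && i + 2 < tokens.size() &&
            (tokens[i + 1] == "define" || tokens[i + 1] == "undef")) {
            i += 2;  // land on the macro name; the loop increment steps past it
            continue;
        }
        if (known.count(tokens[i]) != 0) used.insert(tokens[i]);
    }
    return used;
}
```

Note that a name which is only tested with #ifdef and never defined anywhere will not be in the dumped list, so it has to be added to `known` by other means if you want it reported.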
The only alternative I can imagine would be to take the code of a real compiler (Clang should be easier than gcc, though I'm not sure), discard all code generation, and consistently record every macro usage.
TL;DR: however you approach it, it will be hard work. If you can do without it, stay away from it.