Does Symbol table for C++ code contain function names along with class names?

走远了吗. Submitted on 2019-11-30 05:47:53

Most compiler textbooks will tell you about symbol tables, and often show you details for a language of modest complexity such as Pascal. You won't find information about C++ symbol tables in a textbook; it is too arcane.

We offer a complete C++14 front end for our DMS Software Reengineering Toolkit. It parses C++, builds detailed ASTs, and performs name-and-type resolution, which includes building a precise symbol table.

What follows are slides from our tutorial on how to use DMS, focused on the C++ symbol table structures.

OP asked specifically for a view of what happens with classes. The following diagram shows this for the tiny C++ program in the upper left corner. The rest of the diagram shows boxes, which represent what we call "symbol spaces" (or "scopes"). These are essentially hash tables mapping symbol names (each box lists the symbols it owns) to the information that DMS knows about each symbol: the source file location of its definition, the list of AST nodes that reference the definition, and a complex union that represents the type and that may in turn point to other types.

The arrows show how symbol spaces are connected; an arrow from space A to space B means "scope A is contained within scope B". Typically the symbol space lookup process, searching scope A for a symbol x, will continue the search in scope B if x is not found in A. You'll note the arrows are numbered with an integer; this tells the search machinery to look in the least-numbered parent scope first, before trying to search scopes using arrows with larger numbers. This is how scopes are ordered. (Note that class C inherits from A and B; any lookup of a field in class C, such as "b", will be forced to look first in the scope for A, and then in the scope for B.) In this way, the C++ lookup rules are achieved.

Note that the class names are recorded in the (unique) global namespace because they are declared at top level. If they had been defined in some explicit namespace, then the namespace would have a corresponding symbol space of its own that recorded the declared classes, and the namespace itself would be recorded in the global symbol space.
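The numbered-arrow lookup described above can be sketched in a few lines of C++. This is only an illustration of the idea, not DMS's implementation: the names SymbolInfo, SymbolSpace, and ExampleProgram are hypothetical, the three-class program in the comment is an assumed stand-in for the one on the slide, and the sketch returns the first match it finds rather than applying the full C++ rules (for example, it does not detect ambiguous base-class members).

```cpp
#include <cassert>
#include <map>
#include <string>
#include <unordered_map>

// Hypothetical record of what a tool knows about one symbol.
struct SymbolInfo {
    std::string kind;  // e.g. "class" or "field"
    int decl_line;     // source line of the definition
};

// One symbol space: a hash table of names plus numbered arrows to parents.
struct SymbolSpace {
    std::string label;
    std::unordered_map<std::string, SymbolInfo> symbols;
    // Arrow number -> parent space. std::map iterates in ascending key
    // order, so the least-numbered parent is searched first.
    std::map<int, const SymbolSpace*> parents;

    const SymbolInfo* lookup(const std::string& name) const {
        auto it = symbols.find(name);
        if (it != symbols.end()) return &it->second;
        for (const auto& arrow : parents) {
            if (const SymbolInfo* found = arrow.second->lookup(name)) return found;
        }
        return nullptr;  // not visible from this space
    }
};

// Symbol spaces for an assumed three-line program:
//   class A { int a; };         // line 1
//   class B { int b; };         // line 2
//   class C : A, B { int c; };  // line 3
struct ExampleProgram {
    SymbolSpace global, class_a, class_b, class_c;

    ExampleProgram() {
        global.label = "global";
        global.symbols["A"] = SymbolInfo{"class", 1};
        global.symbols["B"] = SymbolInfo{"class", 2};
        global.symbols["C"] = SymbolInfo{"class", 3};

        class_a.label = "class A";
        class_a.symbols["a"] = SymbolInfo{"field", 1};
        class_a.parents[1] = &global;

        class_b.label = "class B";
        class_b.symbols["b"] = SymbolInfo{"field", 2};
        class_b.parents[1] = &global;

        class_c.label = "class C";
        class_c.symbols["c"] = SymbolInfo{"field", 3};
        class_c.parents[1] = &class_a;  // first base class
        class_c.parents[2] = &class_b;  // second base class
        class_c.parents[3] = &global;   // enclosing scope, searched last
    }

    // The spaces point at each other, so copying would leave stale pointers.
    ExampleProgram(const ExampleProgram&) = delete;
};

// Usage: resolve names as seen from inside class C.
void demo_lookup() {
    ExampleProgram prog;
    const SymbolInfo* b = prog.class_c.lookup("b");
    assert(b != nullptr && b->decl_line == 2);    // reached through arrow 2
    assert(prog.class_c.lookup("x") == nullptr);  // declared nowhere
}
```

Keeping the arrows in an ordered map is just one way to get "least-numbered parent first"; a sorted vector of parents would do the same job.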

OP did not ask what the symbol table looks like for function bodies, but I just so happen to have an illustrative slide for that, too, below. The symbol spaces work the same way. What is shown in this slide is the linkage between a symbol space and the scoped region it represents. That linkage is actually implemented by a pointer, associated with the symbol space, to the corresponding AST (or ASTs; namespace definitions can be scattered around in multiple places).

Note that in this case, the function name is recorded in the global namespace because it is declared at top level. If it had been defined inside the scope of a class, the function name would have been recorded in the symbol space for the class body (see the previous diagram).

As a general rule, the details of how the symbol table is organized are completely dependent on the compiler and the choices its designers made. In our case, we designed a very general symbol table management package because we planned to use (and have used) the same package to handle multiple languages (C, C++, Java, COBOL, several legacy languages) in a uniform way. However, the abstract structures of symbol spaces and inheritance will have to be implemented in essentially equivalent ways across C++ compilers; after all, they have to model the same information. I'd expect similar structures in the GCC and Clang compilers (well, the integer-numbered inheritance arcs, maybe not :)

As a practical matter, it doesn't matter how many "passes" your compiler has. It pretty much has to build these structures to remember what it knows about the symbols, within a pass, and across passes.

While building a C++ parser is very hard by itself, building such a symbol table is much harder. The effort dwarfs the effort to build the C++ parser. Our C++ name resolver is some 250K SLOC of attribute-grammar code compiled and executed by DMS. Getting the details right is an enormous headache; the C++ reference manual is enormous and confusing, the facts are scattered everywhere across the document, and in a variety of places it is contradictory (we try to send complaints about this to the committee) and/or inconsistent between compilers (we have versions for GCC and Visual Studio 201x).

Update March 2017: Now have symbol tables for C++2014. Update June 2018: Now have symbol tables for C++2017.

A symbol table maps names to constructs within the program. As such it is used to record the names of classes, functions, variables, and anything else that has a user-specified name within the program.

(There are two common kinds of symbol table: one that the compiler maintains while it is compiling your program, and another that exists in the object file so that it can be linked with other objects. The two are strongly related, but need not have similar representations internally. Typically only some of the symbols from the compiler's symbol table will be output into the object file.)

Part of what you say makes no sense:

if it compiles code in one-pass then it will not need to store class name and subroutine names in your symbol table

How can the compiler determine to what construct a name refers if it cannot look it up in the symbol table?

but if it is a multi-pass compiler, it could add information about the class(es) it encounters and their subroutines so that it could do argument type checking and issue meaningful error messages.

There's no reason it could not do this in a single pass.

I could not understand whether it is actually compiler dependent or not?

All compilers are going to use a symbol table, but its use will be hidden inside the implementation.

I was assuming that compiler(for C++ code) would put function names with class names in the table whether it is single pass or multi pass compiler. How is it dependent on the passes?

How is what dependent on the passes? All names go in the symbol table - that's what it's for - and usually symbol resolution is important for just about everything else the compiler does, so it needs to be done early (i.e. in the first pass - and in fact the main purpose of the first pass in a multi-pass compiler may well be just to build the symbol table!).

Moreover, could anyone show a sample symbol table for a simple C++ class, how would it look like (function names with class name)?

I'll give it a stab:

class A
{
    int a;
    void f(int, int);
};

This will yield a symbol table containing the symbols "A", "a", and "f". Typically "a" and "f" would be marked with a scope to simplify lookup, e.g.:

"A"  -> (class)
"A::a"  ->  (class variable member)
"A::f(int,int)"  ->  (class function member)

It's also possible that the a and f symbols will not be stored in the top-level symbol table, but rather that each name space (including C++ namespaces and classes) will have its own symbol table, containing the symbols defined inside it. But this is, arguably, just a data structure choice. You can still abstractly view the symbol table as a flat table, where a name maps to a construct.
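As a rough sketch of that flat view, the table for class A can be held in a single hash map keyed by qualified name. The names FlatTable, qualify, resolve, and table_for_class_a are made up for this illustration, and the values are just the descriptive strings from the listing above rather than real compiler data. Note that in this sketch a function is only found if the caller supplies the full key, signature included.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

// Flat view: qualified name -> description of the construct.
using FlatTable = std::unordered_map<std::string, std::string>;

// Join enclosing scope names and a simple name into one key,
// e.g. scopes {"A"} and name "a" give "A::a".
std::string qualify(const std::vector<std::string>& scopes,
                    const std::string& name) {
    std::string key;
    for (const std::string& scope : scopes) key += scope + "::";
    return key + name;
}

// Resolve a name by trying the innermost scope first, then dropping one
// enclosing scope per retry until the unqualified name has been tried.
const std::string* resolve(const FlatTable& table,
                           std::vector<std::string> scopes,
                           const std::string& name) {
    for (;;) {
        auto it = table.find(qualify(scopes, name));
        if (it != table.end()) return &it->second;
        if (scopes.empty()) return nullptr;
        scopes.pop_back();
    }
}

// The three entries listed above for class A.
FlatTable table_for_class_a() {
    return {
        {"A", "class"},
        {"A::a", "class variable member"},
        {"A::f(int,int)", "class function member"},
    };
}

// Usage: look up "a" as seen from inside class A.
void demo_flat() {
    FlatTable table = table_for_class_a();
    const std::string* a = resolve(table, {"A"}, "a");
    assert(a != nullptr && *a == "class variable member");
}
```

The nested alternative keeps one such map per namespace or class and links the maps together; both answer the same question, namely which construct a name refers to from a given scope.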

In general the "A::a" symbol would not be output to the object file, since it is not required for linking.

Short answer: yes, and you can see them using 'nm --demangle' on Linux.

Long answer: Each function entry in the symbol table contains the function name plus type information (the parameter types and, in some cases, the return type) and, if the function belongs to a class, the class name too. But the names, types (not always), and classes are not written out in full, to save space. This encoded string is called a mangled name, and demangling recovers the readable form. The mangled name is still unique, and the full class name can be parsed from it. To view the symbol table of your program, you can use 'nm' on Linux.

http://linux.about.com/library/cmd/blcmdl1_nm.htm

It has the --demangle flag to show the original names. You can compile short test programs to see what comes out.
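The same decoding can be done from inside a program. The sketch below assumes GCC or Clang on a platform that uses the Itanium C++ ABI (which is what Linux toolchains use); abi::__cxa_demangle comes from <cxxabi.h> and is not available with Visual Studio. The wrapper name demangle is my own, and the mangled string is what such a toolchain produces for the member function A::f(int, int).

```cpp
#include <cassert>
#include <cstdlib>    // std::free
#include <cxxabi.h>   // abi::__cxa_demangle (GCC/Clang, Itanium ABI only)
#include <string>

// Turn a mangled symbol name into its readable form. If the input cannot
// be demangled, it is returned unchanged.
std::string demangle(const std::string& mangled) {
    int status = 0;
    // With a null output buffer, the result is allocated with malloc and
    // must be released with free by the caller.
    char* text = abi::__cxa_demangle(mangled.c_str(), nullptr, nullptr, &status);
    if (status != 0 || text == nullptr) return mangled;
    std::string readable(text);
    std::free(text);
    return readable;
}

// Usage: decode the symbol for "void A::f(int, int)".
void demo_demangle() {
    // _Z = mangled name, N...E = nested name, 1A = "A", 1f = "f", ii = (int, int)
    assert(demangle("_ZN1A1fEii") == "A::f(int, int)");
}
```

The class name and parameter types are recoverable from the symbol, but the return type of this ordinary member function is not part of the encoding.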
