How do I remove strings from / obfuscate a compiled binary? The goal is to avoid having people read the names of the functions/methods inside.
It is a dynamic library (.
These strings are in the dynamic symbol table, which is used when the library is loaded at runtime. The command
readelf -p .dynstr mylib.so
will show these entries. The command
strip -g
will remove debugging symbols, but it can't remove entries from the dynamic symbol table, as these may be needed at runtime.
Your problem is that you have entries in the dynamic symbol table for functions which are never going to be called from outside your library. Unless you tell it, the compiler/linker has no way of knowing which functions form part of the external API (and therefore need entries in the dynamic symbol table) and which functions are private to your library (and so don't need entries in the dynamic symbol table), so it just creates dynamic symbol table entries for all non-static functions.
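To make the problem concrete, here is a minimal sketch of a library source file. The file name, function names and build command are made up for illustration. Both functions are non-static, so with default settings both names should show up in the .dynstr output, even though only one of them is meant to be called from outside.

```c
/* mylib.c -- hypothetical file and function names, for illustration only.
 * Assumed build:  gcc -shared -fPIC -o mylib.so mylib.c
 * Inspect with:   readelf -p .dynstr mylib.so
 */

/* Meant to be an internal helper, but it is non-static, so it gets a
 * dynamic symbol table entry just like the public function below. */
int checksum_step(int acc, int value)
{
    return acc * 31 + value;
}

/* The only function callers are actually supposed to use. */
int mylib_checksum(const int *values, int count)
{
    int acc = 7;
    for (int i = 0; i < count; i++)
        acc = checksum_step(acc, values[i]);
    return acc;
}
```

Nothing in the source tells the toolchain that checksum_step is private, which is exactly the gap the two techniques below are meant to close.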
There are two main ways you can inform the compiler which functions are private.
1. Mark the private functions static. Obviously, this only works for functions only needed within a single compilation unit, though for some libraries this technique might be sufficient.
2. Use the gcc "visibility" attribute to mark the functions as visible or hidden. You have two options: either mark all the private functions as hidden, or change the default visibility to hidden using the -fvisibility=hidden compiler option and mark all the public functions as visible. The latter is probably the best option for you, as it means that you don't have to worry about accidentally adding a function and forgetting to mark it as hidden.
If you have a function:
int foo(int a, int b);
then the syntax for marking it hidden is:
int foo(int a, int b) __attribute__((visibility("hidden")));
and the syntax for marking it visible is:
int foo(int a, int b) __attribute__((visibility("default")));
For further details, see this document, which is an excellent source of information on this subject.
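Putting the pieces together, a library source file might look roughly like the sketch below. The MYLIB_API macro, the function names and the build command are my own placeholders, not anything your project necessarily has. It combines a static helper, an explicitly hidden function, and one function marked visible for use with -fvisibility=hidden.

```c
/* mylib.c -- hypothetical names, for illustration only.
 * Assumed build:  gcc -shared -fPIC -fvisibility=hidden -o mylib.so mylib.c
 */

/* Hypothetical convenience macro for the public API. */
#define MYLIB_API __attribute__((visibility("default")))

/* Private to this compilation unit: static, so no dynamic symbol. */
static int clamp_to_byte(int v)
{
    return v < 0 ? 0 : (v > 255 ? 255 : v);
}

/* Private, but usable from other source files of the same library.
 * The attribute is redundant when -fvisibility=hidden is given, but it
 * keeps the function hidden even if the flag is forgotten. */
__attribute__((visibility("hidden")))
int mylib_internal_scale(int v, int percent)
{
    return v * percent / 100;
}

/* Public API: the only name that should remain in .dynstr. */
MYLIB_API int mylib_brighten(int pixel, int percent)
{
    return clamp_to_byte(mylib_internal_scale(pixel, percent));
}
```

After building, running readelf -p .dynstr mylib.so again is a quick way to check which of these names are still present.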
There are some commercial obfuscators which accomplish this. Basically, they re-write all of the symbols on the go. Something like this:
void foo()
becomes
void EEhj_y33() // usually much, much longer and clobbered
Variable names are also given the same treatment, as are members of structures / unions (depending on what level of obfuscation you set).
Most of them work by scanning your code base, establishing a dictionary then substituting garbled messes for symbol names in the output, which can then be compiled as usual.
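As a rough idea of what such a substitution amounts to, here is a low-tech stand-in using the C preprocessor. This is not how any particular commercial tool works; the rename "dictionary" and all names in it are made up. The source keeps its readable names, but the compiler only ever sees the garbled ones, so those are what end up in the symbol table.

```c
#include <stdint.h>

/* Hypothetical rename dictionary. A real tool would generate something
 * like this (or rewrite the sources directly); here it is written by hand. */
#define compute_license_hash EEhj_y33
#define rotate_key           Qz9_0aLm

/* Compiled as Qz9_0aLm: rotates a 32-bit key left by one bit. */
uint32_t rotate_key(uint32_t key)
{
    return (key << 1) | (key >> 31);
}

/* Compiled as EEhj_y33. */
uint32_t compute_license_hash(uint32_t seed)
{
    return rotate_key(seed) ^ 0x5Au;
}
```

Anything that calls these functions has to be compiled with the same dictionary, which is part of why this gets unwieldy for a public API.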
I don't recommend using them, but they are available. Simply obfuscating meaningful symbol names is not going to stop someone who is determined to discover how your library / program works. Additionally, you aren't going to be able to do anything about someone who traces system calls. Really, what's the point? Some argue that it helps keep the 'casual observer' at bay; I argue that someone running ltrace, strace and strings is typically anything but casual.
Unless you mean string literals, not symbols? There's nothing you can do about them, unless you store the literals in an encrypted format that your code has to decrypt before using. That is not just a waste, but an egregious waste that provides no benefit whatsoever.
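For what it's worth, "store it encrypted and decrypt before use" tends to look something like the sketch below. The single-byte XOR key, the names and the sample literal are all made up for illustration. Note that the key and the decoding routine ship inside the same binary, which is why this buys so little.

```c
#include <stddef.h>

/* Hypothetical single-byte key; it sits in the binary next to the data. */
#define LITERAL_KEY 0x5Au

/* The bytes of "hello", each XORed with LITERAL_KEY ahead of time. */
static const unsigned char ENC_HELLO[] = { 0x32, 0x3F, 0x36, 0x36, 0x35 };

/* Decodes len bytes from enc into out and NUL-terminates the result.
 * The caller must provide room for len + 1 bytes in out. */
void decode_literal(const unsigned char *enc, size_t len, char *out)
{
    for (size_t i = 0; i < len; i++)
        out[i] = (char)(enc[i] ^ LITERAL_KEY);
    out[len] = '\0';
}
```

The strings tool will no longer show the plain text of the literal, but anyone who steps through the code in a debugger sees the decoded buffer immediately.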
They are unavoidable. Those strings are the means by which the loader links shared libraries at runtime.
Assuming you are correctly specifying a hidden visibility to g++ for all of your source files (as other posters have recommended), there's a chance you might be running into this GCC bug: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38643
Try dumping the symbols in your binary that are showing up (readelf -Wa mylib.so | c++filt | less); if you see only vtable and VTT symbols after demangling, then the gcc bug might be your problem.
Edit: if you can, try GCC 4.4.0 or later, as it appears to be fixed there.