String-interning at compiletime for profiling

旧时模样 submitted on 2019-12-30 11:51:21

Question


Context

I am working on an instrumenting profiler that lets you name different measurements by string. For example:

MEASURE_SCOPE(text_rendering_code);
...
MEASURE_SCOPE(password_hashing);
...
MEASURE_START(system_call);
...
MEASURE_STOP(system_call);

where the macros would be defined like this:

// No trailing semicolons in the macros: the caller writes MEASURE_START(x);
#define MEASURE_START(name) save_start_event(get_timestamp(), #name)
#define MEASURE_STOP(name) save_end_event(get_timestamp(), #name)
#define MEASURE_SCOPE(name) Profiling_Class object##name (#name)

class Profiling_Class{
    std::string name;
public:
    // Records the start event on construction ...
    Profiling_Class(std::string name) : name(name) {
        save_start_event(get_timestamp(), this->name);
    }
    // ... and the matching end event when the scope is left.
    ~Profiling_Class(){save_end_event(get_timestamp(), this->name);}
};

save_start_event and save_end_event would just put the timestamp along with the name into some global buffer for later use (exporting measurements and such).
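A minimal sketch of what such a buffer could look like, to make the cost visible. The `event` struct and the `event_buffer` accessor are hypothetical names introduced here for illustration, and the `get_timestamp` body is only one plausible choice; the question does not show any of them.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical event record: every event carries its own copy of the name.
struct event {
    std::uint64_t timestamp;
    std::string name;
    bool is_start;
};

// Hypothetical global buffer that collects events for later export.
inline std::vector<event>& event_buffer() {
    static std::vector<event> buffer;
    return buffer;
}

// Assumed implementation: nanoseconds from a monotonic clock.
inline std::uint64_t get_timestamp() {
    using namespace std::chrono;
    return static_cast<std::uint64_t>(
        duration_cast<nanoseconds>(steady_clock::now().time_since_epoch()).count());
}

// Each call copies the name string into the buffer.
inline void save_start_event(std::uint64_t ts, const std::string& name) {
    event_buffer().push_back({ts, name, true});
}

inline void save_end_event(std::uint64_t ts, const std::string& name) {
    event_buffer().push_back({ts, name, false});
}
```

With this layout, every recorded event pays for a string copy, and matching a start to its end later means comparing strings.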

The problem is this: saving the name of a measurement along with the measurement itself is very inefficient. Pairing MEASURE_START with MEASURE_STOP also takes a lot of work, because checking whether their names match requires a string compare. A much better solution would be to intern the strings, i.e. keep an array somewhere that holds all of them:

std::vector<std::string> names = {"text_rendering_code", "password_hashing", "system_call"};

and substitute the string in the measurement macros with the index of the string in the array:

MEASURE_SCOPE(0);
...
MEASURE_SCOPE(1);
...
MEASURE_START(2);
...
MEASURE_STOP(2);

This requires less storage, and checking whether names match becomes a simple integer compare. On the other hand, it is very unfriendly towards the user, who has to know in advance the index of the name they want to give a measurement.
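For comparison, the same interning idea can be sketched at run time. This is not the compile-time solution asked for below, only an illustration of the name-to-index mapping; `name_table` and `intern` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical run-time intern table: each distinct name gets a small index.
struct name_table {
    std::vector<std::string> names;                    // index -> name
    std::unordered_map<std::string, std::size_t> ids;  // name -> index

    // Return the index of `name`, adding it to the table on first use.
    std::size_t intern(const std::string& name) {
        auto it = ids.find(name);
        if (it != ids.end()) return it->second;
        std::size_t id = names.size();
        names.push_back(name);
        ids.emplace(name, id);
        return id;
    }
};
```

Events would then store only the returned index, and the name is looked up in `names` when measurements are exported. The lookup still costs a hash of the string on every call, which is the overhead a compile-time scheme would avoid.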

Question

Is there a way to preserve the nice usage of MEASURE_SCOPE(text_rendering_code) and substitute this with the more efficient MEASURE_SCOPE(0) automatically? This would require building the name-array at compile time, effectively interning the strings. Is this possible?


Answer 1:


Identical string literals are not guaranteed to share the same address, but you can build a type from a literal. Identical literals then yield the same type, whose static member can be compared by pointer (without comparing the strings). Something like:

// Sequence of char
template <char...Cs> struct char_sequence
{
    template <char C> using push_back = char_sequence<Cs..., C>;
};

// Strip the first '\0' and everything after it from a char_sequence
template <typename, char...> struct strip_sequence;

template <char...Cs>
struct strip_sequence<char_sequence<>, Cs...>
{
    using type = char_sequence<Cs...>;
};

template <char...Cs, char...Cs2>
struct strip_sequence<char_sequence<'\0', Cs...>, Cs2...>
{
    using type = char_sequence<Cs2...>;
};

template <char...Cs, char C, char...Cs2>
struct strip_sequence<char_sequence<C, Cs...>, Cs2...>
{
    using type = typename strip_sequence<char_sequence<Cs...>, Cs2..., C>::type;
};

// struct that holds the chars in a static char array
template <typename chars> struct static_string;

template <char...Cs>
struct static_string<char_sequence<Cs...>>
{
    static constexpr char str[sizeof...(Cs)] = {Cs...};
};

template <char...Cs>
constexpr 
char static_string<char_sequence<Cs...>>::str[sizeof...(Cs)];

// helper to get the i_th character (`\0` for out of bound)
template <std::size_t I, std::size_t N>
constexpr char at(const char (&a)[N]) { return I < N ? a[I] : '\0'; }

// helper to check if the c-string will not be truncated
template <std::size_t max_size, std::size_t N>
constexpr bool check_size(const char (&)[N])
{
    static_assert(N <= max_size, "string too long");
    return N <= max_size;
}

// Helper macros to build char_sequence from c-string
#define PUSH_BACK_8(S, I) \
    ::push_back<at<(I) + 0>(S)>::push_back<at<(I) + 1>(S)> \
    ::push_back<at<(I) + 2>(S)>::push_back<at<(I) + 3>(S)> \
    ::push_back<at<(I) + 4>(S)>::push_back<at<(I) + 5>(S)> \
    ::push_back<at<(I) + 6>(S)>::push_back<at<(I) + 7>(S)>

#define PUSH_BACK_32(S, I) \
        PUSH_BACK_8(S, (I) + 0) PUSH_BACK_8(S, (I) + 8) \
        PUSH_BACK_8(S, (I) + 16) PUSH_BACK_8(S, (I) + 24)

#define PUSH_BACK_128(S, I) \
    PUSH_BACK_32(S, (I) + 0) PUSH_BACK_32(S, (I) + 32) \
    PUSH_BACK_32(S, (I) + 64) PUSH_BACK_32(S, (I) + 96)

// Macro to create char_sequence from c-string (limited to 128 chars)
#define MAKE_CHAR_SEQUENCE(S) \
    strip_sequence<char_sequence<> \
    PUSH_BACK_128(S, 0) \
    >::type::template push_back<check_size<128>(S) ? '\0' : '\0'>

// Macro to return a static c-string
#define MAKE_STRING(S) \
    static_string<MAKE_CHAR_SEQUENCE(S)>::str

So

MEASURE_SCOPE(MAKE_STRING("text_rendering_code"));

would always return the same pointer for the same text, so you can compare the pointers directly.

You can modify your MEASURE_SCOPE macro so that it invokes MAKE_STRING itself.
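The pointer-identity claim can be checked with a cut-down version of the types above. This sketch assumes a simplified `static_string` that appends the terminating '\0' itself and is spelled out character by character instead of going through MAKE_STRING; the three `*_site` functions are hypothetical stand-ins for separate call sites.

```cpp
#include <cassert>
#include <cstring>

// Simplified stand-ins for the char_sequence / static_string types above.
template <char... Cs> struct char_sequence {};

template <typename Chars> struct static_string;

template <char... Cs>
struct static_string<char_sequence<Cs...>> {
    // One array per distinct character sequence, with the terminator appended.
    static constexpr char str[sizeof...(Cs) + 1] = {Cs..., '\0'};
};

// Out-of-class definition, required before C++17 when str is odr-used.
template <char... Cs>
constexpr char static_string<char_sequence<Cs...>>::str[sizeof...(Cs) + 1];

// Two "call sites" naming the same characters, and one naming others.
inline const char* first_site()  { return static_string<char_sequence<'i', 'o'>>::str; }
inline const char* second_site() { return static_string<char_sequence<'i', 'o'>>::str; }
inline const char* other_site()  { return static_string<char_sequence<'u', 'i'>>::str; }
```

Because both `first_site` and `second_site` name the same template specialization, they refer to the same static member, so `first_site() == second_site()` holds without any string comparison, while `other_site()` yields a different address.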

gcc has an extension to simplify MAKE_STRING:

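// GNU string literal operator template; CHAR is expected to be char here.
// The '\0' is appended so the result has the same type as MAKE_STRING's.
template <typename CHAR, CHAR... cs>
const char* operator""_c() { return static_string<char_sequence<cs..., '\0'>>::str; }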

and then

MEASURE_SCOPE("text_rendering_code"_c);



Answer 2:


I can only guess what you mean, because you don't give enough details, and they matter a lot.

A possible approach is to generate some ad hoc C or C++ code with your own generator. Remember that some of the C or C++ code in your project can be generated (this is a crude form of metaprogramming; Qt moc, RPCGEN, bison and SWIG are typical examples of C++ or C generators, but you can easily make your own, perhaps with the help of a scripting language like Python, Guile or AWK, or even in C++), and your build automation can handle that (e.g. with an ad hoc rule or recipe in your Makefile).

Then you could write a very simple generating program that collects all MEASURE_SCOPE, MEASURE_START and MEASURE_STOP macro invocations in your code (the *.cpp files of your project). This is quite simple to code: read every .cpp file line by line and look for MEASURE_SCOPE (etc.) followed by optional spaces and then by (.
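The scanning step described above could be sketched in C++ with `std::regex`. The function name `collect_measure_names` is hypothetical, and the sketch assumes each macro argument is a plain identifier.

```cpp
#include <cassert>
#include <regex>
#include <set>
#include <string>

// Collect the names passed to MEASURE_SCOPE / MEASURE_START / MEASURE_STOP
// in one line of source text. This is purely textual: it does not skip
// comments or string literals, and it would also pick up the parameter name
// from the #define lines of the macros themselves.
inline void collect_measure_names(const std::string& line,
                                  std::set<std::string>& names) {
    static const std::regex pattern(
        R"re(\bMEASURE_(?:SCOPE|START|STOP)\s*\(\s*([A-Za-z_][A-Za-z0-9_]*)\s*\))re");
    for (std::sregex_iterator it(line.begin(), line.end(), pattern), end;
         it != end; ++it) {
        names.insert((*it)[1].str());  // capture group 1 is the measurement name
    }
}
```

A driver would call this for every line of every .cpp file, for example with `std::getline` on a `std::ifstream`, and hand the resulting set to the code that emits the generated files. Using a `std::set` removes duplicates and gives the names a stable order, so the generated indices do not change between builds unless names are added or removed.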

That generating program, which deals with your interned strings, might emit a large header file measure-generated.h containing things like

// in generated header
#define MEASURE_POINT_system_call 1
#define MEASURE_POINT_password_hashing 2

(maybe you want to generate some large enum instead)

and it would also emit a measure-generated-array.cpp file like

// generated code
const char* measure_array[] = {
  NULL,
  "system_call",
  "password_hashing",
  /// etc....
  NULL,
};
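A rough sketch of the emitting side, producing text in the shape of the two generated files shown above. The function names `emit_header` and `emit_array` are hypothetical; a real generator would write the returned strings to measure-generated.h and measure-generated-array.cpp.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Emit the text of the generated header: one #define per name, numbered from 1.
inline std::string emit_header(const std::vector<std::string>& names) {
    std::ostringstream out;
    out << "// in generated header\n";
    for (std::size_t i = 0; i < names.size(); ++i)
        out << "#define MEASURE_POINT_" << names[i] << ' ' << (i + 1) << '\n';
    return out.str();
}

// Emit the text of the generated array. Slot 0 is left as NULL so that
// measure_array[MEASURE_POINT_x] is the name x; a trailing NULL ends the list.
inline std::string emit_array(const std::vector<std::string>& names) {
    std::ostringstream out;
    out << "// generated code\nconst char* measure_array[] = {\n  NULL,\n";
    for (const std::string& n : names)
        out << "  \"" << n << "\",\n";
    out << "  NULL,\n};\n";
    return out.str();
}
```

Both functions must be fed the names in the same order, otherwise the indices in the header would not match the positions in the array.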

Then, in one of your headers, you could define

#define MEASURE_SCOPE(X) measure_array[MEASURE_POINT_##X]

and so on, using preprocessor tricks such as stringizing and/or token concatenation.
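One way the concatenation trick might be combined with a scope guard that stores only the integer index is sketched here. The two MEASURE_POINT_ defines and the array stand in for the generated files; `id_event`, `id_events`, `measure_name` and `scope_guard` are hypothetical names, and timestamps are left out to keep the sketch short.

```cpp
#include <cassert>
#include <vector>

// Stand-ins for the contents of the generated header and array.
#define MEASURE_POINT_system_call 1
#define MEASURE_POINT_password_hashing 2
static const char* const measure_array[] = {
    nullptr, "system_call", "password_hashing", nullptr};

// Look up the name for an index, e.g. when exporting measurements.
inline const char* measure_name(int id) { return measure_array[id]; }

// Hypothetical event record holding an integer id instead of a string.
struct id_event {
    int id;
    bool is_start;
};

inline std::vector<id_event>& id_events() {
    static std::vector<id_event> events;
    return events;
}

// Hypothetical RAII guard: records a start event on construction and the
// matching end event on destruction.
class scope_guard {
    int id_;
public:
    explicit scope_guard(int id) : id_(id) { id_events().push_back({id_, true}); }
    ~scope_guard() { id_events().push_back({id_, false}); }
    scope_guard(const scope_guard&) = delete;
    scope_guard& operator=(const scope_guard&) = delete;
};

// ## pastes the user's name onto the prefix, yielding the generated constant.
#define MEASURE_SCOPE(name) scope_guard measure_guard_##name(MEASURE_POINT_##name)
```

The call site stays `MEASURE_SCOPE(password_hashing);`, while the recorded events carry only the index 2. A name that the generator did not see expands to an undefined `MEASURE_POINT_...` identifier and fails to compile, which is a useful check that the generated header is up to date.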


You asked: "This would require building the name-array at compile time, effectively interning the strings. Is this possible?"

Yes, of course. Do that in your own C++ generator, which knows all the *.cpp files of your project, as I suggested. You can generate C++ files at build time.



Source: https://stackoverflow.com/questions/50288847/string-interning-at-compiletime-for-profiling
