Debugging in Threading Building Blocks

I would like to program with tasks in Threading Building Blocks (TBB). How does one debug such code in practice?

In general, printing is a solid technique for debugging programs. In my experience with MPI parallelization, the right way to do logging is to have each rank print its debugging information to its own file (say "debug_irank", with irank the rank in MPI_COMM_WORLD) so that logical errors can be found.

How can something similar be achieved with TBB? It is not clear how to access the thread number in the thread pool, as this is presumably internal to TBB.

Alternatively, one could add an additional index specifying the rank when a task is generated but this makes the code rather complicated since the whole program has to take care of that.

First, get the program working with 1 thread. To do this, construct a task_scheduler_init as the first thing in main, like this:

#include "tbb/tbb.h"

int main() {
    tbb::task_scheduler_init init(1);
    ...
}

Be sure to compile with the macro TBB_USE_DEBUG set to 1 so that TBB's checking will be enabled.

If the single-threaded version works, but the multi-threaded version does not, consider using Intel Inspector to spot race conditions. Be sure to compile with TBB_USE_THREADING_TOOLS so that Inspector gets enough information.

Otherwise, I usually start by adding assertions, because the machine can check assertions much faster than I can read log messages. If I am really puzzled about why an assertion is failing, I use printfs and task ids (not thread ids). The easiest way to create a task id is to allocate one by post-incrementing a tbb::atomic<size_t> and storing the result in the task.
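The task-id idea can be sketched as follows. This is a minimal illustration, not TBB code: std::atomic stands in for tbb::atomic (which was deprecated and later removed in oneTBB), and TracedWork, nextTaskId, and run are hypothetical names invented for the example.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdio>

// Global counter; each newly created task takes the next value.
// Assumption: std::atomic is used here in place of tbb::atomic.
static std::atomic<std::size_t> nextTaskId{0};

// Hypothetical task payload. The id is fixed at creation time, so it
// stays the same regardless of which worker thread runs the task.
struct TracedWork {
    std::size_t id;
    int input;

    explicit TracedWork(int in) : id(nextTaskId++), input(in) {}

    int run() const {
        // Assertions first: cheap for the machine to check.
        assert(input >= 0);
        // Log with the task id, not the thread id.
        std::printf("task %zu: input=%d\n", id, input);
        return input * 2;
    }
};
```

Because the id travels with the task, log lines from the same logical unit of work can be grouped even when the scheduler moves work between threads.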

If I'm having a really bad day and the printfs are changing program behavior so that the error does not show up, I use "delayed printfs". Stuff the printf arguments in a circular buffer, and run printf on the records later after the failure is detected. Typically for the buffer, I use an array of structs containing the format string and a few word-size values, and make the array size a power of two. Then an atomic increment and mask suffices to allocate slots. E.g., something like this:

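#include <cstddef>
#include <cstdint>
#include "tbb/tbb.h"

const size_t bufSize = 1024;  // must be a power of two

struct record {
    const char* format;
    void *arg0, *arg1;
};

// Total number of records written so far; the low bits select the slot.
tbb::atomic<size_t> head;

record buf[bufSize];

// Record a format string and two pointer-sized values.
void recf(const char* fmt, void* a, void* b) {
    record* r = &buf[head++ & (bufSize - 1)];
    r->format = fmt;
    r->arg0 = a;
    r->arg1 = b;
}

// Record a format string and two ints, stored in the pointer fields.
void recf(const char* fmt, int a, int b) {
    record* r = &buf[head++ & (bufSize - 1)];
    r->format = fmt;
    r->arg0 = (void*)(intptr_t)a;
    r->arg1 = (void*)(intptr_t)b;
}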

The two recf routines record the format and the values. The casting is somewhat abusive, but on most architectures you can print the record correctly in practice with printf(r->format, r->arg0, r->arg1) even if the second overload of recf created the record.
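The "run printf on the records later" step is not shown above. The following is one possible self-contained sketch of the whole scheme, under some assumptions: std::atomic replaces tbb::atomic, the values are stored as long long (printed with %lld) rather than void* to sidestep the cast, and dumpRecords, Record, gBuf, gHead, and kBufSize are hypothetical names.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdio>

constexpr std::size_t kBufSize = 1024;
static_assert((kBufSize & (kBufSize - 1)) == 0, "kBufSize must be a power of two");

struct Record {
    const char* format;   // must outlive the buffer, e.g. a string literal
    long long arg0, arg1; // assumption: integer storage instead of void*
};

static std::atomic<std::size_t> gHead{0};  // total records written so far
static Record gBuf[kBufSize];

// Record a message without formatting it: one atomic increment plus three stores.
void recf(const char* fmt, long long a, long long b) {
    Record& r = gBuf[gHead++ & (kBufSize - 1)];
    r.format = fmt;
    r.arg0 = a;
    r.arg1 = b;
}

// Print the surviving records, oldest first, and return how many were printed.
// Intended to be called after the failure is detected and the workers have stopped.
std::size_t dumpRecords(std::FILE* out) {
    const std::size_t end = gHead.load();
    const std::size_t begin = end > kBufSize ? end - kBufSize : 0;
    for (std::size_t i = begin; i < end; ++i) {
        const Record& r = gBuf[i & (kBufSize - 1)];
        std::fprintf(out, r.format, r.arg0, r.arg1);
    }
    return end - begin;
}
```

A call site would look like recf("task %lld saw value %lld\n", id, value), with dumpRecords(stderr) invoked from the failing assertion's handler or from a debugger. Only the most recent kBufSize records survive, which is usually what matters near the failure.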

Source: https://stackoverflow.com/questions/32887113/debugging-in-threading-building-blocks

Tags

multithreading

debugging

mpi

tbb