How can I statically link the intel\'s TBB libraries to my application? I know all the caveats such as unfair load distribution of the scheduler, but I don\'t need the sched
Using the opensource version:
After running "make tbb",go to the build/linux_xxxxxxxx_release folder.
Then run:
ar -r libtbb.a concurrent_hash_map.o concurrent_queue.o concurrent_vector.o
dynamic_link.o itt_notify.o cache_aligned_allocator.o pipeline.o queuing_mutex.o
queuing_rw_mutex.o reader_writer_lock.o spin_rw_mutex.o spin_mutex.o critical_section.o
task.o tbb_misc.o tbb_misc_ex.o mutex.o recursive_mutex.o condition_variable.o
tbb_thread.o concurrent_monitor.o semaphore.o private_server.o rml_tbb.o
task_group_context.o governor.o market.o arena.o scheduler.o observer_proxy.o
tbb_statistics.o tbb_main.o concurrent_vector_v2.o concurrent_queue_v2.o
spin_rw_mutex_v2.o task_v2.o
And you should get libtbb.a as output.
Note that your program should build both with "-ldl" and libtbb.a
Just link the files, I just did it and works. Here's the SConscript file. There's two minor things, a symbol which has the same name in tbb and tbbmalloc which I had to prevent to be multiply defined, and I prevented the usage of ITT_NOTIFY since it creates another symbol with the same name in both libs.
Import('g_CONFIGURATION')
import os
import SCutils
import utils
tbb_basedir = os.path.join(
g_CONFIGURATION['basedir'],
'3rd-party/tbb40_233oss/')
#print 'TBB base:', tbb_basedir
#print 'CWD: ', os.getcwd()
ccflags = []
cxxflags = [
'-m64',
'-march=native',
'-I{0}'.format(tbb_basedir),
'-I{0}'.format(os.path.join(tbb_basedir, 'src')),
#'-I{0}'.format(os.path.join(tbb_basedir, 'src/tbb')),
'-I{0}'.format(os.path.join(tbb_basedir, 'src/rml/include')),
'-I{0}'.format(os.path.join(tbb_basedir, 'include')),
]
cppdefines = [
# 'DO_ITT_NOTIFY',
'USE_PTHREAD',
'__TBB_BUILD=1',
]
linkflags = []
if g_CONFIGURATION['build'] == 'debug':
ccflags.extend([
'-O0',
'-g',
'-ggdb2',
])
cppdefines.extend([
'TBB_USE_DEBUG',
])
else:
ccflags.extend([
'-O2',
])
tbbenv = Environment(
platform = 'posix',
CCFLAGS=ccflags,
CXXFLAGS=cxxflags,
CPPDEFINES=cppdefines,
LINKFLAGS=linkflags
)
############################################################################
# Build verbosity
if not SCutils.has_option('verbose'):
SCutils.setup_quiet_build(tbbenv, True if SCutils.has_option('colorblind') else False)
############################################################################
tbbmallocenv = tbbenv.Clone()
tbbmallocenv.Append(CCFLAGS=[
'-fno-rtti',
'-fno-exceptions',
'-fno-schedule-insns2',
])
#tbbenv.Command('version_string.tmp', None, '')
# Write version_string.tmp
with open(os.path.join(os.getcwd(), 'version_string.tmp'), 'wb') as fd:
(out, err, ret) = utils.xcall([
'/bin/bash',
os.path.join(g_CONFIGURATION['basedir'], '3rd-party/tbb40_233oss/build/version_info_linux.sh')
])
if ret:
raise SCons.Errors.StopError('version_info_linux.sh execution failed')
fd.write(out);
#print 'put version_string in', os.path.join(os.getcwd(), 'version_string.tmp')
#print out
fd.close()
result = []
def setup_tbb():
print 'CWD: ', os.getcwd()
tbb_sources = SCutils.find_files(os.path.join(tbb_basedir,'src/tbb'), r'^.*\.cpp$')
tbb_sources.extend([
'src/tbbmalloc/frontend.cpp',
'src/tbbmalloc/backref.cpp',
'src/tbbmalloc/tbbmalloc.cpp',
'src/tbbmalloc/large_objects.cpp',
'src/tbbmalloc/backend.cpp',
'src/rml/client/rml_tbb.cpp',
])
print tbb_sources
result.append(tbbenv.StaticLibrary(target='libtbb', source=tbb_sources))
setup_tbb()
Return('result')
After acquiring the source code from https://www.threadingbuildingblocks.org/, build TBB like this:
make extra_inc=big_iron.inc
If you need extra options, then instead build like this:
make extra_inc=big_iron.inc <extra options>
If you run a multiprocessing application, e.g. using MPI, you may need to explicitly initialize the TBB scheduler with the appropriate number of threads to avoid oversubscription.
An example of this in a large application can be found in https://github.com/m-a-d-n-e-s-s/madness/blob/master/src/madness/world/thread.cc.
This feature has been available for many years (since at least 2013), although it is not documented for the reasons described in other answers.
This feature was originally developed because IBM Blue Gene and Cray supercomputers either did not support shared libraries or did not perform well when using them, due to the lack of a locally mounted filesystem.
EDIT - Changed to use extra_inc
. Thanks Jeff!
Build with the following parameter:
make extra_inc=big_iron.inc
The static libraries will be built. See the caveats in build/big_iron.inc
.
This is strongly not recommended:
Is there a version of TBB that provides statically linked libraries?
TBB is not provided as a statically linked library, for the following reasons*:
Most libraries operate locally. For example, an Intel(R) MKL FFT transforms an array. It is irrelevant how many copies of the FFT there are. Multiple copies and versions can coexist without difficulty. But some libraries control program-wide resources, such as memory and processors. For example, garbage collectors control memory allocation across a program. Analogously, TBB controls scheduling of tasks across a program. To do their job effectively, each of these must be a singleton; that is, have a sole instance that can coordinate activities across the entire program. Allowing k instances of the TBB scheduler in a single program would cause there to be k times as many software threads as hardware threads. The program would operate inefficiently, because the machine would be oversubscribed by a factor of k, causing more context switching, cache contention, and memory consumption. Furthermore, TBB's efficient support for nested parallelism would be negated when nested parallelism arose from nested invocations of distinct schedulers.
The most practical solution for creating a program-wide singleton is a dynamic shared library that contains the singleton. Of course if the schedulers could cooperate, we would not need a singleton. But that cooperation requires a centralized agent to communicate through; that is, a singleton!
Our decision to omit a statically linkable version of TBB was strongly influenced by our OpenMP experience. Like TBB, OpenMP also tries to schedule across a program. A static version of the OpenMP run-time was once provided, and it has been a constant source of problems arising from duplicate schedulers. We think it best not to repeat that history. As an indirect proof of the validity of these considerations, we could point to the fact that Microsoft Visual C++ only provides OpenMP support via dynamic libraries.
Source: http://www.threadingbuildingblocks.org/faq/11#sthash.t3BrizFQ.dpuf
Although not officially endorsed by the TBB team, it is possible to build your own statically linked version of TBB with make extra_inc=big_iron.inc
.
I have not tested it on Windows or MacOS, but on Linux, it worked (source):
wget https://github.com/01org/tbb/archive/2017_U6.tar.gz
tar xzfv 2017_U6.tar.gz
cd tbb-2017_U6
make extra_inc=big_iron.inc
The generated files are in tbb-2017_U6/build/linux*release
.
When you link your application to the static TBB version:
-static
switch-ltbb
) and pthread (-lpthread
)In my test, I also needed to explicitely reference all .o
files from the manually build TBB version. Depending on your project, you might also need to pass -pthread
to gcc.
I have created a toy example to document all the steps in this Github repository:
It also contains test code to make sure that the generated binary is portable on other Linux distributions.