Portability
TBB is portable. It supports Intel and AMD (i.e. x86) processors, IBM PowerPC and POWER processors, ARM processors, and possibly others. If you look in the build directory, you can see all the configurations the build system support, which include a wide range of operating systems (Linux, Windows, Android, MacOS, iOS, FreeBSD, AIX, etc.) and compilers (GCC, Intel, Clang/LLVM, IBM XL, etc.). I have not tried TBB with the PGI C++ compiler and know that it does not work with the Cray C++ compiler (as of 2017).
A few years ago, I was part of the effort to port TBB to IBM Blue Gene systems. Static linking was a challenge, but is now addressed by the big_iron.inc build system helper. The other issues were supporting relatively ancient versions of GCC (4.1 and 4.4) and ensuring the PowerPC atomics were working. I expect that porting to any currently unsupported architecture would be relatively straightforward on platforms that provide or are compatible with GCC and POSIX.
Usage in community codes
I am aware of at least two HPC application frameworks that uses TBB:
I do not know how MOOSE uses TBB, but MADNESS uses TBB for its task queue and memory allocator.
Performance versus other threading models
I have personally used TBB in the Parallel Research Kernels project, within which I have compared TBB to OpenMP, OpenCL, Kokkos, RAJA, C++17 Parallel STL, and other models. See the C++ subdirectory for details.
The following figure shows the relative performance of the aforementioned models on an Intel Xeon Phi 7250 processor (the details aren't important - all models used the same settings). As you can see, TBB does quite well except for smaller problem sizes, where the overhead of adaptive scheduling is more relevant. TBB has tuning knobs that will affect these results.
Full disclosure: I work for Intel in a research/pathfinding capacity.