I've introduced it into our code base because we needed a bettor malloc to use when we moved to a 16 core machine. With 8 and under it wasn't a significant issue. It has worked well for us. We plan on using the fine grained concurrent containers next. Ideally we can make use of the real meat of the product, but that requires rethinking how we build our code. I really like the ideas in TBB, but it's not easy to retrofit onto a code base.
You can't think of TBB as another threading library. They have a whole new model that really sits on top of threads and abstracts the threads away. You learn to think in task, parallel_for type operations and pipelines. If I were to build a new project I would probably try to model it in this fashion.
We work in Visual Studio and it works just fine. It was originally written for linux/pthreads so it runs just fine over there also.