I am looking for advice regarding high performance multi-dimensional array libraries/classes for C++. What I really need is:
the ability to dynamically allocate
There is a broad and relatively recent survey, including benchmarks, here.
I believe you can speed up Boost.uBLAS by binding it to an underlying numerical library such as LAPACK or Intel MKL, but I have not tried that myself.
FWIW, the implementations that seem to come up most often as candidates are Boost.uBLAS and MTL. In my experience, widely adopted libraries are more likely to see ongoing support and development.
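As a baseline for comparing against those libraries, here is a minimal sketch of a dynamically allocated 2-D array using only the standard library. The `Array2D` class and its member names are hypothetical and not taken from Boost.uBLAS or MTL. It assumes row-major storage in one contiguous `std::vector<double>`, which is the layout most BLAS/LAPACK-style back ends can work with given the right flags.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of any library): a 2-D array whose
// dimensions are chosen at run time, stored contiguously in row-major order.
class Array2D {
public:
    Array2D(std::size_t rows, std::size_t cols)
        : rows_(rows), cols_(cols), data_(rows * cols, 0.0) {}

    // Element access; bounds are checked only in debug builds via assert.
    double& operator()(std::size_t r, std::size_t c) {
        assert(r < rows_ && c < cols_);
        return data_[r * cols_ + c];
    }
    double operator()(std::size_t r, std::size_t c) const {
        assert(r < rows_ && c < cols_);
        return data_[r * cols_ + c];
    }

    std::size_t rows() const { return rows_; }
    std::size_t cols() const { return cols_; }

    // Raw pointer to the contiguous buffer, e.g. for passing to C-style APIs.
    const double* data() const { return data_.data(); }

private:
    std::size_t rows_;
    std::size_t cols_;
    std::vector<double> data_;
};
```

Something like `Array2D a(3, 4); a(1, 2) = 5.0;` allocates at run time and indexes with `a(i, j)`. A real library adds expression templates, views, and tuned kernels on top of this, so treat the sketch only as a reference point for what you would otherwise have to write yourself.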