I am looking for advice regarding high performance multi-dimensional array libraries/classes for C++. What I really need is:
the ability to dynamically allocate
With the caveat that this is shameless self-promotion,
https://github.com/ndarray/ndarray
may be worth looking into.
While it doesn't provide optimized mathematical operators, it does provide an interface to Eigen for that. Where it really stands out is in providing interoperability with Python/NumPy through SWIG or Boost.Python.
Maybe a library such as BLAS would work; a C interface, CBLAS, exists, but I don't remember where to find it.
http://www.netlib.org/blas/
Necomi seems to provide the features you would like.
It includes support for arbitrary multi-dimensional arrays whose dimensions can be fixed at runtime, provides fast access to single elements, and also supports arithmetic (among other) expressions.
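To make those requirements concrete, here is a minimal standard-library sketch of what "dimensions fixed at runtime" and "fast access to single elements" usually boil down to: one contiguous buffer plus row-major strides. The `NdArray` class and its members are hypothetical names made up for illustration. This is not the API of Necomi, ndarray, Eigen, or any other library mentioned in this thread.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <initializer_list>
#include <numeric>
#include <utility>
#include <vector>

// Hypothetical illustration class; not taken from any library named above.
template <typename T>
class NdArray {
public:
    // The shape is chosen at runtime, e.g. NdArray<double>({2, 3, 4}, 0.0).
    NdArray(std::vector<std::size_t> shape, T init)
        : shape_(std::move(shape)),
          strides_(shape_.size()),
          data_(element_count(shape_), init) {
        // Row-major layout: the last dimension is contiguous in memory.
        std::size_t stride = 1;
        for (std::size_t d = shape_.size(); d-- > 0;) {
            strides_[d] = stride;
            stride *= shape_[d];
        }
    }

    std::size_t size() const { return data_.size(); }

    // Flat offset of a multi-index: sum of index[d] * stride[d].
    std::size_t offset(std::initializer_list<std::size_t> index) const {
        assert(index.size() == shape_.size());
        std::size_t off = 0;
        std::size_t d = 0;
        for (std::size_t i : index) {
            assert(i < shape_[d]);
            off += i * strides_[d];
            ++d;
        }
        return off;
    }

    // Single-element access, written as a({i, j, k}).
    T& operator()(std::initializer_list<std::size_t> index) {
        return data_[offset(index)];
    }
    const T& operator()(std::initializer_list<std::size_t> index) const {
        return data_[offset(index)];
    }

    // Element-wise addition; both arrays must have the same shape.
    NdArray& operator+=(const NdArray& other) {
        assert(shape_ == other.shape_);
        for (std::size_t i = 0; i < data_.size(); ++i) {
            data_[i] += other.data_[i];
        }
        return *this;
    }

private:
    // Total number of elements is the product of all extents.
    static std::size_t element_count(const std::vector<std::size_t>& shape) {
        return std::accumulate(shape.begin(), shape.end(), std::size_t{1},
                               std::multiplies<std::size_t>());
    }

    std::vector<std::size_t> shape_;
    std::vector<std::size_t> strides_;
    std::vector<T> data_;
};
```

With this sketch, `NdArray<double> a({2, 3, 4}, 1.0);` allocates 24 elements, `a({1, 2, 3}) = 5.0;` writes a single element, and `a += b;` adds another array of the same shape element by element. A real library will do much more (views, slicing, and in Eigen's case expression templates to avoid temporaries), so treat this only as a picture of the underlying layout.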
There is a broad and relatively recent survey, including benchmarks, here.
I believe that you can speed up Boost.UBlas by binding it to underlying numerical libraries such as LAPACK or Intel MKL, but I have not done that myself.
fwiw, the implementations that seem to come up most often as candidates are Boost.UBlas and MTL. It's my experience that wide adoption is more likely to foster ongoing support and development.
Eigen is extremely well-maintained (right now, at least, there are new versions coming out every month) and supports the other operations you need.