Is there any practical sorting algorithm for generic elements (unlike counting sort or bucket sort) that runs faster than O(n log n)?
No, not if the only thing you can do with the elements is compare them. This is one of the few rigorous lower bounds we have for algorithms. A collection of n distinct elements can be arranged in n! different orders, so identifying which order we were handed takes log2(n!) bits. By Stirling's approximation this is approximately n log2 n. Each comparison between two elements gives us at most one bit of information (ignoring the possibility of equal elements), so any comparison-based sort must make on the order of n log n comparisons in the worst case.
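To make the counting argument concrete, here is a small Python sketch that computes ceil(log2(n!)) exactly and compares it with the worst-case number of comparisons Python's built-in `sorted` makes over every permutation of n distinct items. The helper names `min_comparisons` and `worst_case_comparisons` are mine, purely for illustration, and the brute-force check is only feasible for small n.

```python
import math
from functools import cmp_to_key
from itertools import permutations


def min_comparisons(n):
    """Information-theoretic lower bound: ceil(log2(n!)).

    For an integer m >= 1, ceil(log2(m)) equals (m - 1).bit_length(),
    which avoids floating-point rounding.
    """
    return (math.factorial(n) - 1).bit_length()


def worst_case_comparisons(n):
    """Largest number of comparisons sorted() makes on any permutation
    of n distinct items (brute force, so keep n small)."""
    worst = 0
    for perm in permutations(range(n)):
        count = 0

        def cmp(a, b):
            # Count every comparison the sort performs.
            nonlocal count
            count += 1
            return (a > b) - (a < b)

        sorted(perm, key=cmp_to_key(cmp))
        worst = max(worst, count)
    return worst


if __name__ == "__main__":
    print("n  ceil(log2(n!))  worst case for sorted()  n*log2(n)")
    for n in range(1, 8):
        print(n, min_comparisons(n), worst_case_comparisons(n),
              round(n * math.log2(n), 2))
```

Note that the bound applies to the worst case only: on a particular input (an already sorted list, say) a sort can get away with far fewer comparisons.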