Is there any practical sorting algorithm for generic elements (unlike counting sort or bucket sort) that runs faster than O(n log n)?
For how many elements? Even though its complexity is something like O(N^1.2), a Shell-Metzner sort is often faster than most others up to a few thousand elements (or so).
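For reference, here is a minimal Shell sort sketch in Python. The gap sequence is my assumption (Knuth's 1, 4, 13, 40, ...), since no particular sequence is specified above, and the name `shell_sort` is just for illustration. The exact exponent you observe in practice depends on the gaps chosen.

```python
def shell_sort(items):
    """Return a new sorted list using Shell sort.

    Assumption: Knuth's gap sequence (1, 4, 13, 40, ...); other
    sequences give different running times.
    """
    a = list(items)  # work on a copy so the input is left untouched
    n = len(a)

    # Find the largest Knuth gap to start from.
    gap = 1
    while gap < n // 3:
        gap = 3 * gap + 1

    while gap >= 1:
        # Gapped insertion sort: each pass leaves the list "gap-sorted".
        for i in range(gap, n):
            current = a[i]
            j = i
            while j >= gap and a[j - gap] > current:
                a[j] = a[j - gap]
                j -= gap
            a[j] = current
        gap //= 3  # step down to the next smaller Knuth gap

    return a
```

It only needs `>` on the elements, so it is as generic as any other comparison sort.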
It also depends on what you mean by "generic" and "practical". A radix sort can beat O(n log n), and it works for a fairly wide variety of data (but definitely not everything).
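To make the radix sort point concrete, here is a rough sketch of a least-significant-digit radix sort. It assumes the keys are non-negative integers that fit in `key_bytes` bytes, which is exactly the kind of restriction that keeps it from being fully generic. The function name and the `key_bytes` parameter are mine, for illustration only.

```python
def radix_sort(values, key_bytes=4):
    """LSD radix sort for non-negative integers below 256**key_bytes.

    Runs in O(key_bytes * n) time and never compares two elements.
    """
    a = list(values)
    limit = 256 ** key_bytes
    for v in a:
        if not 0 <= v < limit:
            raise ValueError("key out of range for this sketch: %r" % (v,))

    # One stable bucketing pass per byte, least significant byte first.
    for shift in range(0, 8 * key_bytes, 8):
        buckets = [[] for _ in range(256)]
        for v in a:
            buckets[(v >> shift) & 0xFF].append(v)
        # Concatenating in bucket order preserves the earlier passes' order.
        a = [v for bucket in buckets for v in bucket]

    return a
```

For fixed-width keys the number of passes is a constant, so the total work grows linearly in n. For other data (strings, floats, records) you would need some way to map each element to such a key first.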
If your idea of practical and generic limits the algorithm to one that directly compares elements, then no: nothing does (or ever can do) better than O(n log n) comparisons in the worst case. That lower bound has been proven for quite some time.
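The usual argument for that bound, sketched roughly: a comparison sort can be viewed as a binary decision tree in which each comparison is a branch, and the tree needs at least one leaf for each of the n! possible input orderings. Its height h (the worst-case number of comparisons) therefore satisfies:

```latex
2^{h} \ge n!
\quad\Longrightarrow\quad
h \ge \log_2 (n!)
  \ge \log_2\!\left( \left(\tfrac{n}{2}\right)^{n/2} \right)
  = \tfrac{n}{2} \log_2 \tfrac{n}{2}
  = \Omega(n \log n)
```

The middle step uses the fact that the largest n/2 factors of n! are each at least n/2.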