Is there any practical algorithm for generic elements (unlike counting sort or bucket sort) that runs faster than O(n log n)?
Many people have mentioned the information-theoretic Ω(n lg n) bound on comparison sorting algorithms, which can't be broken in comparison sorts. (This earlier question explores why that's the case.)
However, there are some types of comparison sorts that, while not breaking O(n lg n) in the average case, can be shown to run faster on inputs that are already presorted to some extent. For example, Dijkstra's smoothsort runs in O(n) on already-sorted inputs with O(n lg n) worst-case behavior. One of my favorite sorts, Cartesian tree sort, provably takes optimal advantage of presortedness in a few metrics. For example, it can sort any sequence with a constant number of increasing or decreasing subsequences in time O(n), degrading gracefully to O(n lg n) in the worst case.
On the subject of non-comparison sorts, there are some famous but tricky sorting algorithms for integers that surpass O(n lg n) bynp doing clever bit-manipulation tricks. The best known integer sorting algorithm is a randomized algorithm that can sort in O(n √lg lg n), while the fastest deterministic algorithm for integer sorting runs in O(n lg lg n) time. You may have heard that radix sort works in O(n), though technically it's O(n lg U), where U is the largest value in the array to sort.
In short, no, you can't do much better than O(n lg n), but you can do marginally better if you know something about your input.