I'm not an expert, but it seems that many of the features people dislike about x86 may be the very reasons it performs well. Years ago, features such as registers (instead of a stack) and register frames were seen as nice ways to make an architecture seem simpler to humans. Nowadays, however, what matters most is cache performance, and because x86 uses variable-length instructions, more of them fit in the instruction cache. Instruction decode, which I believe critics once said took up half the chip, is no longer anywhere near that large a share.
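To make the cache-density argument concrete, here is a small back-of-the-envelope sketch. The byte encodings below are standard x86-64 encodings for a few very short instructions, hand-picked to illustrate the point; the 4-byte fixed width and 64-byte cache line are assumptions chosen to resemble a classic RISC design. Real compiled x86-64 code contains many longer instructions, so the actual advantage is far smaller than this cherry-picked case suggests.

```python
# Byte encodings of a few short, well-known x86-64 instructions (hand-picked).
X86_64_ENCODINGS = {
    "push rbp": bytes([0x55]),
    "mov rbp, rsp": bytes([0x48, 0x89, 0xE5]),
    "xor eax, eax": bytes([0x31, 0xC0]),
    "pop rbp": bytes([0x5D]),
    "ret": bytes([0xC3]),
}

CACHE_LINE_BYTES = 64  # assumed cache-line size
FIXED_WIDTH_BYTES = 4  # assumed width of every instruction in a fixed-width ISA


def instructions_per_line(avg_bytes: float, line_bytes: int = CACHE_LINE_BYTES) -> float:
    """How many instructions of a given average size fit in one cache line."""
    return line_bytes / avg_bytes


def main() -> None:
    total = sum(len(enc) for enc in X86_64_ENCODINGS.values())
    count = len(X86_64_ENCODINGS)
    avg = total / count
    print(f"variable-length: {total} bytes for {count} instructions (avg {avg:.1f})")
    print(f"fixed 4-byte:    {FIXED_WIDTH_BYTES * count} bytes for {count} instructions")
    print(f"per 64-byte line: {instructions_per_line(avg):.0f} vs "
          f"{instructions_per_line(FIXED_WIDTH_BYTES):.0f}")


if __name__ == "__main__":
    main()
```

For this particular sequence, 8 bytes hold what a 4-byte fixed-width encoding would need 20 bytes for, so roughly 40 versus 16 such instructions fit in one line.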
I think parallelism is one of the most important factors nowadays, at least for algorithms that already run fast enough to be usable. Expressing a high degree of parallelism in software lets the hardware amortize (or often completely hide) memory latency. Of course, the more far-reaching future of computer architecture probably lies in something like quantum computing.
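One way to see why parallelism hides memory latency is Little's law: the number of requests that must be in flight equals latency times throughput. The sketch below uses made-up numbers (100 ns latency, one load issued every 0.5 ns) purely for illustration, and its timing model is idealized: it ignores bandwidth limits and assumes independent loads overlap perfectly.

```python
import math


def requests_in_flight(latency_ns: float, issue_interval_ns: float) -> int:
    """Little's law: concurrency = latency * throughput.

    To keep issuing one load every `issue_interval_ns` without stalling,
    about latency / issue_interval independent loads must be outstanding.
    """
    return math.ceil(latency_ns / issue_interval_ns)


def total_time_ns(n_loads: int, latency_ns: float, parallelism: int) -> float:
    """Idealized time for n_loads when up to `parallelism` independent loads
    overlap. A dependent chain (e.g. pointer chasing) has parallelism 1."""
    waves = math.ceil(n_loads / parallelism)
    return waves * latency_ns


if __name__ == "__main__":
    print(requests_in_flight(100, 0.5))   # independent loads needed in flight
    print(total_time_ns(1000, 100, 1))    # fully serialized loads
    print(total_time_ns(1000, 100, 200))  # enough parallelism to overlap
```

Under these assumptions, a serialized chain of 1000 loads takes 100,000 ns, while exposing 200-way parallelism brings it down to 500 ns: the latency of each load is unchanged, but it is overlapped with the others.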
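I have heard from nVidia that one of Intel's mistakes was keeping its binary formats close to the hardware. CUDA's PTX is a virtual instruction set with an unlimited supply of virtual registers; it is compiled (ahead of time or by the driver at load time) into each GPU's native instructions, and the register allocation (graph coloring) is done quickly at that stage. That way nVidia can use a register machine instead of a stack machine, but still have an upgrade path that doesn't break old software.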
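Register allocation by graph coloring can be sketched roughly as follows. This is a textbook greedy heuristic, not NVIDIA's actual allocator, which is not public and is certainly more sophisticated. The interference graph is assumed to be given (a real compiler would derive it from liveness analysis), and the PTX-style names `%r1`..`%r4` describe a hypothetical program.

```python
from typing import Dict, Optional, Set


def color_registers(interference: Dict[str, Set[str]],
                    num_regs: int) -> Dict[str, Optional[int]]:
    """Greedy graph coloring: give each virtual register a physical register
    index so that no two interfering (simultaneously live) values share one.
    A value that gets None would have to be spilled to memory."""
    # Handle the most constrained (highest-degree) virtual registers first.
    order = sorted(interference, key=lambda v: len(interference[v]), reverse=True)
    assignment: Dict[str, Optional[int]] = {}
    for v in order:
        # Physical registers already used by neighbors that were not spilled.
        taken = {assignment[n] for n in interference[v]
                 if assignment.get(n) is not None}
        free = [r for r in range(num_regs) if r not in taken]
        assignment[v] = free[0] if free else None
    return assignment


# Hypothetical interference graph: %r1, %r2, %r3 are all live at once,
# and %r4 overlaps only with %r1. Edges are listed in both directions.
GRAPH = {
    "%r1": {"%r2", "%r3", "%r4"},
    "%r2": {"%r1", "%r3"},
    "%r3": {"%r1", "%r2"},
    "%r4": {"%r1"},
}

if __name__ == "__main__":
    print(color_registers(GRAPH, 3))  # every value gets a register
    print(color_registers(GRAPH, 2))  # one value must be spilled
```

Because the physical register count only enters at allocation time, the same virtual-register program can be re-allocated for a GPU with more or fewer registers, which is the kind of upgrade path described above.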