Absolutely, SIMD is still relevant.
First, SIMD interoperates more easily with scalar code, because it reads and writes the same memory the scalar code does, while GPUs require the data to be copied to GPU memory before it can be accessed. For example, it's straightforward to vectorize a function like memcmp() via SIMD, but it would be absurd to implement memcmp() by uploading the data to the GPU and running it there. The latency of the transfer alone would be crushing.
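To make that concrete, here is a minimal sketch of the idea, assuming x86-64 with SSE2 (the baseline for that architecture). It checks two buffers for equality rather than reproducing full memcmp() ordering semantics, which keeps the example short; the names are my own, not from any particular library.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stddef.h>
#include <string.h>

/* Equality check over two buffers, vectorized 16 bytes at a time. */
int buffers_equal(const void *a, const void *b, size_t n)
{
    const unsigned char *pa = a, *pb = b;

    /* Compare 16 bytes per iteration directly from ordinary memory:
     * no staging buffers, no device allocation, no transfers. */
    while (n >= 16) {
        __m128i va = _mm_loadu_si128((const __m128i *)pa);
        __m128i vb = _mm_loadu_si128((const __m128i *)pb);
        __m128i eq = _mm_cmpeq_epi8(va, vb);      /* 0xFF in lanes that match */
        if (_mm_movemask_epi8(eq) != 0xFFFF)      /* any lane mismatched? */
            return 0;
        pa += 16; pb += 16; n -= 16;
    }

    /* Drop back to scalar code for the tail; mixing the two is trivial. */
    return memcmp(pa, pb, n) == 0;
}
```

Note how the vector loop and the scalar tail operate on the exact same pointers, which is the interoperability point above.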
Second, both SIMD and GPUs are bad at highly branchy code, but SIMD is somewhat less bad. The reason is that GPUs group multiple threads (a "warp") under a single instruction dispatcher. What happens when threads need to take different paths, one thread taking the if branch and another taking the else branch? This is called branch divergence, and it is slow: all the "if" threads execute while the "else" threads wait, then the "else" threads execute while the "if" threads wait. SIMD code pays a similar cost within a single vector register (it computes both sides and masks the result per lane), but the vector is far narrower than a 32-thread warp, and the code can drop back to scalar instructions, with a real branch predictor, at any point. CPU cores running scalar code, of course, do not have this limitation at all.
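Here is a minimal sketch of what that per-lane masking looks like, again assuming x86-64 with SSE2. Both sides of the "branch" are computed for all four lanes and a mask selects the result per lane, which is the same execute-both-paths cost a divergent warp pays, just over 4 lanes instead of 32 threads. The function is hypothetical, for illustration only.

```c
#include <emmintrin.h>

/* For each 32-bit lane: out = (x > threshold) ? x * 2 : x + 1 */
__m128i branchy_step(__m128i x, __m128i threshold)
{
    __m128i mask      = _mm_cmpgt_epi32(x, threshold);        /* all-ones where x > t */
    __m128i if_side   = _mm_slli_epi32(x, 1);                  /* x * 2 */
    __m128i else_side = _mm_add_epi32(x, _mm_set1_epi32(1));   /* x + 1 */

    /* Per-lane select: (mask & if_side) | (~mask & else_side) */
    return _mm_or_si128(_mm_and_si128(mask, if_side),
                        _mm_andnot_si128(mask, else_side));
}
```

If the branchy part grows too hairy to express this way, a CPU program can simply handle those elements with ordinary scalar branches; a GPU kernel has no such escape hatch.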
The upshot is that SIMD is better for what might be called "intermediate" workloads: moderate in size, with some data parallelism, some unpredictability in access patterns, and some branchiness. GPUs are better for very large workloads with predictable execution flow and access patterns.
(There are also some peripheral reasons, such as better support for double-precision floating point on CPUs.)