Given the algorithm here, look at the scenario where i is at "X", the following happens:
You are right, the extra swap operations are not necessary. The algorithm here is written for clarity, not for performance. See the discussion of Quick Sort (3 Way Partition).
In “Quicksort is optimal”, Robert Sedgewick himself presents a different approach that uses far fewer swap operations, but as you can imagine it also needs more code and is less clear than the algorithm in the demo.
The algorithm is based on Dijkstra's solution to “The Problem of the Dutch National Flag”, which appeared (p.111) in his book “A Discipline of Programming”, published in 1976. Dijkstra’s goal with the book was to derive provably correct solutions to multiple problems: not just presenting the final result, but actually going through the design process.
In “The Problem of the Dutch National Flag” Dijkstra envisions a row of buckets (think array), each containing a single pebble colored red, white or blue (the colors of the Dutch flag). There is a mini-computer with just two operations: it can swap the contents of two buckets, and it can inspect the color of the pebble in a bucket. He restricts the inspection operation to a single inspection of each bucket. One inspection per bucket is sufficient, so that is what he chooses, in his usual elegant, minimalistic style. In terms of proving correctness, having as few cases as possible to consider is definitely an advantage. Here is a quote from one of his writings: “… become extremely suspicious as soon as one finds oneself faced with a case analysis which has to distinguish between a great number of cases”.
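To make the "single inspection of each bucket" constraint concrete, here is a minimal sketch in Python. It is not Dijkstra's program from the book, and the names (dutch_flag_partition, lo, i, hi) and the inspection/swap counters are my own, added only for illustration. It assumes the pebbles are the strings "red", "white" and "blue".

```python
def dutch_flag_partition(buckets):
    """Rearrange 'red'/'white'/'blue' pebbles in place into flag order.

    Returns (inspections, swaps). Both counters are hypothetical
    instrumentation added for this sketch.
    Regions while running:
      buckets[:lo]     red
      buckets[lo:i]    white
      buckets[i:hi+1]  not yet inspected
      buckets[hi+1:]   blue
    """
    lo, i, hi = 0, 0, len(buckets) - 1
    inspections = swaps = 0
    while i <= hi:
        color = buckets[i]          # the only place a pebble is inspected
        inspections += 1
        if color == "red":
            # buckets[lo] is an already inspected white pebble (or i == lo),
            # so i can advance without looking at it again.
            buckets[lo], buckets[i] = buckets[i], buckets[lo]
            swaps += 1
            lo += 1
            i += 1
        elif color == "blue":
            # The pebble arriving from hi has not been inspected yet,
            # so i stays where it is.
            buckets[i], buckets[hi] = buckets[hi], buckets[i]
            swaps += 1
            hi -= 1
        else:                       # white: already in the right region
            i += 1
    return inspections, swaps


pebbles = ["white", "blue", "red", "white", "red", "blue", "red"]
inspections, swaps = dutch_flag_partition(pebbles)
print(pebbles, inspections, swaps)
```

Every loop iteration either advances i or lowers hi, so the number of inspections equals the number of buckets. The price is that some swaps are redundant (for example when i == lo or i == hi), which is exactly the kind of extra swap the question is about.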
Transformed to the sorting problem, the equivalent would be to have at most N comparisons. This is actually quite common. In the C++ standard, the complexity of the sorting algorithms and the heap operations is stated in terms of the number of comparisons, not the number of swaps. The thinking is that swapping is cheap, and if it is not, you can use indirection (pointers) to make it cheap. Comparisons, however, can be expensive, so it makes more sense to state the complexity in terms of the number of comparisons.
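The indirection idea can be sketched as follows, again in Python rather than C++ and only as an illustration. The function name sort_indices_counting and the comparison counter are hypothetical. Instead of moving the records themselves, the sketch sorts a list of indices, so each "swap" moves a small integer while the comparison still looks at the full records.

```python
from functools import cmp_to_key


def sort_indices_counting(records):
    """Return (order, comparisons) without moving the records themselves.

    order is a list of indices such that records[order[0]],
    records[order[1]], ... is sorted. comparisons counts how many
    times two records were compared (instrumentation for this sketch).
    """
    comparisons = 0

    def compare(a, b):
        nonlocal comparisons
        comparisons += 1
        # Three-way comparison of the records the indices point to.
        return (records[a] > records[b]) - (records[a] < records[b])

    # Only the small integer indices are rearranged by the sort.
    order = sorted(range(len(records)), key=cmp_to_key(compare))
    return order, comparisons


records = ["pear", "apple", "fig"]
order, comparisons = sort_indices_counting(records)
print([records[k] for k in order], comparisons)
```

The records list is left untouched. The cost that remains, and the one worth counting, is the number of calls to compare.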
Yes, you could reduce the number of swaps, but not if you want to have at most N comparisons.