Given the algorithm here, look at the scenario where i is at "X". The following happens:
You are right: the extra swap operations are not necessary. The algorithm here is written for clarity, not for performance. See the discussion of Quick Sort (3 Way Partition).
In "Quicksort is Optimal", Robert Sedgewick himself presents a different approach that uses far fewer swap operations. As you can imagine, it also needs more code and is less clear than the algorithm in the demo.
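To give a feel for the trade-off, here is a rough Python sketch of a Bentley-McIlroy style three-way partition, which as far as I know is the scheme that talk describes. The function name `quicksort_3way_bm` and the variable names are my own choices, not taken from the demo or the slides. Keys equal to the pivot are parked at the two ends of the range during the scan and moved to the middle only once at the end, so the other elements are not swapped repeatedly past a growing block of equal keys.

```python
def quicksort_3way_bm(a, lo=0, hi=None):
    """Sort list a in place (hypothetical helper, Bentley-McIlroy style sketch)."""
    if hi is None:
        hi = len(a) - 1
    if hi <= lo:
        return
    v = a[lo]              # pivot value
    i, j = lo, hi + 1      # scanning pointers
    p, q = lo, hi + 1      # a[lo..p] and a[q..hi] hold keys equal to v
    while True:
        # Scan from the left for an element that is not smaller than the pivot.
        i += 1
        while a[i] < v:
            if i == hi:
                break
            i += 1
        # Scan from the right for an element that is not larger than the pivot.
        j -= 1
        while v < a[j]:
            if j == lo:
                break
            j -= 1
        # Pointers met on a key equal to the pivot: park it on the left.
        if i == j and a[i] == v:
            p += 1
            a[p], a[i] = a[i], a[p]
        if i >= j:
            break
        a[i], a[j] = a[j], a[i]
        # Park keys equal to the pivot at the ends of the range.
        if a[i] == v:
            p += 1
            a[p], a[i] = a[i], a[p]
        if a[j] == v:
            q -= 1
            a[q], a[j] = a[j], a[q]
    # Move the parked equal keys from both ends into the middle.
    i = j + 1
    for k in range(lo, p + 1):
        a[k], a[j] = a[j], a[k]
        j -= 1
    for k in range(hi, q - 1, -1):
        a[k], a[i] = a[i], a[k]
        i += 1
    # Recurse only on the strictly smaller and strictly larger parts.
    quicksort_3way_bm(a, lo, j)
    quicksort_3way_bm(a, i, hi)
```

Calling `quicksort_3way_bm(items)` sorts the list `items` in place. Compared with the simpler partition in the demo, there are two extra pointers and a cleanup pass to keep track of, which is the extra code and reduced clarity mentioned above.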