I am implementing several datastructures and one primitive I want to use is the following: I have a memory chunk A[N] (it has a variable length, but I take 100 for my examples)
This is not a complete answer yet, but I think it may be the right idea.
Start with an element of the source range and consider the destination position it will be mapped to. That position is either inside the source range, or outside it. If it's outside the source range, you can just copy, and you're done with that element. On the other hand, if it maps onto a destination position inside the source range, you can copy it, but you have to save the old value you're overwriting and perform the above process iteratively with this new element of the source.
Essentially, you're operating on the cycles of a permutation.
The problem is keeping track of what you've finished and what remains to be done. It's not immediately apparent if there's a way to do this without O(n) working space.