I am implementing several data structures, and one primitive I want to use is the following: I have a memory chunk A[N] (its length is variable, but I use 100 in my examples)
OK, if it's like memmove but with a circular buffer, here's the way to do it:
Case 1: source/dest do not overlap. Just use memcpy, possibly breaking it up as needed where the buffer wraps.
Case 2: source/dest are equal. Do nothing.
Case 3: start of source lies strictly inside the dest region. Do a simple forward copy loop, for (i=0; i<K; i++) A[(dest+i)%N] = A[(src+i)%N];
Case 4: start of dest lies strictly inside the source region. Do a simple backward copy loop, for (i=K; i; i--) A[(dest+i-1)%N] = A[(src+i-1)%N];
Edit: This answer only works when K is at most N/2; otherwise it's possible that source and dest both start inside each other. I don't have an immediate fix, but it may be possible to choose a starting offset and direction that fix the issue...
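For what it's worth, the four cases above might be sketched in C roughly as follows. This is only a sketch under stated assumptions: the name circ_move is made up for illustration, src and dest are assumed to be already reduced to the range 0..N-1, and the K <= N/2 restriction from the edit above is assumed to hold. The non-overlapping case uses a plain loop here instead of split memcpy calls, purely to keep the example short.

```c
#include <stddef.h>

/* Hypothetical helper: move K bytes inside the circular buffer A[N]
   from offset src to offset dest. Assumes src < N, dest < N and
   K <= N/2, so the two regions cannot each start inside the other. */
void circ_move(char *A, size_t N, size_t src, size_t dest, size_t K)
{
    size_t i, off;

    if (K == 0 || src == dest)
        return;                      /* Case 2: nothing to do */

    off = (src + N - dest) % N;      /* distance from dest start to src start */
    if (off < K) {
        /* Case 3: source starts inside dest, so copy forward */
        for (i = 0; i < K; i++)
            A[(dest + i) % N] = A[(src + i) % N];
    } else {
        /* Case 4 (and Case 1, where direction does not matter): copy backward */
        for (i = K; i > 0; i--)
            A[(dest + i - 1) % N] = A[(src + i - 1) % N];
    }
}
```

Like memmove, this leaves the destination holding what the source held before the call; whatever was in the destination is overwritten.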
Here's an O(n^2) algorithm that is pretty straightforward: just rotate the entire buffer by a single byte, and repeat that as many times as the number of steps you want:
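#include <string.h>

/* Rotate the whole buffer by 'steps' bytes, one byte at a time. */
void rotateBuffer(char *buffer, int size, int steps)
{
    char tmp;
    int i;
    for (i = 0; i < steps; i++)
    {
        tmp = buffer[size - 1];                 /* save the last byte */
        memmove(buffer + 1, buffer, size - 1);  /* shift everything up by one */
        buffer[0] = tmp;                        /* wrap the saved byte to the front */
    }
}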
It won't be fast, but it gets the job done, and with only constant temporary storage.
Edit:
If you need to rotate just a sub-part of the buffer relative to a static underlying 'background', as discussed below in the comments, you can do something like this:
/* buf is the circular buffer and SIZE its length; both are assumed to be globals */
void rotateBuffer(int count, int start, int length)
{
    int i;
    int j;
    int index;
    // rotate 'count' bytes
    for (i = 0; i < count; i++)
    {
        // rotate by a single byte
        for (j = length - 1; j >= 0; j--)
        {
            index = start + i + j;
            buf[(index + 1) % SIZE] = buf[index % SIZE];
        }
    }
}
I think it might have a problem if you need to rotate the entire buffer, but in that case you could just fall back to the code above.
**This only works if the length of C is <= half the length of A. But I'm leaving it up here in hopes of fixing it.**
**This solution will not preserve any of the contents of the target range, a behavior which I believe matches the wording of the original question.**
;; A function that wraps an out-of-bounds index to its proper location.
mod'(i):
    return (i + length(A)) mod length(A)

;; shifts the range A[i]..A[i + n] to A[i - delta]..A[i - delta + n]
move_backward (i, delta, n):
    A[mod'(i - delta)] = A[mod'(i)]
    if (n > 0):
        move_backward (i + 1, delta, n - 1)

;; shifts the range A[i - n]..A[i] to A[i - n + delta]..A[i + delta]
move_forward (i, delta, n):
    A[mod'(i + delta)] = A[mod'(i)]
    if (n > 0):
        move_forward (i - 1, delta, n - 1)

shift_range (source_first, source_last, target_first):
    n = mod'(source_last - source_first)
    delta = mod'(target_first - source_first)
    if (delta > length(A) / 2):
        move_backward (source_first, length(A) - delta, n)
    else
        move_forward (source_last, delta, n)
This solution is O(N) and uses already processed source locations as scratch space to use when ranges overlap. It will swap contents of source and destination up to the point when it reaches the start of destination, then it will proceed copying from the scratch space generated before. The second loop restores the clobbered region after each character of the scratch space is used.
move(A, N, src_idx, dst_idx, len)
{
    first_dst_idx = dst_idx;
    first_src_idx = src_idx;
    mlen = 0;
    while (src_idx != first_dst_idx && len > 0)
    { // swap source and destination, building the scratch space
        temp = A[dst_idx];
        A[dst_idx] = A[src_idx];
        A[src_idx] = temp;
        src_idx = (src_idx + 1) mod N;
        dst_idx = (dst_idx + 1) mod N;
        len--; mlen++;
    }
    src_idx = first_src_idx;
    while (len > 0)
    { // copy from the scratch space, restoring each slot after use
        A[dst_idx] = A[src_idx];
        A[src_idx] = A[first_dst_idx];
        src_idx = (src_idx + 1) mod N;
        dst_idx = (dst_idx + 1) mod N;
        first_dst_idx = (first_dst_idx + 1) mod N;
        len--;
    }
    while (mlen > 0)
    { // restore remaining scratch space
        A[src_idx] = A[first_dst_idx];
        src_idx = (src_idx + 1) mod N;
        first_dst_idx = (first_dst_idx + 1) mod N;
        mlen--;
    }
}
Detailed explanation of this case is given in the first answer by R.. I've nothing to add here.
The easiest approach is to always rotate the whole array. This also moves some unneeded elements out of the destination range, but since K > N/2 in this case, it does not make the number of operations more than twice what is necessary.
To rotate the array, use the cycle leader algorithm: take the first element of the array (A[0]) and copy it to its destination position; the previous contents of that position move in turn to their own proper position; continue until some element is moved to the starting position.
Continue applying the cycle leader algorithm for the next starting positions: A[1], A[2], ..., A[GCD(N,d) - 1], where d is the distance between source and destination.
After GCD(N,d) steps, all elements are in their proper positions. This works because:
- Positions 0, 1, ..., GCD(N,d) - 1 belong to different cycles, because these numbers are all different modulo GCD(N,d).
- Each cycle has length N / GCD(N,d), because d / GCD(N,d) and N / GCD(N,d) are relatively prime.
This algorithm is simple and it moves each element exactly once. It may be made thread-safe (if we skip the write step unless inside the destination range). Another multi-threading-related advantage is that each element can only ever have two values: the value before the "move" and the value after the "move" (no temporary in-between values are possible).
But it does not always have optimal performance. If element_size * GCD(N,d) is comparable to the cache line size, we might take all GCD(N,d) starting positions and process them together. If this value is too large, we can split the starting positions into several contiguous segments to lower the space requirements back to O(1).
The problem is when element_size * GCD(N,d) is much smaller than the cache line size. In this case we get a lot of cache misses and performance degrades. gusbro's idea to temporarily swap array pieces with some "swap" region (of size d) suggests a more efficient algorithm for this case. It may be optimized further if we use a "swap" region that fits in the cache, and copy non-overlapped areas with memcpy.
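The whole-array rotation by cycle leaders described above could look something like this in C. Treat it as a sketch: the names rotate_cycle_leader and gcd_size are invented for the example, elements are plain bytes, and "rotate by d" is taken to mean that the element at index i ends up at index (i + d) % N. None of the cache-related tuning discussed above is attempted.

```c
#include <stddef.h>

/* Hypothetical helper: greatest common divisor by Euclid's algorithm. */
static size_t gcd_size(size_t a, size_t b)
{
    while (b != 0) {
        size_t t = a % b;
        a = b;
        b = t;
    }
    return a;
}

/* Rotate A[0..N-1] so that the element at i moves to (i + d) % N.
   Each element is written exactly once. */
void rotate_cycle_leader(char *A, size_t N, size_t d)
{
    size_t g, start;

    if (N == 0)
        return;
    d %= N;
    if (d == 0)
        return;

    g = gcd_size(N, d);              /* number of independent cycles */
    for (start = 0; start < g; start++) {
        char carried = A[start];     /* element waiting for its new slot */
        size_t pos = start;
        do {
            char displaced;
            pos = (pos + d) % N;     /* where the carried element belongs */
            displaced = A[pos];
            A[pos] = carried;
            carried = displaced;     /* the displaced element moves next */
        } while (pos != start);
    }
}
```

For example, rotating the ten bytes "0123456789" by d = 3 follows a single cycle, while d = 4 needs GCD(10,4) = 2 cycles starting at A[0] and A[1].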
One more algorithm. It does not overwrite elements that are not in the destination range. And it is cache-friendly. The only disadvantage is: it moves each element exactly twice.
The idea is to move two pointers in opposite directions and swap pointed elements. There is no problem with overlapping regions because overlapping regions are just reversed. After first pass of this algorithm, we have all source elements moved to destination range, but in reversed order. So second pass should reverse destination range:
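/* K is the number of elements moved; all indices are in 0..N-1 */
/* pass 1: swap the source into the destination in reversed order */
for (i = 0, d = dst_start, s = (src_end + N - 1) % N;
     i < K;
     i++, d = (d + 1) % N, s = (s + N - 1) % N)
{
  j = (s + N - dst_start) % N;   /* offset of s within the destination, if < K */
  if (j >= K || j >= i)          /* skip a pair already swapped inside the overlap */
    swap(s, d);
}
/* pass 2: reverse the destination range in place, stopping at its midpoint */
for (i = 0, d = dst_start, s = (dst_end + N - 1) % N;
     i < K / 2;
     i++, d = (d + 1) % N, s = (s + N - 1) % N)
  swap(s, d);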
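A self-contained C version of this two-pass idea might look like the sketch below. The names circ_move_swap and swap_at are invented for the example, elements are bytes, and src and dst are assumed to be in the range 0..N-1 with K <= N. Two details are choices of this sketch: in the first pass a pair that lies wholly inside the overlap is swapped only once (swapping it a second time would undo the reversal), and the second pass stops at the midpoint of the destination range.

```c
#include <stddef.h>

/* Hypothetical helper: exchange two bytes of the buffer. */
static void swap_at(char *A, size_t x, size_t y)
{
    char t = A[x];
    A[x] = A[y];
    A[y] = t;
}

/* Move K bytes of circular buffer A[N] from src to dst using swaps only.
   Assumes src < N, dst < N, K <= N. Elements outside the destination
   range are displaced but not lost. */
void circ_move_swap(char *A, size_t N, size_t src, size_t dst, size_t K)
{
    size_t i, d, s;

    if (K == 0)
        return;

    /* Pass 1: destination receives the source in reversed order. */
    d = dst;
    s = (src + K - 1) % N;
    for (i = 0; i < K; i++) {
        size_t j = (s + N - dst) % N;   /* offset of s inside dst, if < K */
        if (j >= K || j >= i)           /* skip a pair already swapped */
            swap_at(A, s, d);
        d = (d + 1) % N;
        s = (s + N - 1) % N;
    }

    /* Pass 2: reverse the destination range in place. */
    d = dst;
    s = (dst + K - 1) % N;
    for (i = 0; i < K / 2; i++) {
        swap_at(A, s, d);
        d = (d + 1) % N;
        s = (s + N - 1) % N;
    }
}
```

With N = 10 and the buffer "0123456789", moving K = 4 bytes from offset 0 to offset 2 leaves "0123" at offsets 2..5, and the four bytes that were displaced end up, in some order, in the positions outside the destination.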
This is not a complete answer yet, but I think it may be the right idea.
Start with an element of the source range and consider the destination position it will be mapped to. That position is either inside the source range, or outside it. If it's outside the source range, you can just copy, and you're done with that element. On the other hand, if it maps onto a destination position inside the source range, you can copy it, but you have to save the old value you're overwriting and perform the above process iteratively with this new element of the source.
Essentially, you're operating on the cycles of a permutation.
The problem is keeping track of what you've finished and what remains to be done. It's not immediately apparent if there's a way to do this without O(n) working space.