Goal
How to encode the data that describes how to re-order a static list from one order to another using the minimum amount of data possible?
A quick fix might be to use a Zobrist hash to spot cases where you go back to a prior order. That is, after each swap, calculate a hash based on the permutation you reach. Each hash maps to the shortest sequence of swaps found so far for that particular permutation.
This can easily be extended with a bit of exploratory searching - the Zobrist hash was invented as a way to optimise game tree searches.
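To make that concrete, here is a minimal sketch (my own, not from this answer) of a Zobrist-style hash over an ordering of N items identified by index, with the usual incremental update after a swap; the table size, seed, and names are arbitrary choices.

import random

random.seed(12345)                      # fixed seed so both ends build the same table (arbitrary)
N = 9                                   # number of slots/items (assumption for the sketch)
ZTABLE = [[random.getrandbits(64) for _ in range(N)] for _ in range(N)]

def zobrist(perm):
    # perm[slot] = index of the item currently sitting in that slot
    h = 0
    for slot, item in enumerate(perm):
        h ^= ZTABLE[slot][item]
    return h

def zobrist_after_swap(h, perm, i, j):
    # incremental update; perm must already reflect the swap of slots i and j
    h ^= ZTABLE[i][perm[j]] ^ ZTABLE[j][perm[i]]   # XOR out the old placements
    h ^= ZTABLE[i][perm[i]] ^ ZTABLE[j][perm[j]]   # XOR in the new placements
    return h

Two different swap sequences that reach the same ordering produce the same hash, so a dictionary keyed on the hash can keep the shortest swap sequence seen so far for each ordering (with the usual small risk of 64-bit collisions).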
It's easy to give bounds on the number of swaps, of course - at most the number of items that are not in their required locations, and at least half that number, since a single swap puts at most two items into place. Pinning down the exact minimum, though, is a more difficult problem.
Another possible solution, ignoring your data structure...
Send a set of IDs/indexes for items that have changed (if it's a completely random sparse subset, just list them) and a permutation number describing the re-ordering of that subset. The permutation number will need a big integer representation - size should be proportional to log(n!) where n is the number of items changed.
The permutation number is defined from a permutation array, of course, but this detail can be avoided when decoding. The trick is to encode the permutation number so that, once you have swapped the correct first item into the first slot, you can also derive a new permutation number which is correct for the tail of the array.
That is...
while indexes:                                   # until every changed slot has been dealt with
    item_to_swap = permutation_no % len(indexes) # next mixed-radix digit
    permutation_no //= len(indexes)
    if item_to_swap != 0:
        slot[indexes[0]], slot[indexes[item_to_swap]] = slot[indexes[item_to_swap]], slot[indexes[0]]
    indexes = indexes[1:]                        # the first remaining slot now holds its final item
The != 0 check is needed even though all the items needed changing at the start - an item might already have been swapped upwards into its correct location earlier in the loop.
This doesn't attempt to optimise the number of swaps - an item may be swapped upwards several times before being swapped downwards into its correct location. That said, the permutation number is probably close to an optimal-space representation for a random permutation of an array. Given that your permutation only affects a small subset of the full array, using the smaller permutation number for that subset makes a lot of sense.
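For reference, here is one way the matching encoder could look - a sketch of my own, not from the answer, assuming the values in the changed slots are distinct; slot is the full current list, target the desired list, and indexes the changed positions in the same order the decode loop will read them.

def encode_permutation_no(slot, target, indexes):
    # Simulate the same swaps the decode loop will perform, recording at each
    # step where the item wanted in indexes[0] currently sits among the
    # remaining changed slots. Those positions are the mixed-radix digits.
    slot = list(slot)                     # work on a copy
    idx = list(indexes)
    digits = []
    while idx:
        item_to_swap = next(k for k, s in enumerate(idx)
                            if slot[s] == target[idx[0]])
        digits.append(item_to_swap)
        if item_to_swap != 0:
            slot[idx[0]], slot[idx[item_to_swap]] = slot[idx[item_to_swap]], slot[idx[0]]
        idx = idx[1:]
    # Pack the digits so that "% len(indexes)" peels them off in order:
    # the first digit has radix len(indexes), the next len(indexes) - 1, and so on.
    permutation_no = 0
    n = len(digits)
    for k in range(n - 1, -1, -1):
        permutation_no = permutation_no * (n - k) + digits[k]
    return permutation_no

Feeding the resulting permutation_no and the same indexes into the decode loop above reproduces the target arrangement.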
As Peter says, it would be ideal to minimise the size of each integer — but in fact, you can do it without putting restrictions on the number of items. Variable-byte encoding is a way of compressing sequences of integers by only using the necessary number of bytes. The most common way of doing this is to reserve one bit in each byte to indicate whether that byte is the last one in the current list item.
It could be useful to use delta encoding first. That's where you store the differences between the integers, rather than the integers themselves — meaning that they end up compressing better with variable-byte. Of course, the integers being stored (perhaps the IDs of items being changed, in your case) would have to be sorted first, but that doesn't seem like it'd be a problem for you.
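As a concrete sketch of the two ideas combined (my own illustration, not from the answer - the placement of the stop bit is just one common convention):

def delta_varbyte_encode(sorted_ids):
    # Store the gaps between consecutive IDs, 7 data bits per byte;
    # the high bit marks the last byte of each value.
    out = bytearray()
    prev = 0
    for n in sorted_ids:
        gap = n - prev
        prev = n
        while gap >= 0x80:
            out.append(gap & 0x7F)
            gap >>= 7
        out.append(gap | 0x80)
    return bytes(out)

def delta_varbyte_decode(data):
    ids, value, shift, prev = [], 0, 0, 0
    for byte in data:
        value |= (byte & 0x7F) << shift
        shift += 7
        if byte & 0x80:                    # stop bit: this value is complete
            prev += value
            ids.append(prev)
            value, shift = 0, 0
    return ids

For example, delta_varbyte_encode([3, 10, 200]) gives the bytes 0x83 0x87 0x3E 0x81, and decoding returns the original IDs.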
Assuming that:
Your best solution is probably:
Rather than keeping a list of all the swaps as they are performed, compare your starting and finishing data at the end of the day, and then generate the swaps you would need to make that change. This ignores any locations in the list that remain unchanged, even if they are only unchanged because a series of swaps "undid" some change. Have your data take the form a,b,a,b,... where each pair (a, b) is one swap to apply, in order: a is the index of the next item that is out of place, and b is the index of the item to swap it with (the slot currently holding the value that belongs at a).
Because you're only doing swaps instead of shifts, you should very rarely end up with data like your sample data, where 30, 40, and 50 are in the same order but in a slightly different location. Since the number of swaps will be between 1/10 and 1/4 of the number of items in the original list, you'll usually have a big chunk of your data in both the same order and the same location it was in originally. Let's assume the following swaps were made:
1 <-> 9
4 <-> 2
5 <-> 2
The resulting list would be:
1. 90
2. 50
3. 30
4. 20
5. 40
6. 60
7. 70
8. 80
9. 10
So the change data could be represented as:
1,9,2,5,4,5
That's only six values, which could be represented as 16-bit numbers (assuming you won't have over 65,000 items in your initial list). So each "effective" swap can be represented with a single 32-bit number. And since the number of actual swaps will generally be between 1/10 and 1/4 the size of the original list, you'll end up sending roughly 10% to 20% of the data in your original list over the wire (or less, since the number of "effective" swaps may be smaller still if some of those swaps undo one another).
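Here is a sketch of how the end-of-day comparison could generate such a swap list (my own illustration, not part of this answer; it uses 0-based indices and assumes the list items are unique):

def diff_swaps(source, target):
    # Compare the start-of-day and end-of-day orderings and emit (a, b) swap
    # pairs that, applied in order, turn source into target.
    current = list(source)
    pos = {item: i for i, item in enumerate(current)}   # where each item currently sits
    swaps = []
    for i, wanted in enumerate(target):
        if current[i] != wanted:
            j = pos[wanted]                              # slot holding the item needed here
            swaps.append((i, j))
            pos[current[i]], pos[wanted] = j, i          # update the bookkeeping
            current[i], current[j] = current[j], current[i]
    return swaps

On the example above this returns (0, 8), (1, 4), (3, 4) - i.e. 1,9,2,5,4,5 in the 1-based notation used here - and because it follows each displacement cycle only once, it never emits more swaps than necessary.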
Algo part:
A reordering of a list is called a permutation. Each permutation can be split into a set of loops (cycles), with each loop of N elements requiring (N - 1) swaps. For example
1, 2, 3, 4, 5, 6 --> 3, 2, 4, 1, 6, 5
This can be split into: 1 - 4 - 3 (requires 2 swaps), 2 - 2 (0 swaps), 5 - 6 (1 swap).
To find a solution you can just pick any element that is in the wrong position, put it in its place, and repeat until everything is in place.
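A small sketch of that loop counting (mine, not from the answer), assuming the permutation is given as perm[i] = the 0-based position where element i must end up:

def min_swaps(perm):
    # Minimum swaps = number of misplaced elements minus the number of loops among them,
    # because a loop of N elements needs N - 1 swaps.
    seen = [False] * len(perm)
    swaps = 0
    for i in range(len(perm)):
        if seen[i] or perm[i] == i:
            continue
        length = 0
        j = i
        while not seen[j]:             # walk one loop
            seen[j] = True
            j = perm[j]
            length += 1
        swaps += length - 1
    return swaps

For the example above (perm = [3, 1, 0, 2, 5, 4] in this convention) it returns 3: two swaps for the 1 - 4 - 3 loop plus one for 5 - 6.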
Details part:
Of course, you can use smaller data types, RLE or some other encoding algorithms and so on.
Very theoretical but impractical part.
All permutations of a sequence of N numbers can be lexicographically ordered, and one number from 0 to (N! - 1) is enough to identify the sequence. So, the theoretically best answer is: compute the index of the permutation, transfer it, and recreate the permutation from that index.
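Written out, the scheme looks something like this (a sketch for a permutation of the values 0..N-1; as the paragraph says, it is impractical for large N because the index is a huge integer):

from math import factorial

def rank(perm):
    # Lexicographic index (0 .. N!-1) of a permutation of 0..N-1.
    n = len(perm)
    remaining = sorted(perm)
    r = 0
    for i, p in enumerate(perm):
        k = remaining.index(p)            # how many unused values are smaller
        r += k * factorial(n - 1 - i)
        remaining.pop(k)
    return r

def unrank(r, n):
    # Rebuild the permutation of 0..N-1 from its lexicographic index.
    remaining = list(range(n))
    perm = []
    for i in range(n):
        f = factorial(n - 1 - i)
        perm.append(remaining.pop(r // f))
        r %= f
    return perm

For any permutation p of 0..N-1, unrank(rank(p), len(p)) gives back p.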
I am not sure that analyzing the swaps gets you anything; as you say they can undo each other, and lead to confusing results.
I believe that your best option is to identify, in the re-ordered list, the segments of that list that are not re-ordered with respect to the original list, even if they start in a new location. In your example, this is the segment from 30 to 60. So, in a sort of run-length encoding, I would send back a segment map that describes locations and lengths.
Again, using your example data, a list of (start index, length) pairs in output order:
{ (9, 1) , (3, 4) , (1, 1) , (8, 1) , (7, 1) , (2, 1) }
seems like the smallest amount of info you can send back. The compressibility of the data depends on the number and size of the segments held in common.
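A sketch of how such a segment map could be computed (my own code, not from the answer; it assumes the list items are unique and reports 1-based start indices to match the example above):

def segment_map(original, reordered):
    # Describe the reordered list as runs of (start index in original, length),
    # where each run was contiguous and in the same order in the original list.
    where = {item: i for i, item in enumerate(original)}  # item -> original index
    segments = []
    i = 0
    while i < len(reordered):
        start = where[reordered[i]]
        length = 1
        # extend the run while the following items were also adjacent in the original
        while (i + length < len(reordered)
               and where[reordered[i + length]] == start + length):
            length += 1
        segments.append((start + 1, length))
        i += length
    return segments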
(Edit) Actually, it occurs to me that there are going to be some data sets where a swap list will be shorter, if the number of swaps is small. But there will probably be some cutover point where run-length encoding does better; in that case, I would compute both and pick the smaller one.