I am looking for an algorithm that can map a number to a unique permutation of a sequence. I have found out about Lehmer codes and the factorial number system thanks to a simila
Assuming the resulting number fits inside a word (e.g. 32 or 64 bit integer) relatively easily, then much of the linked article still applies. Encoding and decoding from a variable base remains the same. What changes is how the base varies.
If you're creating a permutation of a sequence, you pick an item out of your bucket of symbols (from the original sequence) and put it at the start. Then you pick out another item from your bucket of symbols and put it on the end of that. You'll keep picking and placing symbols at the end until you've run out of symbols in your bucket.
What's significant is which item you picked out of the bucket of the remaining symbols each time. The number of remaining symbols is something you don't have to record because you can compute that as you build the permutation -- that's a result of your choices, not the choices themselves.
The strategy here is to record what you chose, and then present an array of what's left to be chosen. Then choose, record which index you chose (packing it via the variable base method), and repeat until there's nothing left to choose. (Just as above when you were building a permuted sequence.)
In the case of duplicate symbols it doesn't matter which one you picked, so you can treat them as the same symbol. The difference is that when you pick a symbol which still has a duplicate left, you didn't reduce the number of symbols in the bucket to pick from next time.
Let's adopt a notation that makes this clear:
Instead of listing duplicate symbols left in our bucket to choose from like c a b c a a
we'll list them along with how many are still in the bucket: c-2 a-3 b-1
.
Note that if you pick c
from the list, the bucket has c-1 a-3 b-1
left in it. That means next time we pick something, we have three choices.
But on the other hand, if I picked b
from the list, the bucket has c-2 a-3
left in it. That means next time we pick something, we only have two choices.
When reconstructing the permuted sequence we just maintain the bucket the same way as when we were computing the permutation number.
The implementation details aren't trivial, but they're straightforward with standard algorithms. The only thing that might heckle you is what to do when a symbol in your bucket is no longer available.
Suppose your bucket was represented by a list of pairs (like above): c-1 a-3 b-1
and you choose c
. Your resulting bucket is c-0 a-3 b-1
. But c-0
is no longer a choice, so your list should only have two entries, not three. You could move the entire list down by 1 resulting in a-3 b-1
, but if your list is long this is expensive. A fast an easy solution: move the last element of the bucket into the removed location and decrease your bucket size: c0 a-3 b-1
becomes b-1 a-3
or just b-1 a-3
.
Note that we can do the above because it doesn't matter what order the symbols in the bucket are listed in, as long as it's the same way when we encode or decode the number.