Question
I am working with a large number of integer permutations. Each permutation has K elements, and each element is 1 byte in size. I need to generate N unique random permutations.
Constraints: K <= 144, N <= 1,000,000.
I came up with the following straightforward algorithm:
1. Generate a list of N random permutations. Store all permutations in RAM.
2. Sort the list and delete all duplicates (if any). The number of duplicates will be relatively small.
3. If there were any duplicates, add random permutations to the list until there are N permutations again, then return to step 2.
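The steps above can be sketched roughly as follows. This is only an illustration, not the asker's actual code: the function name `unique_random_perms` is made up, the permutations are assumed to be of the values 0..K-1 stored as `bytes` objects (one byte per element), and `random.shuffle` stands in for whatever generator is really used. It also assumes N <= K!, otherwise the loop never terminates.

```python
import random


def unique_random_perms(k, n):
    """Generate, sort, de-duplicate and refill until n distinct permutations exist.

    Hypothetical helper; assumes k <= 256 (one byte per element) and n <= k!.
    """
    base = list(range(k))
    perms = []
    while True:
        # Steps 1 and 3: top the list up to n permutations.
        while len(perms) < n:
            random.shuffle(base)
            perms.append(bytes(base))
        # Step 2: sort, then drop adjacent duplicates.
        perms.sort()
        deduped = [p for i, p in enumerate(perms) if i == 0 or p != perms[i - 1]]
        if len(deduped) == n:
            return deduped
        perms = deduped


if __name__ == "__main__":
    sample = unique_random_perms(8, 1000)
    print(len(sample), len(set(sample)))
```

The returned list is sorted, so it would need one final shuffle if the order in which the permutations are consumed matters.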
Is there a better way to do this? In particular, is there a way to avoid storing all permutations in RAM (writing them to disk while generating)?
Edit: In the end, the generated permutations need to be accessed sequentially (one by one; no random access is needed). RAM is the more crucial factor (I would prefer not to store all permutations in RAM at once).
Answer 1:
One possible solution is to use a Bloom filter.
Store your permutations on disk (write them sequentially) and maintain a Bloom filter in RAM.
Each time you generate a permutation, check whether it is in the Bloom filter. If the filter says it has not been written to disk yet, write it and add it to the filter: Bloom filters have no false negatives.
If, however, the Bloom filter says the permutation is already on disk, it might be wrong.
When the Bloom filter says "the permutation already exists", you can either discard this candidate and move on to the next one without checking whether it is really in the set, or search the disk to see whether it is really there.
If you choose the latter, you should consider maintaining a smart data structure for the permutations, such as a hash table or a B+ tree.
Bloom filters are a perfect match here: they are designed to represent a set that is expensive to read, while giving zero false negatives, which is the most important property here.
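A minimal sketch of the "discard the candidate" variant might look like this. Everything named here is an assumption for illustration: the `BloomFilter` class is a hand-rolled toy rather than any library's API, `write_unique_perms` is a made-up helper, and the default sizes (2^25 bits, i.e. 4 MiB, with 7 hash positions) are illustrative rather than tuned. Permutations are assumed to be of the values 0..K-1, written as K-byte records.

```python
import hashlib
import random


class BloomFilter:
    """Toy Bloom filter over byte strings (illustrative, not a library API)."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Derive all bit positions from one SHA-256 digest (double hashing).
        digest = hashlib.sha256(item).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos >> 3] |= 1 << (pos & 7)

    def __contains__(self, item):
        return all(self.bits[pos >> 3] & (1 << (pos & 7))
                   for pos in self._positions(item))


def write_unique_perms(path, k, n, num_bits=1 << 25, num_hashes=7):
    """Write n distinct random permutations of range(k) to path; return the count."""
    seen = BloomFilter(num_bits, num_hashes)
    perm = list(range(k))
    written = 0
    with open(path, "wb") as out:
        while written < n:
            random.shuffle(perm)
            record = bytes(perm)
            if record in seen:
                continue  # maybe a false positive; discard and draw again
            seen.add(record)
            out.write(record)
            written += 1
    return written
```

Because a "not present" answer is never wrong, every record written is new. A false positive only costs one extra shuffle, which is why skipping the disk lookup is a reasonable choice when K! is much larger than N.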
Answer 2:
I'm a little late, but I think I have a method not yet shown.
I remembered that there is an algorithm that, given the starting order of all K items and an integer index, generates the index'th permutation of the K items in time roughly proportional to K. Since there are K! (factorial) permutations of K items, as long as you can randomly generate an integer from zero up to (but not including) K!, you can use the routine to generate N unique random indices in memory and then write the corresponding permutations out to disk.
Here is a Python version of the algorithm with N set to 10 and k to 25, although I have used k = 144 successfully:
from math import factorial
import random

def perm_at_index(items, index):
    '''
    >>> for i in range(10):
    ...     print(i, perm_at_index([1,2,3], i))
    0 [1, 2, 3]
    1 [1, 3, 2]
    2 [2, 1, 3]
    3 [2, 3, 1]
    4 [3, 1, 2]
    5 [3, 2, 1]
    6 [1, 2, 3]
    7 [1, 3, 2]
    8 [2, 1, 3]
    9 [2, 3, 1]
    '''
    itms, perm = items[:], []
    itmspop, lenitms, permappend = itms.pop, len(itms), perm.append
    thisfact = factorial(lenitms)
    thisindex = index % thisfact  # indices wrap around after K! values
    while itms:
        thisfact //= lenitms  # integer division: keeps exact big integers in Python 3
        thischoice, thisindex = divmod(thisindex, thisfact)
        permappend(itmspop(thischoice))
        lenitms -= 1
    return perm

if __name__ == '__main__':
    N = 10 # Change to 1 million
    k = 25 # Change to 144
    K = ['K%03i' % j for j in range(k)] # ['K000', 'K001', 'K002', 'K003', ...]
    maxperm = factorial(k) # You need arbitrary length integers for this!
    # randrange excludes maxperm itself, which would wrap around to index 0
    indices = set(random.randrange(maxperm) for r in range(N))
    while len(indices) < N:
        indices |= set(random.randrange(maxperm) for r in range(N - len(indices)))
    for index in indices:
        print(' '.join(perm_at_index(K, index)))
The output of which looks something like this:
K008 K016 K024 K014 K003 K007 K015 K018 K009 K006 K021 K012 K017 K013 K022 K020 K005 K000 K010 K001 K011 K002 K019 K004 K023
K006 K001 K023 K008 K004 K017 K015 K009 K021 K020 K013 K000 K012 K014 K016 K002 K022 K007 K005 K018 K010 K019 K011 K003 K024
K004 K017 K008 K002 K009 K020 K001 K019 K018 K013 K000 K005 K023 K014 K021 K015 K010 K012 K016 K003 K024 K022 K011 K006 K007
K023 K013 K016 K022 K014 K024 K011 K019 K001 K004 K010 K017 K018 K002 K000 K008 K006 K009 K003 K021 K005 K020 K012 K015 K007
K007 K001 K013 K003 K023 K022 K016 K017 K014 K018 K020 K015 K006 K004 K011 K009 K000 K012 K002 K024 K008 K021 K005 K010 K019
K002 K023 K004 K005 K024 K001 K006 K007 K014 K021 K015 K012 K022 K013 K020 K011 K008 K003 K017 K016 K019 K010 K009 K000 K018
K001 K004 K007 K024 K011 K022 K017 K023 K002 K003 K006 K021 K010 K014 K013 K020 K012 K016 K019 K000 K015 K008 K018 K009 K005
K009 K003 K010 K008 K020 K024 K007 K018 K023 K013 K001 K019 K006 K002 K016 K000 K004 K017 K014 K011 K022 K021 K012 K005 K015
K006 K009 K018 K010 K015 K016 K011 K008 K001 K013 K003 K004 K002 K005 K022 K020 K021 K017 K000 K019 K024 K012 K023 K014 K007
K017 K006 K010 K015 K018 K004 K000 K022 K024 K020 K014 K001 K023 K016 K005 K011 K002 K007 K009 K013 K019 K012 K021 K003 K008
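Since the question asks for 1-byte elements written to disk, a variant of the same idea could look like the sketch below. The names `perm_bytes_at_index` and `write_perms` and the fixed-width record layout are assumptions made for illustration; it permutes the values 0..K-1 instead of string labels. Only the set of N indices is held in memory.

```python
from math import factorial
import random


def perm_bytes_at_index(k, index):
    """Return the index-th permutation of range(k) as bytes (assumes k <= 256)."""
    items = list(range(k))
    fact = factorial(k)
    index %= fact
    out = bytearray()
    for remaining in range(k, 0, -1):
        fact //= remaining  # exact integer division
        choice, index = divmod(index, fact)
        out.append(items.pop(choice))
    return bytes(out)


def write_perms(path, k, n):
    """Write n distinct random permutations to path as k-byte records (assumes n <= k!)."""
    total = factorial(k)
    indices = set()
    while len(indices) < n:
        indices.add(random.randrange(total))  # distinct indices give distinct permutations
    with open(path, "wb") as out:
        for index in indices:
            out.write(perm_bytes_at_index(k, index))


if __name__ == "__main__":
    write_perms("perms.bin", 25, 10)
```

The file can then be read back sequentially in chunks of K bytes, one permutation per chunk.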
Answer 3:
Here's one way of doing it.
1) Generate the first N permutations and store them on disk.
2) Then run a randomizing algorithm on the permutations.
You can optimize this with divide and conquer: pick only the first X elements from the disk and randomize them, then the next X elements in the next iteration, and so on, and then merge the results.
You probably don't need the disk here.
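An in-memory sketch of the two steps, assuming "first N permutations" means the first N in lexicographic order and "randomize" means shuffling the order in which they are read (the answer does not spell out either point, and `first_n_shuffled` is a made-up name):

```python
import random
from itertools import islice, permutations


def first_n_shuffled(items, n):
    """Take the first n lexicographic permutations of items, then shuffle their order."""
    # Step 1: itertools.permutations yields lexicographic order for sorted input,
    # and the results are distinct as long as the items are distinct.
    perms = list(islice(permutations(items), n))
    # Step 2: randomize the order of the list, not the permutations themselves.
    random.shuffle(perms)
    return perms


if __name__ == "__main__":
    for p in first_n_shuffled(range(5), 4):
        print(p)
```

One caveat under this reading: since 10! is already larger than 1,000,000, the first million lexicographic permutations differ only in their last 10 positions, so the result is unique but not a uniform sample of all K! permutations.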
Answer 4:
Given that 10! ≈ 3.6e6 (and 15! ≈ 1.3e12), K! is far larger than N once K > ~15, so if you shuffle a list of K items a million times using a proper Fisher-Yates (Knuth) shuffle, you are very likely to get a unique permutation every time.
If you can hold all one million unique permutations in memory in a set data structure, then you can keep shuffling a list of K items and adding each result to the set until it contains a million of them.
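To put a rough number on "very likely": for N independent uniform shuffles, the expected number of pairs that coincide is N(N-1)/(2·K!). The helper below (a hypothetical name, and it assumes the shuffle really is uniform over all K! permutations) evaluates that for a few values of K.

```python
from math import factorial


def expected_duplicate_pairs(n, k):
    """Expected number of coinciding pairs among n uniform draws from k! permutations."""
    return n * (n - 1) / (2 * factorial(k))


if __name__ == "__main__":
    for k in (10, 12, 14, 16, 144):
        print(k, expected_duplicate_pairs(1_000_000, k))
```

For N = 1,000,000 the value drops below one around K = 15 and is vanishingly small for K = 144, so a retry loop will almost never have to retry at that size.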
Here's some Python that also shows a measure of how good the shuffle is at generating unique perms for varying values of K:
>>> from math import factorial
>>> from random import shuffle
>>>
>>> n = 1000000
>>> for k in range(16, 9, -1):
...     perms = set()
...     perm = list(range(k))
...     trials = 0
...     while len(perms) < n:
...         trials += 1
...         for i in range(n - len(perms)):
...             shuffle(perm)
...             perms.add(tuple(perm))
...     print('N=%i, K=%i, trials=%i, K!//N= %i' % (n, k, trials, factorial(k)//n))
...
N=1000000, K=16, trials=1, K!//N= 20922789
N=1000000, K=15, trials=1, K!//N= 1307674
N=1000000, K=14, trials=2, K!//N= 87178
N=1000000, K=13, trials=2, K!//N= 6227
N=1000000, K=12, trials=3, K!//N= 479
N=1000000, K=11, trials=5, K!//N= 39
N=1000000, K=10, trials=11, K!//N= 3
>>>
Source: https://stackoverflow.com/questions/12884428/generate-sample-of-1-000-000-random-permutations