python: Generating integer partitions

半腔热情 提交于 2019-11-30 07:14:08

If you are going to use this function repeatedly for the same inputs, it could still be worth caching the return values (if you are going to use it across separate runs, you could store the results in a file).

If you can't find a significantly faster algorithm, then it should be possible to speed this up by an order of magnitude or two by moving the code into a C extension (this is probably easiest using cython), or alternatively by using PyPy instead of CPython (PyPy has its downsides - it does not yet support Python 3, or some commonly-used libraries like numpy and scipy).

The reason for this is, since python is dynamically typed, the interpreter is probably spending most of its time checking the types of the variables - for all the interpreter knows, one of the operations could turn x into a string, in which case expressions like x + y would suddenly have very different meanings. Cython gets around this problem by allowing you to statically declare the variables as integers, while PyPy has a just-in-time compiler which minimises redundant type checks.

To generate compositions directly you can use the following algorithm:

def ruleGen(n, m, sigma):
    """
    Generates all interpart restricted compositions of n with first part
    >= m using restriction function sigma. See Kelleher 2006, 'Encoding
    partitions as ascending compositions' chapters 3 and 4 for details.
    """
    a = [0 for i in range(n + 1)]
    k = 1
    a[0] = m - 1
    a[1] = n - m + 1
    while k != 0:
        x = a[k - 1] + 1
        y = a[k] - 1
        k -= 1
        while sigma(x) <= y:
            a[k] = x
            x = sigma(x)
            y -= x
            k += 1
        a[k] = x + y
        yield a[:k + 1]

This algorithm is very general, and can generate partitions and compositions of many different types. For your case, use

ruleGen(n, 1, lambda x: 1)

to generate all unrestricted compositions. The third argument is known as the restriction function, and describes the type of composition/partition that you require. The method is efficient, as the amount of effort required to generate each composition is constant, when you average over all the compositions generated. If you would like to make it slightly faster in python then it's easy to replace the function sigma with 1.

It's worth noting here as well that for any constant amortised time algorithm, what you actually do with the generated objects will almost certainly dominate the cost of generating them. For example, if you store all the partitions in a list, then the time spent managing the memory for this big list will be far greater than the time spent generating the partitions.

Say, for some reason, you want to take the product of each partition. If you take a naive approach to this, then the processing involved is linear in the number of parts, whereas the cost of generation is constant. It's quite difficult to think of an application of a combinatorial generation algorithm in which the processing doesn't dominate the cost of generation. So, in practice, there'll be no measurable difference between using the simpler and more general ruleGen with sigma(x) = x and the specialised accelAsc.

Testing with n=75 I get:

PyPy 1.8:

w:\>c:\pypy-1.8\pypy.exe pstst.py
1.04800009727 secs.

CPython 2.6:

w:\>python pstst.py
5.86199998856 secs.

Cython + mingw + gcc 4.6.2:

w:\pstst> python -c "import pstst;pstst.run()"
4.06399989128

I saw no difference with Psyco(?)

The run function:

def run():
    import time
    start = time.time()
    for p in accelAsc(75):
        pass
    print time.time() - start, 'secs.'

If I change the definition of accelAsc for Cython to start with:

def accelAsc(int n):
    cdef int x, y, k
    # no more changes..

I get the Cython time down to 2.27 secs.

I'd say that your performance issue is somewhere else.

I didn't compare it with other approaches, but it does seem efficient to me:

import time

start = time.time()
partitions = list(accelAsc(40))
print('time: {:.5f} sec'.format(time.time() - start))
print('length:', len(partitions))

Gave:

time: 0.03636 sec
length: 37338
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!