Factoring polys in sympy

回眸只為那壹抹淺笑 提交于 2019-12-01 04:05:00

Putting together some of the methods happens to give a nice answer this time. It would be interesting to see if this strategy works more often than not on the equations you generate or if, as the name implies, this is just a lucky result this time.

def iflfactor(eq):
    """Return the "I'm feeling lucky" factored form of eq."""
    e = Mul(*[horner(e) if e.is_Add else e for e in
        Mul.make_args(factor_terms(expand(eq)))])
    r, e = cse(e)
    s = [ri[0] for ri in r]
    e = Mul(*[collect(ei.expand(), s) if ei.is_Add else ei for ei in
        Mul.make_args(e[0])]).subs(r)
    return e

>>> iflfactor(eq)  # using your equation as eq
2*x*y*z*(x**2 + x*y + y**2 + (z - 3)*(x + y + z) + 3)
>>> _.count_ops()
15

BTW, a difference between factor_terms and gcd_terms is that factor_terms will work harder to pull out common terms while retaining the original structure of the expression, very much like you would do by hand (i.e. looking for common terms in Adds that can be pulled out).

>>> factor_terms(x/(z+z*y)+x/z)
x*(1 + 1/(y + 1))/z
>>> gcd_terms(x/(z+z*y)+x/z)
x*(y*z + 2*z)/(z*(y*z + z))

For what it's worth,

Chris

asmeurer

As far as I know, there is no function that does exactly that. I believe it is actually a very hard problem. See Reduce the number of operations on a simple expression for some discussion on it.

There are however, quite a few simplification functions in SymPy that you can try. One that you haven't mentioned that gives a different result is gcd_terms, which factorizes out a symbolic gcd without doing an expansions. It gives

>>> gcd_terms(expression)
x*y*z*((-x + 1)*(-x - y + 1) + (-x + 1)*(-x - z + 1) + (-y + 1)*(-x - y + 1) + (-y + 1)*(-y - z + 1) + (-z + 1)*(-x - z + 1) + (-z + 1)*(-y - z + 1))

Another useful function is .count_ops, which counts the number of operations in an expression. For example

>>> expression.count_ops()
47
>>> factor(expression).count_ops()
22
>>> e = x * y * z * (6 * (1 - x - y - z) + (x + y) ** 2 + (y + z) ** 2 + (x + z) ** 2)
>>> e.count_ops()
18

(note that e.count_ops() is not the same as you counted yourself, because SymPy automatically distributes the 6*(1 - x - y - z) to 6 - 6*x - 6*y - 6*z).

Other useful functions:

  • cse: Performs a common subexpression elimination on the expression. Sometimes you can simplify the individual parts and then put it back together. This also helps in general to avoid duplicate computations.

  • horner: Applies the Horner scheme to a polynomial. This minimizes the number of operations if the polynomial is in one variable.

  • factor_terms: Similar to gcd_terms. I'm actually not entirely clear what the difference is.

Note that by default, simplify will try several simplifications, and return the one that is minimized by count_ops.

I have had a similar problem, and ended up implementing my own solution before I stumbled across this one. Mine seems to do a much better job reducing the number of operations. However, mine also does a brute-force style set of collections over all combinations of variables. Thus, it's runtime grows super-exponentially in the number of variables. OTOH, I've managed to run it on equations with 7 variables in a not-unreasonable (but far from real-time) amount of time.

It is possible that there are some ways to prune some of the search branches here, but I haven't bothered with it. Further optimizations are welcome.

def collect_best(expr, measure=sympy.count_ops):
    # This method performs sympy.collect over all permutations of the free variables, and returns the best collection
    best = expr
    best_score = measure(expr)
    perms = itertools.permutations(expr.free_symbols)
    permlen = np.math.factorial(len(expr.free_symbols))
    print(permlen)
    for i, perm in enumerate(perms):
        if (permlen > 1000) and not (i%int(permlen/100)):
            print(i)
        collected = sympy.collect(expr, perm)
        if measure(collected) < best_score:
            best_score = measure(collected)
            best = collected
    return best

def product(args):
    arg = next(args)
    try:
        return arg*product(args)
    except:
        return arg

def rcollect_best(expr, measure=sympy.count_ops):
    # This method performs collect_best recursively on the collected terms
    best = collect_best(expr, measure)
    best_score = measure(best)
    if expr == best:
        return best
    if isinstance(best, sympy.Mul):
        return product(map(rcollect_best, best.args))
    if isinstance(best, sympy.Add):
        return sum(map(rcollect_best, best.args))

To illustrate the performance, this paper(paywalled, sorry) has 7 formulae that are 5th degree polynomials in 7 variables with up to 29 terms and 158 operations in the expanded forms. After applying both rcollect_best and @smichr's iflfactor, the number of operations in the 7 formulae are:

[6, 15, 100, 68, 39, 13, 2]

and

[32, 37, 113, 73, 40, 15, 2]

respectively. iflfactor has 433% more operations than rcollect_best for one of the formulae. Also, the number of operations in the expanded formulae are:

[39, 49, 158, 136, 79, 27, 2]
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!