compute_totals takes longer with analytical gradient vs complex step

Submitted by China☆狼群 on 2019-12-11 09:38:12

Question


Below is a code snippet for a single-component problem.

Setting self.flag = 1 uses complex step, while self.flag = 0 uses analytical gradients to approximate/compute the partial derivatives.

Computing the total derivatives takes about 20 seconds with option 1 (complex step) and about 60 seconds with option 0 (analytical partials).

I was expecting the opposite, because complex step requires thousands of calls to 'compute'.

I checked the function calls and they seem correct. I also checked the analytical partials against 'cs', and they seem correct as well.

Can anyone explain why computing the total derivatives via analytical partial derivatives takes longer?

import time
import numpy as np
dim1,dim2,dim3=10,40,30
ran1=np.random.random([dim1,dim2,dim3])*5
ran2=np.random.random([dim1,dim2,dim3])*10

from openmdao.api import Problem, Group, IndepVarComp, ExplicitComponent

class FDPartialComp(ExplicitComponent):

    def setup(self):
        dim1,dim2,dim3=10,40,30
        self.add_input('var1', val=np.ones([dim1,dim2,dim3]))
        self.add_input('var2', val=np.ones([dim1,dim2,dim3]))
        self.add_output('f', shape=(dim1,))
        self.flag=0 
        self.cou=0
        self.partcou=0
        if self.flag:
            self.declare_partials('*', '*', method='cs')
        else:
            self.declare_partials('f', 'var1',cols=np.arange(dim2*dim3*dim1),rows=np.repeat(np.arange(dim1),dim2*dim3))
            self.declare_partials('f', 'var2' ,cols=np.arange(dim2*dim3*dim1),rows=np.repeat(np.arange(dim1),dim2*dim3))

    def compute(self, inputs, outputs):
        self.cou+=1
        print(self.cou)
        var1 = inputs['var1']
        var2 = inputs['var2']
        m=3
        outputs['f'] = np.sum((var2*var1**m),axis=(1,2))        

    def compute_partials(self, inputs, partials):
        if self.flag:
            pass
        else:
            m=3
            var1 = inputs['var1']
            var2 = inputs['var2']        
        partials['f', 'var1'] = (var1**m*m*var2/var1).flatten()
        partials['f', 'var2'] = (var1**m).flatten()
            self.partcou+=1
            print(self.partcou)

model = Group()
comp = IndepVarComp()

comp.add_output('var1', ran1)
comp.add_output('var2', ran2)
#comp.add_output('var1', np.ones([dim1,dim2,dim3])*5)
#comp.add_output('var2', np.ones([dim1,dim2,dim3])*10)
model.add_subsystem('input', comp,promotes=['*'])
model.add_subsystem('example', FDPartialComp(),promotes=['*'])

problem = Problem(model=model)
problem.setup(check=True)
#problem.run_model()
st=time.time()
totals = problem.compute_totals(['f'], ['var1','var2'])
#problem.setup(force_alloc_complex=True)
#problem.check_partials(compact_print=True,method='cs')
print(time.time()-st)

Following the answer, I added a snapshot of the computational time spent in various parts of the code.


Answer 1:


I took a look at the OpenMDAO code to figure out why CS without the direct solver runs as fast as analytical derivatives without the direct solver. In the process I found a few places where we were using numpy.add.at calls internally, and those calls are quite slow. I replaced them with much faster calls to numpy.bincount. The numbers shown here use those code improvements, which have now been merged into the OpenMDAO master branch as of commit 7f13fda. These improvements will be released in V2.9.
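As a purely illustrative aside (this is not OpenMDAO's actual internal code), the change amounts to replacing an unbuffered numpy.add.at scatter-add with an equivalent numpy.bincount call, which computes the same per-row sums:

import numpy as np

# Accumulate 24,000 values into 10 output rows, the same kind of
# scatter-add used when applying a sparse sub-jacobian.
n_rows, n_vals = 10, 24000
rows = np.repeat(np.arange(n_rows), n_vals // n_rows)  # row index of each nonzero
vals = np.random.random(n_vals)

# Slower path: unbuffered in-place scatter-add.
out_add_at = np.zeros(n_rows)
np.add.at(out_add_at, rows, vals)

# Faster path: weighted bincount gives the same per-row sums.
out_bincount = np.bincount(rows, weights=vals, minlength=n_rows)

print(np.allclose(out_add_at, out_bincount))  # True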

After the recent changes to OpenMDAO, I get the following timings:

analytical derivs w/o direct solver (fwd mode):  13.55 s
analytical derivs with direct solver (fwd mode): 27.02 s
CS w/o direct solver (fwd mode):                 15.76 s

Note that now, analytical derivatives without the DirectSolver are in fact faster than CS, but if we look a little deeper with the help of a profiler we see something interesting.

solve_linear time (analytical):  12.65000 s 
linearize time (analytical):    + 0.00195 s   (1 call to compute_partials)
                                 ----------
                                 12.65195 s


solve_linear time (CS):           9.63 s 
linearize time (CS):            + 4.81 s    (24,000 compute calls)
                                 -------
                                 14.44 s

So solve_linear is still faster under CS. The reason is that for CS, the partials are declared as dense (currently the only option, since we don't yet support declaring sparse partials when using FD or CS). When the partials are declared dense, the matrix-vector products inside solve_linear are done with a fast numpy.dot call; when they are declared sparse, as in your example with analytical derivatives, we use a slower function. At the time you ran your timings that function was numpy.add.at, which, as mentioned above, is really slow. We now use numpy.bincount, which is much faster but still not as fast as numpy.dot, and that accounts for the difference.
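To make the dense-versus-sparse distinction concrete, here is a rough, self-contained sketch (not OpenMDAO's actual implementation) of the two kinds of matrix-vector product, using the same (10 x 24000) shape and sparsity pattern declared in the example:

import numpy as np

n_out, n_in = 10, 24000                        # total jacobian shape in the example
rows = np.repeat(np.arange(n_out), n_in // n_out)
cols = np.arange(n_in)
vals = np.random.random(n_in)                  # one nonzero partial per column
vec = np.random.random(n_in)                   # a seed vector, as in fwd mode

# Dense path (what CS gets): fill a full matrix once, then use numpy.dot.
J = np.zeros((n_out, n_in))
J[rows, cols] = vals
dense_result = J.dot(vec)

# Sparse path (what the declared rows/cols give): scatter-add the products.
sparse_result = np.bincount(rows, weights=vals * vec[cols], minlength=n_out)

print(np.allclose(dense_result, sparse_result))  # True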

As an aside, since your total jacobian in this case has a shape of (10 x 24000), I would highly recommend using rev mode instead of fwd mode, so that you do 10 linear solves instead of 24,000 (a one-argument change to setup, sketched at the end of this answer). When I do that, I get these timings:

analytical derivs w/o direct solver (rev mode):  0.01 s
analytical derivs with direct solver (rev mode): 0.04 s
CS w/o direct solver (rev mode):                 4.86 s

The analytical derivative case is now clearly the winner.

Note that now the timing for the CS case is almost entirely due to the time spent in linearize, which takes the same amount of time as in fwd mode, since the number of CS nonlinear solves is always determined by the number of columns in the partial jacobian (here 2 inputs x 10 x 40 x 30 = 24,000 columns).
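For completeness, here is a minimal sketch of the rev-mode change, reusing the model built in the question's script; only the mode argument to setup differs:

problem = Problem(model=model)
# 'rev' (adjoint) mode: one linear solve per output row (10) instead of
# one per input column (24,000).
problem.setup(check=True, mode='rev')
totals = problem.compute_totals(['f'], ['var1', 'var2'])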




Answer 2:


The performance difference you are seeing has to do with internal data structures in OpenMDAO. Your model, when analytic derivatives are given, is specified using a sparse format (this is good, since it's very sparse!). But to really take advantage of that you need to use an assembled matrix format for the partial-derivative data storage and a direct solver to compute a sparse LU factorization. Once you add those features to your model, the performance for analytic derivatives is better than with CS.

The discrepancy arises because when you use pure CS, the derivatives are stored in a dense format that behaves like an assembled matrix. When you specified the analytic derivatives, you didn't get that benefit by default, so there were underlying differences in how the framework processed each case.
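The essential change relative to the original script is the linear solver assignment; the same lines appear in the full script below:

from openmdao.api import DirectSolver

# Store the partials in an assembled jacobian and factor it with a direct solver.
model.linear_solver = DirectSolver(assemble_jac=True)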

Here is an updated script that shows the correct performance (I made the input smaller so it runs faster):

import time
import numpy as np

# dim1,dim2,dim3=10,40,30
dim1,dim2,dim3=10,40,5

ran1=np.random.random([dim1,dim2,dim3])*5
ran2=np.random.random([dim1,dim2,dim3])*10

from openmdao.api import Problem, Group, IndepVarComp, ExplicitComponent, DirectSolver

class FDPartialComp(ExplicitComponent):

    def setup(self):

        self.add_input('var1', val=np.ones([dim1,dim2,dim3]))
        self.add_input('var2', val=np.ones([dim1,dim2,dim3]))
        self.add_output('f', shape=(dim1,))
        self.flag=0
        self.cou=0
        self.partcou=0

        if self.flag:
            self.declare_partials('*', '*', method='cs')
        else:
            self.declare_partials('f', 'var1',cols=np.arange(dim2*dim3*dim1),rows=np.repeat(np.arange(dim1),dim2*dim3))
            self.declare_partials('f', 'var2' ,cols=np.arange(dim2*dim3*dim1),rows=np.repeat(np.arange(dim1),dim2*dim3))

    def compute(self, inputs, outputs):
        self.cou+=1
        # print(self.cou)
        var1 = inputs['var1']
        var2 = inputs['var2']
        m=3
        outputs['f'] = np.sum((var2*var1**m),axis=(1,2))

    def compute_partials(self, inputs, partials):
        if self.flag:
            pass
        else:
            m=3
            var1 = inputs['var1']
            var2 = inputs['var2']
            partials['f','var1'] = (var1**m*m*var2/var1).flatten()
            partials['f','var2' ]= (var1**m).flatten()
            self.partcou+=1
            # print(self.partcou)

model = Group()
comp = IndepVarComp()

comp.add_output('var1', ran1)
comp.add_output('var2', ran2)
#comp.add_output('var1', np.ones([dim1,dim2,dim3])*5)
#comp.add_output('var2', np.ones([dim1,dim2,dim3])*10)
model.add_subsystem('input', comp,promotes=['*'])
model.add_subsystem('example', FDPartialComp(),promotes=['*'])


model.linear_solver = DirectSolver(assemble_jac=True)

problem = Problem(model=model)
problem.setup(check=True, mode='fwd')

problem.final_setup()

# exit()
#problem.run_model()
st=time.time()
totals = problem.compute_totals(['f'], ['var1','var2'])
#problem.check_partials(compact_print=True,method='cs')
print(time.time()-st)
print(problem._mode)


Source: https://stackoverflow.com/questions/53947177/compute-totals-takes-longer-with-analytical-gradient-vs-complex-step
