Python power law fit with upper limits & asymmetric errors in data using ODR


I'm trying to fit some data to a power law using Python. The problem is that some of my points are upper limits, which I don't know how to include in the fitting routine.

1 Answer
  • 2021-01-02 18:31

    This answer is related to this post, where I discuss fitting with x and y errors. It hence does not require the ODR module but can be done manually, so one can use leastsq or minimize. Concerning constraints, I made clear in other posts that I try to avoid them if possible, and that can be done here too, although the details of programming and maths are a little cumbersome, especially if it is supposed to be stable and foolproof. I will just give a rough idea.

    Say we want y0 > m * x0**(-c). In log form we can write this as eta0 > mu - c * xeta0, i.e. there is an alpha such that eta0 = mu - c * xeta0 + alpha**2. The same holds for the other inequalities. For the second upper limit you get a beta**2, but you can decide which one is the smaller one, so you automatically fulfil the other condition. The same works for the lower limits with a gamma**2 and a delta**2. Say we can work with alpha and gamma. We can combine the inequality conditions to relate those two as well. In the end we can fit a sigma and set alpha = sqrt(s - t) * sigma / sqrt(sigma**2 + 1), where s and t are derived from the inequalities. The sigma / sqrt(sigma**2 + 1) function is just one option to let alpha vary in a certain range, i.e. alpha**2 < s - t. The fact that the radicand may become negative shows that there are cases without a solution. With alpha known, mu, and therefore m, is calculated. So the fit parameters are c and sigma; this takes the inequalities into account and makes m dependent. I tried it and it works, but the version at hand is not the most stable one; I'd post the full code upon request. A rough sketch follows.
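
    For illustration, here is a minimal sketch of that reparametrization (not the full version mentioned above): it assumes just one upper limit (xu, yu) and one lower limit (xl, yl), uses plain log-space residuals instead of the error-ellipse treatment below, and runs on made-up test data.

    import numpy as np
    from scipy.optimize import leastsq

    ### made-up test data and limits, for illustration only
    xData = np.array( [ 0.5, 1.0, 2.0, 4.0 ] )
    yData = np.array( [ 3.0, 2.3, 1.7, 1.2 ] )
    xu, yu = 0.3, 5.0  ### upper limit: yu > m * xu**( -c )
    xl, yl = 5.0, 1.0  ### lower limit: yl < m * xl**( -c )

    ### in log space the limits bracket mu = log( m ):
    ###     t = log( yl ) + c * log( xl )  <=  mu  <=  log( yu ) + c * log( xu ) = s
    def log_m( c, sigma ):
        s = np.log( yu ) + c * np.log( xu )
        t = np.log( yl ) + c * np.log( xl )
        ### sigma in ( -inf, inf ) maps to alpha**2 in [ 0, s - t );
        ### a negative radicand s - t < 0 signals that no feasible m exists for this c
        alpha = np.sqrt( s - t ) * sigma / np.sqrt( sigma**2 + 1 )
        return s - alpha**2

    ### plain log-space residuals; m is dependent, the fit parameters are c and sigma
    def residuals( params, x, y ):
        c, sigma = params
        return np.log( y ) - ( log_m( c, sigma ) - c * np.log( x ) )

    ( cFit, sigmaFit ), ier = leastsq( residuals, [ 0.4, 0.0 ], args=( xData, yData ) )
    mFit = np.exp( log_m( cFit, sigmaFit ) )
    print( mFit, cFit )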

    As we already have a handmade residual function, there is a second option, though: we just introduce our own chi**2 function and use minimize, which allows constraints. Since minimize with the constraints keyword is very flexible, and the residual function is easily modified for functions other than m * x**( -c ), the overall construction is quite adaptable. It looks as follows:

    import matplotlib.pyplot as plt
    import numpy as np
    from random import random, seed
    from scipy.optimize import minimize,leastsq
    
    seed(7563)
    fig1 = plt.figure(1)
    
    
    ### for Gaussian-distributed errors
    def boxmuller(x0,sigma):
        u1=random()
        u2=random()
        ll=np.sqrt(-2*np.log(u1))
        z0=ll*np.cos(2*np.pi*u2)
        z1=ll*np.sin(2*np.pi*u2)  # second Box-Muller variate uses sin, not cos
        return sigma*z0+x0, sigma*z1+x0
    
    
    ###for plotting ellipses
    def ell_data(a,b,x0=0,y0=0):
        tList=np.linspace(0,2*np.pi,150)
        k=float(a)/float(b)
        rList=[a/np.sqrt((np.cos(t))**2+(k*np.sin(t))**2) for t in tList]
        xyList=np.array([[x0+r*np.cos(t),y0+r*np.sin(t)] for t,r in zip(tList,rList)])
        return xyList
    
    ### function to fit
    def f(x,m,c):
        y = abs(m) * abs(x)**(-abs(c))
        return y
    
    
    ### how to rescale the ellipse to make the fit function a tangent
    def elliptic_rescale(x, m, c, x0, y0, sa, sb):
        y=f( x, m, c )
        r=np.sqrt( ( x - x0 )**2 + ( y - y0 )**2 )
        kappa=float( sa ) / float( sb )
        tau=np.arctan2( y - y0, x - x0 )
        new_a=r*np.sqrt( np.cos( tau )**2 + ( kappa * np.sin( tau ) )**2 )
        return new_a
    
    ### residual function to calculate chi-square
    def residuals(parameters,dataPoint): # data point is (x,y,sx,sy)
        m, c = parameters
        theData = np.array(dataPoint)
        best_t_List=[]
        for i in range(len(dataPoint)):
            x, y, sx, sy = dataPoint[i]
            ### get the point on the graph where it is tangent to an error-ellipse
            ed_fit = minimize( elliptic_rescale, x, args = ( m, c, x, y, sx, sy ) )
            best_t = ed_fit['x'][0]
            best_t_List += [ best_t ]
        best_y_List=[ f( t, m, c ) for t in best_t_List ]
        ### weighted distances, not squared yet, as this is done by scipy.optimize.leastsq or the chi2 function below
        weighted_dx_List = [ ( x_b - x_f ) / sx for x_b, x_f, sx in zip( best_t_List, theData[:,0], theData[:,2] ) ]
        weighted_dy_List = [ ( y_b - y_f ) / sy for y_b, y_f, sy in zip( best_y_List, theData[:,1], theData[:,3] ) ]
        return weighted_dx_List + weighted_dy_List
    
    
    def chi2(params, pnts):
        r = np.array( residuals( params, pnts ) )
        return np.sum( r**2 )
    
    
    def myUpperIneq(params,pnt):
        m, c = params
        x,y=pnt
        return y - f( x, m, c )
    
    
    def myLowerIneq(params,pnt):
        m, c = params
        x,y=pnt
        return f( x, m, c ) - y
    
    
    ###to create some test data
    def test_data(m,c, xList,const_sx,rel_sx,const_sy,rel_sy):
        yList=[f(x,m,c) for x in xList]
        xErrList=[ boxmuller(x,const_sx+x*rel_sx)[0] for x in xList]
        yErrList=[ boxmuller(y,const_sy+y*rel_sy)[0] for y in yList]
        return xErrList,yErrList
    
    
    ### true model parameters and error settings for the test data
    mm_0=2.3511
    expo_0=.3588
    csx,rsx=.01,.07
    csy,rsy=.04,.09
    
    limitingPoints=dict()
    limitingPoints[0]=np.array([[.2,5.4],[.5,5.0],[5.1,.9],[5.7,.9]])
    limitingPoints[1]=np.array([[.2,5.4],[.5,5.0],[5.1,1.5],[5.7,1.2]])
    limitingPoints[2]=np.array([[.2,3.4],[.5,5.0],[5.1,1.1],[5.7,1.2]])
    limitingPoints[3]=np.array([[.2,3.4],[.5,5.0],[5.1,1.7],[5.7,1.2]])
    
    ####some data
    xThData=np.linspace(.2,5,15)
    yThData=[ f(x, mm_0, expo_0) for x in xThData]
    
    ### some noisy data
    xNoiseData,yNoiseData=test_data(mm_0,  expo_0, xThData, csx,rsx, csy,rsy)
    xGuessedError=[csx+rsx*x for x in xNoiseData]
    yGuessedError=[csy+rsy*y for y in yNoiseData]
    
    
    
    for testing in range(4):
        ### Now fitting with limits
        zipData = list( zip( xNoiseData, yNoiseData, xGuessedError, yGuessedError ) )  # list, so it can be indexed and reused
        estimate = [ 2.4, .3 ]
        con0={'type': 'ineq', 'fun': myUpperIneq, 'args': (limitingPoints[testing][0],)}
        con1={'type': 'ineq', 'fun': myUpperIneq, 'args': (limitingPoints[testing][1],)}
        con2={'type': 'ineq', 'fun': myLowerIneq, 'args': (limitingPoints[testing][2],)}
        con3={'type': 'ineq', 'fun': myLowerIneq, 'args': (limitingPoints[testing][3],)}
        myResult = minimize( chi2, estimate, args=( zipData, ), constraints=[ con0, con1, con2, con3 ] )
        print( "############" )
        print( myResult )
    
    
        ###plot that
        ax=fig1.add_subplot(4,2,2*testing+1)
        ax.plot(xThData,yThData)
        ax.errorbar(xNoiseData,yNoiseData, xerr=xGuessedError, yerr=yGuessedError, fmt='none',ecolor='r')
    
    
        testX = np.linspace(.2,6,25)
        testY = np.fromiter( ( f( x, myResult.x[0], myResult.x[1] ) for x in testX ), float )
    
        bx=fig1.add_subplot(4,2,2*testing+2)
        bx.plot(xThData,yThData)
        bx.errorbar(xNoiseData,yNoiseData, xerr=xGuessedError, yerr=yGuessedError, fmt='none',ecolor='r')
        ax.plot(limitingPoints[testing][:,0],limitingPoints[testing][:,1],marker='x', linestyle='')
        bx.plot(limitingPoints[testing][:,0],limitingPoints[testing][:,1],marker='x', linestyle='')
        ax.plot(testX, testY, linestyle='--')
        bx.plot(testX, testY, linestyle='--')
    
        bx.set_xscale('log')
        bx.set_yscale('log')
    
    plt.show()
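
    (Since the constraints keyword is passed without an explicit method, minimize selects the SLSQP solver here, which is what actually handles the inequality constraints.)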
    

    Running this gives the following results:

    ############
      status: 0
     success: True
        njev: 8
        nfev: 36
         fun: 13.782127248002116
           x: array([ 2.15043226,  0.35646436])
     message: 'Optimization terminated successfully.'
         jac: array([-0.00377715,  0.00350225,  0.        ])
         nit: 8
    ############
      status: 0
     success: True
        njev: 7
        nfev: 32
         fun: 41.372277637885716
           x: array([ 2.19005695,  0.23229378])
     message: 'Optimization terminated successfully.'
         jac: array([ 123.95069313, -442.27114677,    0.        ])
         nit: 7
    ############
      status: 0
     success: True
        njev: 5
        nfev: 23
         fun: 15.946621924326545
           x: array([ 2.06146362,  0.31089065])
     message: 'Optimization terminated successfully.'
         jac: array([-14.39131606, -65.44189298,   0.        ])
         nit: 5
    ############
      status: 0
     success: True
        njev: 7
        nfev: 34
         fun: 88.306027468763432
           x: array([ 2.16834392,  0.14935514])
     message: 'Optimization terminated successfully.'
         jac: array([ 224.11848736, -791.75553417,    0.        ])
         nit: 7
    

    I checked four different sets of limiting points (rows). The results are displayed on a normal and on a logarithmic scale (columns). With some additional work you could get parameter errors as well; one possible way is sketched below.
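
    For the parameter errors, one option (my sketch, not part of the original fit above) is to approximate the covariance matrix from a numerical Hessian of the chi2 function at the minimum; for a chi2 that is a sum of squared normalized residuals, the covariance is roughly twice the inverse Hessian. Note that this ignores the active inequality constraints, so near a binding limit it is only a rough estimate.

    import numpy as np

    ### sketch: central finite differences for the Hessian of a scalar function
    def numeric_hessian( func, p0, args=(), h=1e-4 ):
        p0 = np.asarray( p0, dtype=float )
        n = len( p0 )
        hess = np.zeros( ( n, n ) )
        for i in range( n ):
            for j in range( n ):
                pp = p0.copy(); pp[i] += h; pp[j] += h
                pm = p0.copy(); pm[i] += h; pm[j] -= h
                mp = p0.copy(); mp[i] -= h; mp[j] += h
                mm = p0.copy(); mm[i] -= h; mm[j] -= h
                hess[i, j] = ( func( pp, *args ) - func( pm, *args )
                               - func( mp, *args ) + func( mm, *args ) ) / ( 4 * h**2 )
        return hess

    ### usage after one of the fits above:
    ### hess = numeric_hessian( chi2, myResult.x, args=( zipData, ) )
    ### cov = 2.0 * np.linalg.inv( hess )
    ### parameterErrors = np.sqrt( np.diag( cov ) )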

    Update on asymmetric errors: To be honest, at the moment I do not know how to handle this properly. Naively, I'd define my own asymmetric loss function, similar to this post. With x and y errors I do it by quadrant instead of just checking the positive or negative side; my error ellipse hence changes to four connected pieces. Nevertheless, it is somewhat reasonable. For testing, and to show how it works, I made an example with a linear function. I guess the OP can combine the two pieces of code according to his requirements.

    In case of a linear fit it looks like this:

    import matplotlib.pyplot as plt
    import numpy as np
    from random import random, seed
    from scipy.optimize import minimize,leastsq
    
    # seed(7563)
    fig1 = plt.figure(1)
    ax=fig1.add_subplot(2,1,1)
    bx=fig1.add_subplot(2,1,2)
    
    ###function to fit, here only linear for testing.
    def f(x,m,y0):
        y = m * x +y0
        return y
    
    ### for Gaussian-distributed errors
    def boxmuller(x0,sigma):
        u1=random()
        u2=random()
        ll=np.sqrt(-2*np.log(u1))
        z0=ll*np.cos(2*np.pi*u2)
        z1=ll*np.sin(2*np.pi*u2)  # second Box-Muller variate uses sin, not cos
        return sigma*z0+x0, sigma*z1+x0
    
    
    ###for plotting ellipse quadrants
    def ell_data(aN,aP,bN,bP,x0=0,y0=0):
        tPPList=np.linspace(0, 0.5 * np.pi, 50)
        kPP=float(aP)/float(bP)
        rPPList=[aP/np.sqrt((np.cos(t))**2+(kPP*np.sin(t))**2) for t in tPPList]
    
        tNPList=np.linspace( 0.5 * np.pi, 1.0 * np.pi, 50)
        kNP=float(aN)/float(bP)
        rNPList=[aN/np.sqrt((np.cos(t))**2+(kNP*np.sin(t))**2) for t in tNPList]
    
        tNNList=np.linspace( 1.0 * np.pi, 1.5 * np.pi, 50)
        kNN=float(aN)/float(bN)
        rNNList=[aN/np.sqrt((np.cos(t))**2+(kNN*np.sin(t))**2) for t in tNNList]
    
        tPNList = np.linspace( 1.5 * np.pi, 2.0 * np.pi, 50)
        kPN = float(aP)/float(bN)
        rPNList = [aP/np.sqrt((np.cos(t))**2+(kPN*np.sin(t))**2) for t in tPNList]
    
        tList = np.concatenate( [ tPPList, tNPList, tNNList, tPNList] )
        rList = rPPList + rNPList+ rNNList + rPNList
    
        xyList=np.array([[x0+r*np.cos(t),y0+r*np.sin(t)] for t,r in zip(tList,rList)])
        return xyList
    
    
    ###how to rescale the ellipse to touch fitfunction at point (x,y)
    def elliptic_rescale_asymmetric(x, m, c, x0, y0, saN, saP, sbN, sbP , getQuadrant=False):
        y=f( x, m, c ) 
        ###distance to function
        r=np.sqrt( ( x - x0 )**2 + ( y - y0 )**2 )
        ###angle to function
        tau=np.arctan2( y - y0, x - x0 )
        quadrant=0
        if tau > 0:
            if tau < 0.5 * np.pi: ## PP quadrant
                kappa=float( saP ) / float( sbP )
                quadrant=1
            else: ## NP quadrant
                kappa=float( saN ) / float( sbP )
                quadrant=2
        else:
            if tau < -0.5 * np.pi: ## NN quadrant
                kappa=float( saN ) / float( sbN )
                quadrant=3
            else: ## PN quadrant
                kappa=float( saP ) / float( sbN )
                quadrant=4
        new_a=r*np.sqrt( np.cos( tau )**2 + ( kappa * np.sin( tau ) )**2 )
        if quadrant == 1 or quadrant == 4:
            rel_a=new_a/saP
        else:
            rel_a=new_a/saN
        if getQuadrant:
            return rel_a, quadrant, tau
        else:
            return rel_a
    
    ### residual function to calculate chi-square
    def residuals(parameters,dataPoint): # data point is (x,y,sxN,sxP,syN,syP)
        m, c = parameters
        bestTList=[]
        qqList=[]
        weightedDistanceList = []
        for i in range(len(dataPoint)):
            x, y, sxN, sxP, syN, syP = dataPoint[i]
            ### get the point on the graph where it is tangent to an error-ellipse,
            ### i.e. the smallest ellipse touching the graph
            edFit = minimize( elliptic_rescale_asymmetric, x, args = ( m, c, x, y, sxN, sxP, syN, syP ) )
            bestT = edFit['x'][0]
            bestTList += [ bestT ]
            ### note: use the per-point errors here, not the global aN, aP, bN, bP
            bestA, qq, tau = elliptic_rescale_asymmetric( bestT, m, c, x, y, sxN, sxP, syN, syP, True )
            qqList += [ qq ]
        bestYList=[ f( t, m, c ) for t in bestTList ]
        ### weighted distances, not squared yet, as this is done by the chi2 function
        for counter in range(len(dataPoint)):
            xb=bestTList[counter]
            yb=bestYList[counter]
            xf, yf, sxN, sxP, syN, syP = dataPoint[counter]
            quadrant=qqList[counter]
            if quadrant == 1:
                sx, sy = sxP, syP
            elif quadrant == 2:
                sx, sy = sxN, syP
            elif quadrant == 3:
                sx, sy = sxN, syN
            elif quadrant == 4:
                sx, sy = sxP, syN
            else:
                assert 0
            weightedDistanceList += [ ( xb - xf ) / sx, ( yb - yf ) / sy ]
        return weightedDistanceList
    
    
    def chi2(params, pnts):
        r = np.array( residuals( params, pnts ) )
        return np.sum( r**2 )
    
    #### ...to make data with fixed asymmetric errors; for testing only
    def noisy_data(xList,m0,y0, sxN,sxP,syN,syP):
        yList=[ f(x, m0, y0) for x in xList]
        gNList=[boxmuller(0,1)[0] for dummy in range(len(xList))]
        xerrList=[]
        for x,err in zip(xList,gNList):
            if err < 0:
                xerrList += [ sxP * err + x ]
            else:
                xerrList += [ sxN * err + x ]
        gNList=[boxmuller(0,1)[0] for dummy in range(len(xList))]
        yerrList=[]
        for y,err in zip(yList,gNList):
            if err < 0:
                yerrList += [ syP * err + y ]
            else:
                yerrList += [ syN * err + y ]
        return xerrList, yerrList
    
    
    ###some start values
    m0=1.3511
    y0=-2.2
    aN, aP, bN, bP=.2,.5, 0.9, 1.6
    
    #### some data
    xThData=np.linspace(.2,5,15)
    yThData=[ f(x, m0, y0) for x in xThData]
    xThData0=np.linspace(-1.2,7,3)
    yThData0=[ f(x, m0, y0) for x in xThData0]
    
    ### some noisy data
    xErrList,yErrList = noisy_data(xThData, m0, y0, aN, aP, bN, bP)
    
    ###...and the fit
    dataToFit=list( zip( xErrList, yErrList, len(xThData)*[aN], len(xThData)*[aP], len(xThData)*[bN], len(xThData)*[bP] ) )  # list, so it can be indexed
    fitResult = minimize(chi2, (m0,y0) , args=(dataToFit,) )
    fittedM, fittedY=fitResult.x
    yThDataF=[ f(x, fittedM, fittedY) for x in xThData0]
    
    
    ### plot that
    for cx in [ax,bx]:
        cx.plot([-2,7], [f(x, m0, y0 ) for x in [-2,7]])
    
    ax.errorbar(xErrList,yErrList, xerr=[ len(xThData)*[aN],len(xThData)*[aP] ], yerr=[ len(xThData)*[bN],len(xThData)*[bP] ], fmt='ro')
    
    for x,y in zip(xErrList,yErrList):
        xEllList,yEllList = zip( *ell_data(aN,aP,bN,bP,x,y) )
        ax.plot(xEllList,yEllList ,c='#808080')
        ### ...as well as a rescaled version that touches the original graph;
        ### this gives the shortest error-weighted distance to that graph
        ed_fit = minimize( elliptic_rescale_asymmetric, 0 ,args=(m0, y0, x, y, aN, aP, bN, bP ) )
        best_t = ed_fit['x'][0]
        best_a,qq , tau= elliptic_rescale_asymmetric( best_t, m0, y0 , x, y, aN, aP, bN, bP , True)
        xEllList,yEllList = zip( *ell_data( aN * best_a, aP * best_a, bN * best_a, bP * best_a, x, y) )
        ax.plot( xEllList, yEllList, c='#4040a0' )
    
    ###plot the fit
    
    bx.plot(xThData0,yThDataF)
    bx.errorbar(xErrList,yErrList, xerr=[ len(xThData)*[aN],len(xThData)*[aP] ], yerr=[ len(xThData)*[bN],len(xThData)*[bP] ], fmt='ro')
    for x,y in zip(xErrList,yErrList):
        xEllList,yEllList = zip( *ell_data(aN,aP,bN,bP,x,y) )
        bx.plot(xEllList,yEllList ,c='#808080')
        #### ...as well as a rescaled version that touches the fitted graph;
        #### this gives the shortest error-weighted distance to that graph
        ed_fit = minimize( elliptic_rescale_asymmetric, 0 ,args=(fittedM, fittedY, x, y, aN, aP, bN, bP ) )
        best_t = ed_fit['x'][0]
        best_a,qq , tau= elliptic_rescale_asymmetric( best_t, fittedM, fittedY , x, y, aN, aP, bN, bP , True)
        xEllList,yEllList = zip( *ell_data( aN * best_a, aP * best_a, bN * best_a, bP * best_a, x, y) )
        bx.plot( xEllList, yEllList, c='#4040a0' )
    
    plt.show()
    

    which gives the following plot.

    The upper graph shows the original linear function and some data generated from it using asymmetric Gaussian errors. Error bars are plotted, as well as the piecewise error ellipses (grey) and versions rescaled to touch the linear function (blue). The lower graph additionally shows the fitted function as well as the rescaled piecewise ellipses touching the fitted function.
