Simple multidimensional curve fitting

天涯浪人 2020-12-24 08:52

I have a bunch of data, generally in the form a, b, c, ..., y

where y = f(a, b, c...)

Most involve three or four variables, and I have 10k–10M records.

6 answers
  • 2020-12-24 09:26

    Do you know to what power you want to limit your polynomial?

    If there is no limit, then you can always get an exact match for N points by fitting a polynomial with N coefficients. To do this, you plug the N points into your equation, yielding N equations in N unknowns (the coefficients), which you can then solve with simple high-school algebra or with matrix methods.
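    As a sketch of that idea (the four sample points here are made up, and NumPy stands in for the by-hand algebra):

    ```python
    import numpy as np

    # Four points determine a cubic (4 coefficients): set up the
    # Vandermonde system, one equation per point, and solve it.
    x = np.array([0.0, 1.0, 2.0, 3.0])
    y = np.array([3.0, 12.0, 31.0, 66.0])

    A = np.vander(x, 4)            # rows are [x^3, x^2, x, 1]
    coeffs = np.linalg.solve(A, y)

    # The resulting cubic passes exactly through every input point:
    # np.polyval(coeffs, x) reproduces y to machine precision.
    ```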

  • 2020-12-24 09:29

    There's a tool for fitting 1D and 2D curves at zunzun.com, but I don't think it goes beyond two variables. Likewise, Matlab doesn't support fitting in more than two dimensions (as far as I know), and it's certainly not free.

    Otherwise, you might be able to find part of your solution in Numerical Recipes.

    But as other posters indicated, you'll probably need at least a basic idea of your function's model (which, hopefully, is linear or can be linearized, in which case you'll have a much larger array of solutions at your disposal).
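    To illustrate what linearizing buys you, here is a minimal sketch assuming a hypothetical one-term power model y = p * x^e (the data are synthetic):

    ```python
    import numpy as np

    # y = p * x^e is non-linear in (p, e), but taking logs gives
    # log y = log p + e * log x, which is linear and easy to fit.
    rng = np.random.default_rng(0)
    x = np.linspace(1.0, 10.0, 50)
    y = 2.5 * x ** 1.3 * np.exp(rng.normal(0.0, 0.01, x.size))  # small noise

    # Ordinary least squares on the log-transformed data.
    e_hat, log_p = np.polyfit(np.log(x), np.log(y), 1)
    p_hat = np.exp(log_p)
    # p_hat is near 2.5 and e_hat near 1.3, up to the noise level
    ```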

  • 2020-12-24 09:34

    The basics of data fitting involve assuming a general form of a solution, guessing some initial values for constants, and then iterating to minimize the error of the guessed solution to find a specific solution, usually in the least-squares sense.

    Look into R or Octave for open source tools. They are both capable of least-squares analysis, with several tutorials just a Google search away.

    Edit: Octave code for estimating the coefficients for a 2nd order polynomial

    x = 0:0.1:10;
    y = 5.*x.^2 + 4.*x + 3;
    
    % Add noise to y data
    y = y + randn(size(y))*0.1;
    
    % Estimate coefficients of polynomial
    p = polyfit(x,y,2)
    

    On my machine, I get:

    p =
    
       5.0886   3.9050   2.9577
    
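    For reference, a rough NumPy translation of the Octave snippet above (same model; the noise is drawn differently, so the recovered coefficients will differ slightly):

    ```python
    import numpy as np

    x = np.arange(0.0, 10.1, 0.1)
    y = 5 * x**2 + 4 * x + 3

    # Add noise to y data
    rng = np.random.default_rng(42)
    y = y + rng.normal(0.0, 0.1, y.size)

    # Estimate coefficients of the 2nd-order polynomial (highest power first)
    p = np.polyfit(x, y, 2)
    # p is close to [5, 4, 3]
    ```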
  • 2020-12-24 09:46

    If you have a guess at the form of f,[*] you need a minimizer to find the optimal parameters. The tools Scottie T suggests would work, as would ROOT, and many others.

    If you don't have a clue what form f might take you are in deep trouble indeed.


    [*] That is, you know that

    f = f(x,y,z,w,...;p1,p2,p3...)

    where the p's are the parameters and x, y, ... are the coordinates.
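    A minimal sketch of that setup with SciPy's curve_fit as the minimizer; the form f(x, y; p1, p2) = p1*x + p2*y² and all the data here are invented for illustration:

    ```python
    import numpy as np
    from scipy.optimize import curve_fit

    # Assumed known form: f(x, y; p1, p2) = p1 * x + p2 * y^2
    def f(coords, p1, p2):
        x, y = coords
        return p1 * x + p2 * y ** 2

    rng = np.random.default_rng(1)
    x = rng.uniform(0.0, 5.0, 200)
    y = rng.uniform(0.0, 5.0, 200)
    z = f((x, y), 2.0, -0.5) + rng.normal(0.0, 0.05, x.size)

    # The minimizer finds the parameters with least squared error.
    params, _ = curve_fit(f, (x, y), z, p0=[1.0, 1.0])
    # params is close to [2.0, -0.5]
    ```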

  • 2020-12-24 09:46

    Short Answer: it isn't so simple. Consider a non-parametric approach on data sub-sets.

    There are two main issues you need to decide on: (1) do you actually care about the parameters of the function, i.e. your P1, E1, ..., or would you be okay with just estimating the mean function? (2) Do you really need to estimate the function on all of the data?

    The first thing I'll mention is that your specified function is non-linear (in the parameters to be estimated), so ordinary least squares won't work. Let's pretend that you specified a linear function. You'd still have a problem with the 10M values. Linear regression can be performed efficiently using QR factorization, but that is still an O(n * p^2) algorithm, where p is the number of parameters you are trying to estimate. If you want to estimate the non-linear mean function it gets much worse.

    The only way you are going to be able to estimate anything in such a large data set is by using a subset to perform the estimation. Basically, you randomly select a subset and use that to estimate the function.
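    A sketch of that subsampling idea, assuming (for simplicity) a linear model y = 3a + 2b + 1 and synthetic data in place of the real records:

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    n = 1_000_000  # stand-in for a large data set
    a = rng.uniform(0.0, 1.0, n)
    b = rng.uniform(0.0, 1.0, n)
    y = 3.0 * a + 2.0 * b + 1.0 + rng.normal(0.0, 0.1, n)

    # Randomly select a subset and estimate the model on it alone.
    idx = rng.choice(n, size=10_000, replace=False)
    X = np.column_stack([a[idx], b[idx], np.ones(idx.size)])
    coef, *_ = np.linalg.lstsq(X, y[idx], rcond=None)
    # coef is close to [3.0, 2.0, 1.0]
    ```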

    If you don't care about your parameter values, and just want to estimate the mean function you will probably be better off using a non-parametric estimation technique.

    Hopefully this helps.

    leif

  • 2020-12-24 09:47

    In case it's useful, here's a Numpy/Scipy (Python) template to do what you want:

    from numpy import array
    from scipy.optimize import leastsq

    def __residual(params, y, a, b, c):
        p0, e0, p1, e1, p2, e2 = params
        return p0 * a ** e0 + p1 * b ** e1 + p2 * c ** e2 - y

    # load a, b, c as arrays, and y as the observed values
    # guess initial values for p0, e0, p1, e1, p2, e2
    # leastsq returns (solution, status flag), so unpack both
    p_opt, ier = leastsq(__residual, array([p0, e0, p1, e1, p2, e2]), args=(y, a, b, c))
    print('y = %f a^%f + %f b^%f + %f c^%f' % tuple(p_opt))
    

    If you really want to understand what's going on, though, you're going to have to invest the time to scale the learning curve for some tool or programming environment - I really don't think there's any way around that. People don't generally write specialized tools for doing things like 3-term power regressions exclusively.
