Python baseline correction library

别等时光非礼了梦想 · Submitted on 2020-01-09 09:10:52

Question


I am currently working with some Raman spectra data, and I am trying to correct the skew in my data caused by fluorescence. Take a look at the graph below:

I am pretty close to achieving what I want. As you can see, I am fitting a polynomial to all of my data, whereas I should really just be fitting it at the local minima.

Ideally I would want a polynomial fit which, when subtracted from my original data, would result in something like this:

Are there any built-in libraries that do this already?

If not, is there a simple algorithm anyone can recommend?


Answer 1:


I found an answer to my question; I'm just sharing it for everyone who stumbles upon this.

There is an algorithm called "Asymmetric Least Squares Smoothing" by P. Eilers and H. Boelens (2005). The paper is free and you can find it on Google.

import numpy as np
from scipy import sparse
from scipy.sparse.linalg import spsolve

def baseline_als(y, lam, p, niter=10):
  L = len(y)
  # Second-difference matrix, built densely first (memory-hungry for long signals)
  D = sparse.csc_matrix(np.diff(np.eye(L), 2))
  w = np.ones(L)
  for i in xrange(niter):  # Python 2; on Python 3 use range (see the next answer)
    W = sparse.spdiags(w, 0, L, L)
    Z = W + lam * D.dot(D.transpose())
    z = spsolve(Z, w*y)
    # Asymmetric weighting: p for points above the fit, 1-p for points below
    w = p * (y > z) + (1-p) * (y < z)
  return z
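
For example, here is a quick usage sketch on a synthetic spectrum (the peak and background shapes below are invented purely for illustration; on Python 3, swap in the range-based version from the next answer):

import numpy as np

# Invented test signal: one Gaussian peak on a curved, fluorescence-like background
x = np.linspace(0, 1000, 1000)
peak = 100 * np.exp(-0.5 * ((x - 500) / 20)**2)
background = 5e-5 * (x - 300)**2 + 10
y = peak + background

z = baseline_als(y, lam=1e5, p=0.01)  # estimated baseline
corrected = y - z                     # baseline-corrected spectrum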



Answer 2:


The following code works on Python 3.6.

This is adapted from the accepted answer to avoid computing the difference matrix densely (which can easily cause memory issues), and it uses range instead of xrange.

import numpy as np
from scipy import sparse
from scipy.sparse.linalg import spsolve

def baseline_als(y, lam, p, niter=10):
  L = len(y)
  # Sparse second-difference matrix, equivalent to np.diff(np.eye(L), 2) but never dense
  D = sparse.diags([1, -2, 1], [0, -1, -2], shape=(L, L-2))
  w = np.ones(L)
  for i in range(niter):
    W = sparse.spdiags(w, 0, L, L)
    Z = W + lam * D.dot(D.transpose())
    z = spsolve(Z, w*y)
    w = p * (y > z) + (1-p) * (y < z)
  return z
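
As a quick sanity check (illustrative only, with a tiny L), the sparse construction matches the dense second-difference matrix from the accepted answer:

import numpy as np
from scipy import sparse

L = 6
D_dense  = np.diff(np.eye(L), 2)                                  # L x (L-2), dense
D_sparse = sparse.diags([1, -2, 1], [0, -1, -2], shape=(L, L-2))  # same stencil, sparse
assert np.allclose(D_dense, D_sparse.toarray())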



Answer 3:


I know this is an old question, but I stumbled upon it a few months ago and implemented an equivalent version using scipy.sparse routines.

# Baseline removal
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import spsolve

def baseline_als(y, lam, p, niter=10):
    s = len(y)
    # assemble the second-difference matrix from its three diagonals
    D0 = sparse.eye(s)
    d1 = [np.ones(s - 1) * -2]
    D1 = sparse.diags(d1, [-1])
    d2 = [np.ones(s - 2) * 1]
    D2 = sparse.diags(d2, [-2])
    D = D0 + D2 + D1

    w = np.ones(s)
    for i in range(niter):
        W = sparse.diags([w], [0])
        Z = W + lam * D.dot(D.transpose())
        z = spsolve(Z, w*y)
        w = p * (y > z) + (1-p) * (y < z)

    return z

Cheers,

Pedro.




Answer 4:


Recently, I needed to use this method. The code from the answers above works well, but it clearly uses more memory than necessary. So here is my version with optimized memory usage.

def baseline_als_optimized(y, lam, p, niter=10):
    # Assumes the same imports as above: np, sparse, spsolve
    L = len(y)
    D = sparse.diags([1, -2, 1], [0, -1, -2], shape=(L, L-2))
    D = lam * D.dot(D.transpose()) # Precompute this term since it does not depend on `w`
    w = np.ones(L)
    W = sparse.spdiags(w, 0, L, L)
    for i in range(niter):
        W.setdiag(w) # Do not create a new matrix, just update diagonal values
        Z = W + D
        z = spsolve(Z, w*y)
        w = p * (y > z) + (1-p) * (y < z)
    return z

According to my benchmarks below, it is also about 1.5 times faster.

%%timeit -n 1000 -r 10 y = randn(1000)
baseline_als(y, 10000, 0.05) # function from @jpantina's answer
# 20.5 ms ± 382 µs per loop (mean ± std. dev. of 10 runs, 1000 loops each)

%%timeit -n 1000 -r 10 y = randn(1000)
baseline_als_optimized(y, 10000, 0.05)
# 13.3 ms ± 874 µs per loop (mean ± std. dev. of 10 runs, 1000 loops each)

NOTE 1: The original article says:

To emphasize the basic simplicity of the algorithm, the number of iterations has been fixed to 10. In practical applications one should check whether the weights show any change; if not, convergence has been attained.

So the more correct way to stop iterating is to check that ||w_new - w|| < tolerance.
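
A minimal sketch of such a stopping rule (the function name, max_iter, and tol are my own choices, not from the paper):

def baseline_als_converged(y, lam, p, max_iter=50, tol=1e-6):
    # Same ALS baseline as above, but stop once the weights settle
    L = len(y)
    D = sparse.diags([1, -2, 1], [0, -1, -2], shape=(L, L-2))
    D = lam * D.dot(D.transpose())
    w = np.ones(L)
    W = sparse.spdiags(w, 0, L, L)
    for i in range(max_iter):
        W.setdiag(w)
        z = spsolve(W + D, w*y)
        w_new = p * (y > z) + (1-p) * (y < z)
        # Convergence has been attained when the weights no longer change
        if np.linalg.norm(w_new - w) < tol:
            break
        w = w_new
    return z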

NOTE 2: Another useful quote (from @glycoaddict's comment) gives an idea of how to choose the parameter values.

There are two parameters: p for asymmetry and λ for smoothness. Both have to be tuned to the data at hand. We found that generally 0.001 ≤ p ≤ 0.1 is a good choice (for a signal with positive peaks) and 10^2 ≤ λ ≤ 10^9, but exceptions may occur. In any case one should vary λ on a grid that is approximately linear for log λ. Often visual inspection is sufficient to get good parameter values.
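
For instance, a λ grid that is linear in log λ can be generated with np.logspace; the tuning loop below is only a sketch (the grid size and the final visual-inspection step are up to you):

for p in (0.001, 0.01, 0.1):
    for lam in np.logspace(2, 9, num=8):   # 10^2 ... 10^9, linear in log(lambda)
        z = baseline_als_optimized(y, lam, p)
        # plot y - z and pick the (p, lam) pair that flattens the baseline
        # without distorting the real peaks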



Source: https://stackoverflow.com/questions/29156532/python-baseline-correction-library
