Combine two Pandas dataframes, resample on one time column, interpolate

情深已故 2021-01-01 15:33

This is my first question on Stack Overflow. Go easy on me!

I have two data sets acquired simultaneously by different acquisition systems with different sampling rates.

2 Answers
  • 2021-01-01 15:36

    If you construct a single DataFrame from Series, using time values as index, like this:

    >>> import numpy as np
    >>> import pandas as pd
    
    >>> t1 = np.array([0, 0.5, 1.0, 1.5, 2.0])
    >>> y1 = pd.Series(t1, index=t1)
    
    >>> t2 = np.array([0, 0.34, 1.01, 1.4, 1.6, 1.7, 2.01])
    >>> y2 = pd.Series(3*t2, index=t2)
    
    >>> df = pd.DataFrame({'y1': y1, 'y2': y2})
    >>> df
           y1    y2
    0.00  0.0  0.00
    0.34  NaN  1.02
    0.50  0.5   NaN
    1.00  1.0   NaN
    1.01  NaN  3.03
    1.40  NaN  4.20
    1.50  1.5   NaN
    1.60  NaN  4.80
    1.70  NaN  5.10
    2.00  2.0   NaN
    2.01  NaN  6.03
    

    You can simply interpolate it, and select only the part where y1 is defined:

    >>> df.interpolate('index').reindex(y1.index)
          y1   y2
    0.0  0.0  0.0
    0.5  0.5  1.5
    1.0  1.0  3.0
    1.5  1.5  4.5
    2.0  2.0  6.0
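
The same steps can be wrapped in a small helper. This is only a sketch: the function name `align_to` and the column names `y1` and `y2` are made up for illustration, and it assumes both series are indexed by numeric time values.

```python
import numpy as np
import pandas as pd


def align_to(reference: pd.Series, other: pd.Series) -> pd.DataFrame:
    """Interpolate `other` onto the time index of `reference` (hypothetical helper)."""
    # Building a DataFrame from two Series aligns them on the sorted
    # union of both indexes, leaving NaN where a series has no sample.
    combined = pd.DataFrame({"y1": reference, "y2": other})
    # method="index" interpolates linearly using the index values as x,
    # so unevenly spaced timestamps are weighted correctly.
    filled = combined.interpolate(method="index")
    # Keep only the rows at the reference series' own timestamps.
    return filled.reindex(reference.index)


t1 = np.array([0, 0.5, 1.0, 1.5, 2.0])
t2 = np.array([0, 0.34, 1.01, 1.4, 1.6, 1.7, 2.01])
y1 = pd.Series(t1, index=t1)
y2 = pd.Series(3 * t2, index=t2)

aligned = align_to(y1, y2)
print(aligned)
```

Reindexing on `reference.index` rather than on the series values keeps the helper correct even when the samples are not equal to their timestamps.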
    
  • 2021-01-01 15:40

    It's not exactly clear to me how you're getting rid of some of the values in y2, but it seems like if there is more than one for a given timepoint, you only want the first one. Also, it seems like your time values should be in the index. I also added column labels. It looks like this:

    import pandas as pd
    
    # evenly spaced times
    t1 = [0,0.5,1.0,1.5,2.0]
    y1 = t1
    
    # unevenly spaced times
    t2 = [0,0.34,1.01,1.4,1.6,1.7,2.01]
    
    # round t2 values to the nearest half
    new_t2 = [round(num * 2)/2 for num in t2]
    
    # set y2 values
    y2 = [3*z for z in new_t2]
    
    # eliminate entries that have the same index value, keeping the first one;
    # iterate backwards so deletions don't shift the positions still to be checked
    for x in range(len(new_t2) - 1, 0, -1):
        if new_t2[x] == new_t2[x-1]:
            del new_t2[x]
            del y2[x]
    
    
    ser1 = pd.Series(y1, index=t1)
    ser2 = pd.Series(y2, index=new_t2)
    
    df = pd.concat((ser1, ser2), axis=1)
    df.columns = ('Y1', 'Y2')
    
    print(df)
    

    This prints:

          Y1   Y2
    0.0  0.0  0.0
    0.5  0.5  1.5
    1.0  1.0  3.0
    1.5  1.5  4.5
    2.0  2.0  6.0
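
The "keep only the first value per time point" step could also be done inside pandas instead of with a manual loop. A minimal sketch, assuming the target grid is multiples of 0.5 and that the `y2` samples are taken at the original, unrounded times (both are assumptions, not something the question states):

```python
import pandas as pd

t2 = [0, 0.34, 1.01, 1.4, 1.6, 1.7, 2.01]
y2 = [3 * t for t in t2]  # illustrative sample values only

# Snap each timestamp to the nearest multiple of 0.5.
snapped = [round(t * 2) / 2 for t in t2]
ser2 = pd.Series(y2, index=snapped)

# Index.duplicated(keep="first") marks every repeat after the first
# occurrence; inverting the mask drops those repeats.
ser2 = ser2[~ser2.index.duplicated(keep="first")]
print(ser2)
```

Note that snapping timestamps shifts each sample in time without changing its value, so this is cruder than interpolating on the index.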
    