Matlab Dates mismatch between two sets of data. Help!

囚心锁ツ 2021-01-14 08:08


Please forgive the simplicity of the question, but it's my first day.

I am working with two sets of time

2 Answers
  •  被撕碎了的回忆
    2021-01-14 08:54

    Basically, you want to perform a full outer merge of the two datasets, using the date as the key.

    Consider the following example:

    %# vector of dates (serial date numbers)
    days = datenum( num2str((1:31)','2011-10-%02d'), 'yyyy-mm-dd' );   %# one month (October 2011)
    
    %# build two datasets similar to what you described
    idx1 = rand(size(days)) > 0.2;                %# randomly pick dates for 1st
    M1 = [days(idx1) rand(sum(idx1),2)*1000];     %# stocks: days,opening,closing
    idx2 = rand(size(days)) > 0.5;                %# randomly pick dates for 2nd
    M2 = [days(idx2) rand(sum(idx2),2)*1000];     %# bonds: days,opening,closing
    
    %# get the full range of dates, and convert them to indices starting at 1
    [allDays,~,ind] = unique( [M1(:,1);M2(:,1)] );
    indM1 = ind(1:size(M1,1));
    indM2 = ind(size(M1,1)+1:end);
    
    %# merge the two datasets (days,opening,closing,opening,closing)
    M = nan(numel(allDays),size(M1,2)+size(M2,2)-1);
    M(:,1) = allDays;                   %# available days from both
    M(indM1,2:3) = M1(:,2:3);           %# insert 1st dataset values
    M(indM2,4:5) = M2(:,2:3);           %# insert 2nd dataset values
    
    %# final merged dataset formatted
    C = [cellstr(datestr(M(:,1),'yyyy-mm-dd')) num2cell(M(:,2:end))]
    

    The result (the values are random, so they will differ on every run):

    C = 
        '2011-10-01'    [     NaN]    [     NaN]    [332.5714]    [241.5017]
        '2011-10-03'    [941.9189]    [ 86.8151]    [     NaN]    [     NaN]
        '2011-10-04'    [655.9138]    [429.3973]    [     NaN]    [     NaN]
        '2011-10-05'    [451.9457]    [257.2828]    [853.0636]    [243.1452]
        '2011-10-06'    [839.6974]    [297.5554]    [     NaN]    [     NaN]
        '2011-10-07'    [532.6235]    [424.8584]    [     NaN]    [     NaN]
        '2011-10-09'    [553.8871]    [119.2073]    [     NaN]    [     NaN]
        '2011-10-11'    [680.0655]    [495.0669]    [442.3979]    [154.1594]
        '2011-10-13'    [367.1899]    [706.4072]    [904.3555]    [956.4164]
        '2011-10-14'    [     NaN]    [     NaN]    [ 33.1794]    [935.6614]
        '2011-10-15'    [239.2906]    [243.5734]    [     NaN]    [     NaN]
        '2011-10-16'    [578.9235]    [785.0701]    [532.4265]    [818.7144]
        '2011-10-17'    [866.8871]    [ 74.0896]    [716.4973]    [728.2618]
        '2011-10-18'    [406.7768]    [393.8834]    [179.3018]    [175.8117]
        '2011-10-19'    [112.6151]    [  3.3941]    [336.5329]    [360.3710]
        '2011-10-20'    [443.8458]    [220.6769]    [     NaN]    [     NaN]
        '2011-10-21'    [     NaN]    [     NaN]    [187.7129]    [188.7900]
        '2011-10-22'    [300.1844]    [  1.3006]    [     NaN]    [     NaN]
        '2011-10-23'    [401.3869]    [189.1797]    [     NaN]    [     NaN]
        '2011-10-24'    [833.3636]    [142.4841]    [321.9272]    [  1.1984]
        '2011-10-25'    [     NaN]    [     NaN]    [403.8567]    [316.4195]
        '2011-10-26'    [403.6287]    [268.0760]    [     NaN]    [     NaN]
        '2011-10-27'    [390.1759]    [174.8921]    [     NaN]    [     NaN]
        '2011-10-28'    [     NaN]    [     NaN]    [548.5663]    [699.6170]
        '2011-10-29'    [360.4489]    [138.6490]    [ 48.7386]    [625.2552]
        '2011-10-30'    [140.2554]    [598.8856]    [552.7321]    [543.0622]
        '2011-10-31'    [260.1302]    [901.0579]    [274.8114]    [439.0372]
    

    The merged result contains the opening/closing prices from both datasets. When one of them is not available on a specific date, its columns are filled with NaN. Note that some days are missing from the result entirely; this is because neither dataset listed prices on those days.
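    To see why the `unique` indices line the rows up, here is a minimal sketch of the same trick on two tiny hand-made datasets. The names `A`, `B`, `indA` and `indB` are made up for this illustration; the day numbers are plain integers rather than real serial dates so the result is easy to check by eye.

```matlab
%# two tiny datasets: [day, value]
A = [1 10; 3 30; 4 40];
B = [2 200; 3 300];

%# third output of unique maps every stacked day to its row in allDays
[allDays,~,ind] = unique( [A(:,1); B(:,1)] );
indA = ind(1:size(A,1));          %# rows of the merged matrix that A fills
indB = ind(size(A,1)+1:end);      %# rows of the merged matrix that B fills

%# start from all-NaN, then drop each dataset into its own column
M = nan(numel(allDays), 3);
M(:,1)    = allDays;
M(indA,2) = A(:,2);
M(indB,3) = B(:,2);
disp(M)
```

    Day 3 appears in both inputs, so it is the only row with no NaN; days 1, 2 and 4 each keep a NaN in the column of the dataset that did not list them.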


    Alternatively, you could look into the dataset class from the Statistics Toolbox (which is designed for such cases). Using the same example:

    %# build dataset object for the two sets
    varNames1 = {'days' 'stock_open' 'stock_close'};
    varNames2 = {'days' 'bond_open' 'bond_close'};
    d1 = dataset([M1, varNames1]);
    d2 = dataset([M2, varNames2]);
    
    %# join on days (full-outer join)
    d = join(d1,d2, 'keys','days', 'type','fullouter', 'MergeKeys',true);
    d.days = datestr(d.days,'yyyy-mm-dd');   %# format the days column as string
    

    The result:

    d = 
        days          stock_open    stock_close    bond_open    bond_close
        2011-10-01       NaN           NaN         332.57        241.5    
        2011-10-03    941.92        86.815            NaN          NaN    
        2011-10-04    655.91         429.4            NaN          NaN    
        2011-10-05    451.95        257.28         853.06       243.15    
        2011-10-06     839.7        297.56            NaN          NaN    
        2011-10-07    532.62        424.86            NaN          NaN    
        2011-10-09    553.89        119.21            NaN          NaN    
        2011-10-11    680.07        495.07          442.4       154.16    
        2011-10-13    367.19        706.41         904.36       956.42    
        2011-10-14       NaN           NaN         33.179       935.66    
        2011-10-15    239.29        243.57            NaN          NaN    
        2011-10-16    578.92        785.07         532.43       818.71    
        2011-10-17    866.89         74.09          716.5       728.26    
        2011-10-18    406.78        393.88          179.3       175.81    
        2011-10-19    112.62        3.3941         336.53       360.37    
        2011-10-20    443.85        220.68            NaN          NaN    
        2011-10-21       NaN           NaN         187.71       188.79    
        2011-10-22    300.18        1.3006            NaN          NaN    
        2011-10-23    401.39        189.18            NaN          NaN    
        2011-10-24    833.36        142.48         321.93       1.1984    
        2011-10-25       NaN           NaN         403.86       316.42    
        2011-10-26    403.63        268.08            NaN          NaN    
        2011-10-27    390.18        174.89            NaN          NaN    
        2011-10-28       NaN           NaN         548.57       699.62    
        2011-10-29    360.45        138.65         48.739       625.26    
        2011-10-30    140.26        598.89         552.73       543.06    
        2011-10-31    260.13        901.06         274.81       439.04   
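    If you are on a newer MATLAB release, the same join can probably be written with the built-in table type and `outerjoin`, which do not need the Statistics Toolbox. This is only a sketch under that assumption (it is not part of the answer above), and the small `M1`/`M2` matrices are made up so the snippet runs on its own:

```matlab
%# small stand-in datasets: [serial date, opening, closing]
M1 = [datenum(2011,10,(1:3)') [10 11; 20 21; 30 31]];   %# stocks: Oct 1-3
M2 = [datenum(2011,10,(2:4)') [ 1  2;  3  4;  5  6]];   %# bonds:  Oct 2-4

%# wrap the matrices in tables with named columns
T1 = array2table(M1, 'VariableNames', {'days','stock_open','stock_close'});
T2 = array2table(M2, 'VariableNames', {'days','bond_open','bond_close'});

%# full outer join on days; MergeKeys keeps a single days column
T = outerjoin(T1, T2, 'Keys','days', 'MergeKeys',true);

%# show the key as a readable date instead of a serial number
T.days = datetime(T.days, 'ConvertFrom','datenum', 'Format','yyyy-MM-dd');
disp(T)
```

    As with the dataset version, rows that exist in only one input get NaN in the other input's columns.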
    

    EDIT:

    Say you had the following two files containing the data:

    bonds.csv

    10/6/1977 7.72 7.72
    10/7/1977 7.73 7.73
    10/11/1977 7.77 7.77
    10/12/1977 7.79 7.79
    10/13/1977 7.79 7.79
    10/14/1977 7.79 7.79
    10/17/1977 7.79 7.79
    10/18/1977 7.8 7.8
    

    stocks.csv

    10/06/77 95.68 96.05
    10/07/77 96.05 95.97
    10/10/77 95.97 95.75
    10/11/77 95.75 94.93
    10/12/77 94.82 94.04
    10/13/77 94.04 93.46
    10/14/77 93.46 93.56
    10/17/77 93.56 93.47
    

    You can read the data using the TEXTSCAN function:

    %# read bonds data (fields are space-separated, dates use 4-digit years)
    fid = fopen('bonds.csv','rt');
    C = textscan(fid, '%s %f %f', 'Delimiter',' ', 'CollectOutput',true);
    fclose(fid);
    M1 = [datenum(C{1},'mm/dd/yyyy') C{2}];     %# bonds: days + two price columns
    
    %# read stocks data (same layout, but dates use 2-digit years)
    fid = fopen('stocks.csv','rt');
    C = textscan(fid, '%s %f %f', 'Delimiter',' ', 'CollectOutput',true);
    fclose(fid);
    M2 = [datenum(C{1},'mm/dd/yy') C{2}];       %# stocks: days + two price columns
    

    Now you can reuse the code above, starting at "get the full range of dates...", or use the dataset class. Note that here M1 holds the bonds and M2 the stocks, so the bond columns come first. After joining them, this gives me:

    C = 
        '1977-10-06'    [7.72]    [7.72]    [95.68]    [96.05]
        '1977-10-07'    [7.73]    [7.73]    [96.05]    [95.97]
        '1977-10-10'    [ NaN]    [ NaN]    [95.97]    [95.75]
        '1977-10-11'    [7.77]    [7.77]    [95.75]    [94.93]
        '1977-10-12'    [7.79]    [7.79]    [94.82]    [94.04]
        '1977-10-13'    [7.79]    [7.79]    [94.04]    [93.46]
        '1977-10-14'    [7.79]    [7.79]    [93.46]    [93.56]
        '1977-10-17'    [7.79]    [7.79]    [93.56]    [93.47]
        '1977-10-18'    [ 7.8]    [ 7.8]    [  NaN]    [  NaN]
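    One thing to watch with the two-digit years in stocks.csv: as far as I know, `datenum` resolves them against a pivot year that defaults to the current year minus 50, so '77' may stop meaning 1977 in some future year. A hedged sketch that passes the pivot year explicitly (the value 1950 is my choice for this data, not something from the files):

```matlab
%# two-digit years are mapped into the 100-year window starting at the pivot,
%# i.e. 1950..2049 here, so '77' is read as 1977
dates = {'10/06/77'; '10/07/77'};
d = datenum(dates, 'mm/dd/yy', 1950);
s = datestr(d, 'yyyy-mm-dd')
```

    With this, both stock dates land in 1977 and match the four-digit dates in bonds.csv regardless of when the script is run.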
    
