Matlab Dates mismatch between two sets of data. Help!

囚心锁ツ 2021-01-14 08:08

Please forgive the simplicity of the question, but it's my first day.

I am working with two sets of time

2 Answers

被撕碎了的回忆 (original poster)

2021-01-14 08:54

Basically, you want to perform a full outer join on the two datasets, using the date as the key.

Consider the following as example:

%# vector of dates (serial date numbers)
days = datenum( num2str((1:31)','2011-10-%02d'), 'yyyy-mm-dd' );   %# one month (October 2011)

%# let's build two datasets similar to what you described
idx1 = rand(size(days)) > 0.2;                %# randomly pick dates for 1st
M1 = [days(idx1) rand(sum(idx1),2)*1000];     %# stocks: days,opening,closing
idx2 = rand(size(days)) > 0.5;                %# randomly pick dates for 2nd
M2 = [days(idx2) rand(sum(idx2),2)*1000];     %# bonds: days,opening,closing

%# get the full range of dates, and convert them to indices starting at 1
[allDays,~,ind] = unique( [M1(:,1);M2(:,1)] );
indM1 = ind(1:size(M1,1));
indM2 = ind(size(M1,1)+1:end);

%# merge the two datasets (days,opening,closing,opening,closing)
M = nan(numel(allDays),size(M1,2)+size(M2,2)-1);
M(:,1) = allDays;                   %# available days from both
M(indM1,2:3) = M1(:,2:3);           %# insert 1st dataset values
M(indM2,4:5) = M2(:,2:3);           %# insert 2nd dataset values

%# final merged dataset formatted
C = [cellstr(datestr(M(:,1),'yyyy-mm-dd')) num2cell(M(:,2:end))]

The result:

C = 
    '2011-10-01'    [     NaN]    [     NaN]    [332.5714]    [241.5017]
    '2011-10-03'    [941.9189]    [ 86.8151]    [     NaN]    [     NaN]
    '2011-10-04'    [655.9138]    [429.3973]    [     NaN]    [     NaN]
    '2011-10-05'    [451.9457]    [257.2828]    [853.0636]    [243.1452]
    '2011-10-06'    [839.6974]    [297.5554]    [     NaN]    [     NaN]
    '2011-10-07'    [532.6235]    [424.8584]    [     NaN]    [     NaN]
    '2011-10-09'    [553.8871]    [119.2073]    [     NaN]    [     NaN]
    '2011-10-11'    [680.0655]    [495.0669]    [442.3979]    [154.1594]
    '2011-10-13'    [367.1899]    [706.4072]    [904.3555]    [956.4164]
    '2011-10-14'    [     NaN]    [     NaN]    [ 33.1794]    [935.6614]
    '2011-10-15'    [239.2906]    [243.5734]    [     NaN]    [     NaN]
    '2011-10-16'    [578.9235]    [785.0701]    [532.4265]    [818.7144]
    '2011-10-17'    [866.8871]    [ 74.0896]    [716.4973]    [728.2618]
    '2011-10-18'    [406.7768]    [393.8834]    [179.3018]    [175.8117]
    '2011-10-19'    [112.6151]    [  3.3941]    [336.5329]    [360.3710]
    '2011-10-20'    [443.8458]    [220.6769]    [     NaN]    [     NaN]
    '2011-10-21'    [     NaN]    [     NaN]    [187.7129]    [188.7900]
    '2011-10-22'    [300.1844]    [  1.3006]    [     NaN]    [     NaN]
    '2011-10-23'    [401.3869]    [189.1797]    [     NaN]    [     NaN]
    '2011-10-24'    [833.3636]    [142.4841]    [321.9272]    [  1.1984]
    '2011-10-25'    [     NaN]    [     NaN]    [403.8567]    [316.4195]
    '2011-10-26'    [403.6287]    [268.0760]    [     NaN]    [     NaN]
    '2011-10-27'    [390.1759]    [174.8921]    [     NaN]    [     NaN]
    '2011-10-28'    [     NaN]    [     NaN]    [548.5663]    [699.6170]
    '2011-10-29'    [360.4489]    [138.6490]    [ 48.7386]    [625.2552]
    '2011-10-30'    [140.2554]    [598.8856]    [552.7321]    [543.0622]
    '2011-10-31'    [260.1302]    [901.0579]    [274.8114]    [439.0372]

The merged result contains the opening/closing prices from both datasets. When one dataset has no entry for a given date, its columns are filled with NaN. Note that some days of the month (for example 2011-10-02) are missing from the result altogether; that is because neither dataset listed prices on those days.
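The NaN placement relies on the third output of the UNIQUE function: it maps every stacked date to its row in the sorted list of distinct dates, so the first size(M1,1) entries of ind say where each row of M1 lands, and the remaining entries do the same for M2. Rows that neither index list touches keep the NaN they were initialized with. A tiny illustration with made-up day numbers (not the data above):

```matlab
% Made-up day numbers: dataset 1 lists days 5 and 3, dataset 2 lists 5 and 7
d1 = [5; 3];
d2 = [5; 7];

% allDays is the sorted union; ind maps each stacked entry into allDays
[allDays, ~, ind] = unique([d1; d2]);
ind1 = ind(1:numel(d1));       % merged-row index for each entry of d1
ind2 = ind(numel(d1)+1:end);   % merged-row index for each entry of d2

% Mark which merged rows each dataset fills; untouched cells stay NaN
M = nan(numel(allDays), 3);
M(:,1) = allDays;
M(ind1,2) = 1;
M(ind2,3) = 2;
disp(M)
```

Here allDays is [3;5;7], ind1 is [2;1] and ind2 is [2;3], so day 3 only gets a value from the first dataset, day 7 only from the second, and day 5 from both.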

Alternatively, you could use the DATASET class from the Statistics Toolbox, which is designed for exactly this kind of task. Using the same example:

%# build dataset objects for the two sets (requires the Statistics Toolbox)
varNames1 = {'days' 'stock_open' 'stock_close'};
varNames2 = {'days' 'bond_open' 'bond_close'};
d1 = dataset({M1, varNames1{:}});   %# one named variable per column of M1
d2 = dataset({M2, varNames2{:}});

%# join on days (full outer join), merging the two key columns into one
d = join(d1,d2, 'keys','days', 'type','fullouter', 'MergeKeys',true);
d.days = datestr(d.days,'yyyy-mm-dd');   %# format the days column as strings

The result:

d = 
    days          stock_open    stock_close    bond_open    bond_close
    2011-10-01       NaN           NaN         332.57        241.5    
    2011-10-03    941.92        86.815            NaN          NaN    
    2011-10-04    655.91         429.4            NaN          NaN    
    2011-10-05    451.95        257.28         853.06       243.15    
    2011-10-06     839.7        297.56            NaN          NaN    
    2011-10-07    532.62        424.86            NaN          NaN    
    2011-10-09    553.89        119.21            NaN          NaN    
    2011-10-11    680.07        495.07          442.4       154.16    
    2011-10-13    367.19        706.41         904.36       956.42    
    2011-10-14       NaN           NaN         33.179       935.66    
    2011-10-15    239.29        243.57            NaN          NaN    
    2011-10-16    578.92        785.07         532.43       818.71    
    2011-10-17    866.89         74.09          716.5       728.26    
    2011-10-18    406.78        393.88          179.3       175.81    
    2011-10-19    112.62        3.3941         336.53       360.37    
    2011-10-20    443.85        220.68            NaN          NaN    
    2011-10-21       NaN           NaN         187.71       188.79    
    2011-10-22    300.18        1.3006            NaN          NaN    
    2011-10-23    401.39        189.18            NaN          NaN    
    2011-10-24    833.36        142.48         321.93       1.1984    
    2011-10-25       NaN           NaN         403.86       316.42    
    2011-10-26    403.63        268.08            NaN          NaN    
    2011-10-27    390.18        174.89            NaN          NaN    
    2011-10-28       NaN           NaN         548.57       699.62    
    2011-10-29    360.45        138.65         48.739       625.26    
    2011-10-30    140.26        598.89         552.73       543.06    
    2011-10-31    260.13        901.06         274.81       439.04
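In newer MATLAB releases the DATASET class is no longer recommended; MathWorks points to the core table type instead, whose OUTERJOIN function performs the same full outer join. A minimal sketch, assuming R2014b or later (table needs R2013b, datetime R2014b); the two small matrices are made-up stand-ins with the same [date, open, close] layout as M1 and M2 above:

```matlab
% Made-up inputs in the same layout as M1 (stocks) and M2 (bonds)
M1 = [datenum(2011,10,3) 941.9 86.8;
      datenum(2011,10,5) 451.9 257.3];
M2 = [datenum(2011,10,1) 332.6 241.5;
      datenum(2011,10,5) 853.1 243.1];

t1 = array2table(M1, 'VariableNames', {'days','stock_open','stock_close'});
t2 = array2table(M2, 'VariableNames', {'days','bond_open','bond_close'});

% Full outer join on 'days': MergeKeys keeps a single days column and
% prices missing from one side are filled with NaN
t = outerjoin(t1, t2, 'Keys', 'days', 'MergeKeys', true);
t = sortrows(t, 'days');   % make the date order explicit

% Show the serial date numbers as formatted datetime values
t.days = datetime(t.days, 'ConvertFrom', 'datenum', 'Format', 'yyyy-MM-dd');
disp(t)
```

The resulting table has one row per distinct date (2011-10-01, 2011-10-03, 2011-10-05), with NaN in the stock columns for the first row and in the bond columns for the second.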

EDIT:

Say you had the following two files containing the data:

bonds.csv

10/6/1977 7.72 7.72
10/7/1977 7.73 7.73
10/11/1977 7.77 7.77
10/12/1977 7.79 7.79
10/13/1977 7.79 7.79
10/14/1977 7.79 7.79
10/17/1977 7.79 7.79
10/18/1977 7.8 7.8

stocks.csv

10/06/77 95.68 96.05
10/07/77 96.05 95.97
10/10/77 95.97 95.75
10/11/77 95.75 94.93
10/12/77 94.82 94.04
10/13/77 94.04 93.46
10/14/77 93.46 93.56
10/17/77 93.56 93.47

You can read the data using the TEXTSCAN function:

%# read bonds data
fid = fopen('bonds.csv','rt');
C = textscan(fid, '%s %f %f', 'Delimiter',' ', 'CollectOutput',true);
fclose(fid);
M1 = [datenum(C{1},'mm/dd/yyyy') C{2}];

%# read stocks data (two-digit years)
fid = fopen('stocks.csv','rt');
C = textscan(fid, '%s %f %f', 'Delimiter',' ', 'CollectOutput',true);
fclose(fid);
%# explicit pivot year: two-digit years are read as 1950..2049
M2 = [datenum(C{1},'mm/dd/yy',1950) C{2}];
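The two files write their dates differently (10/6/1977 versus 10/06/77). That does not matter for the merge as long as each file is parsed with its own format string, because both end up as the same serial date number, and that number is what the join compares. One thing to hedge against with two-digit years: DATENUM resolves them relative to a pivot year that defaults to the current year minus 50, so an explicit pivot (1950 here, an assumption that suits data from the 1970s) keeps '77' mapped to 1977 regardless of when the code runs. A quick check:

```matlab
% The same calendar day written in each file's format
dBonds  = datenum('10/6/1977', 'mm/dd/yyyy');
dStocks = datenum('10/06/77', 'mm/dd/yy', 1950);   % pivot 1950 -> years 1950..2049

% Both parse to the same serial date number, so they match as join keys
disp(dBonds == dStocks)
disp(datestr(dStocks, 'yyyy-mm-dd'))
```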

Now you can use the same code as above (starting at "get the full range of dates..."), or the DATASET class. Note that here M1 holds the bonds and M2 the stocks, so the bond prices come first. After joining them, this gives me:

C = 
    '1977-10-06'    [7.72]    [7.72]    [95.68]    [96.05]
    '1977-10-07'    [7.73]    [7.73]    [96.05]    [95.97]
    '1977-10-10'    [ NaN]    [ NaN]    [95.97]    [95.75]
    '1977-10-11'    [7.77]    [7.77]    [95.75]    [94.93]
    '1977-10-12'    [7.79]    [7.79]    [94.82]    [94.04]
    '1977-10-13'    [7.79]    [7.79]    [94.04]    [93.46]
    '1977-10-14'    [7.79]    [7.79]    [93.46]    [93.56]
    '1977-10-17'    [7.79]    [7.79]    [93.56]    [93.47]
    '1977-10-18'    [ 7.8]    [ 7.8]    [  NaN]    [  NaN]
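Whichever way you do the merge, the NaN cells mark dates that only one dataset lists. If what you actually need are those date lists, or only the dates common to both (an inner rather than an outer join), INTERSECT and SETDIFF on the date columns give them directly. A short sketch using a few rows of the EDIT data above, in the same [date, open, close] layout:

```matlab
% Bonds rows for 10/7/1977 and 10/18/1977
M1 = [datenum(1977,10,7)  7.73 7.73;
      datenum(1977,10,18) 7.80 7.80];
% Stocks rows for 10/07/77 and 10/10/77
M2 = [datenum(1977,10,7)  96.05 95.97;
      datenum(1977,10,10) 95.97 95.75];

% Dates listed in both datasets, with their row positions in each
[common, i1, i2] = intersect(M1(:,1), M2(:,1));
inner = [M1(i1,:) M2(i2,2:end)];   % inner join: common dates only

% Dates listed in only one dataset (the rows that get NaN in the outer join)
onlyBonds  = setdiff(M1(:,1), M2(:,1));
onlyStocks = setdiff(M2(:,1), M1(:,1));

disp(datestr(onlyBonds,  'yyyy-mm-dd'))
disp(datestr(onlyStocks, 'yyyy-mm-dd'))
```

This reproduces what the merged output shows: 1977-10-10 has no bond prices and 1977-10-18 has no stock prices.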
