Create new variable based on existing columns of a cell in Matlab

Submitted by 时光总嘲笑我的痴心妄想 on 2019-12-12 20:26:24

Question


I have a cell array with 600,000 rows and 5 columns. The example below shows only 3 different codes over a period of 5 years. Input:

c1   c2        c3        c4  c5 

1   2006    20060425    559 'IA'
1   2007    20070129    559 'LO'
1   2007    20070826    559 'VC'
1   2008    20080825     34 'VP'
1   2009    20090116     34 'ZO'
4   2007    20070725     42 'OI'
4   2008    20080712     42 'TF'
4   2008    20080428     42 'XU'
11  2007    20070730    118 'AM'
11  2008    20080912    118 'HK'
11  2009    20090318      2 'VT'
11  2010    20100121      2 'ZZ'

I would like to obtain a new variable that gives for each code (c1) the years in which c1 appears in the sample and the corresponding c4 value. For instance:

Output:

x  2006 2007 2008 2009 2010
1  559  559  34   34   - 
4   -   42   42   -    -
11  -   118  118  2    2 

To get to my cell-array, this is the code I used so far:

a1=T_ANNDAT3;
a2=I{:,7};
a3=I{:,6};
a4=I{:,16};
a5=I{:,1};
TRACK_AN = [num2cell([a2 a1 a4 a3]) a5];
TRACK_AN(cell2mat(TRACK_AN(:,1))==0,:)=[];
[~,indTA,~] = unique(strcat(TRACK_AN(:,1),TRACK_AN(:,2),TRACK_AN(:,4),TRACK_AN(:,5)));
TRACK_AN = TRACK_AN(indTA,:);

Can someone help?


Answer 1:


You can do this with unique, which you are already using. The key is the 'rows' flag, passed as the second argument, which makes unique return the distinct rows of a matrix rather than its distinct elements. Only the first, second and fourth columns of the cell array matter here, so we extract those columns, convert them to a numeric matrix with cell2mat, and remove the duplicate rows. This collapses repeated (c1, c2, c4) combinations, such as the two 2007 rows for code 1, into a single row.

We then call unique two more times, once on the c1 column and once on the c2 column, this time keeping the third output. That output assigns every row an integer index into the sorted list of distinct values, so each code gets a row number and each year gets a column number. Finally, accumarray builds the matrix shown above: the code index selects the row, the year index selects the column, and the c4 value (the third column of the reduced matrix) is placed in that cell. In other words:

%// Create cell array as per your example
C = {1   2006    20060425    559 'IA'
1   2007    20070129    559 'LO'
1   2007    20070826    559 'VC'
1   2008    20080825     34 'VP'
1   2009    20090116     34 'ZO'
4   2007    20070725     42 'OI'
4   2008    20080712     42 'TF'
4   2008    20080428     42 'XU'
11  2007    20070730    118 'AM'
11  2008    20080912    118 'HK'
11  2009    20090318      2 'VT'
11  2010    20100121      2 'ZZ'};

%// Get only those columns that are relevant
%// These are the first, second and fourth columns
Cmat = unique(cell2mat(C(:,[1 2 4])), 'rows');

%// Bin each of the first and second columns
%// Give them a unique ID per unique number    
[~,~,ind] = unique(Cmat(:,1));
[~,~,ind2] = unique(Cmat(:,2));

%// Use accumarray to create your matrix    
%// Edit - Thanks to Amro
%// Fill any missing (code, year) combinations with NaN
finalMat = accumarray([ind ind2], Cmat(:,3), [], [], NaN);

The output is thus:

finalMat =

559   559    34    34   NaN
NaN    42    42   NaN   NaN
NaN   118   118     2     2

I replaced those values that were missing with NaN to signify the missing values.
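
To make the role of the third output of unique more concrete, here is a small sketch on the c1 column alone. The variable names codeCol, codes and idx are made up for illustration; codeCol is typed in by hand to match the first column of Cmat after the duplicate rows are removed.

```matlab
% c1 column of the example once duplicate rows are gone (hand-typed copy)
codeCol = [1; 1; 1; 1; 4; 4; 11; 11; 11; 11];

% codes holds the sorted distinct values;
% idx maps every row of codeCol to its position in codes
[codes, ~, idx] = unique(codeCol);

% codes is [1; 4; 11]
% idx   is [1; 1; 1; 1; 2; 2; 3; 3; 3; 3]
```

These positions are what accumarray receives as row subscripts, so code 11 ends up in row 3 of the result rather than row 11.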

Hope this helps!




Answer 2:


A slight variation of @rayryeng's solution:

% data as cell array
C = {
    1   2006    20060425    559 'IA'
    1   2007    20070129    559 'LO'
    1   2007    20070826    559 'VC'
    1   2008    20080825     34 'VP'
    1   2009    20090116     34 'ZO'
    4   2007    20070725     42 'OI'
    4   2008    20080712     42 'TF'
    4   2008    20080428     42 'XU'
    11  2007    20070730    118 'AM'
    11  2008    20080912    118 'HK'
    11  2009    20090318      2 'VT'
    11  2010    20100121      2 'ZZ'
};

% we are only interested in three columns
CC = cell2mat(C(:,[1 2 4]));

% unique codes/years and their mapping
[codes,~,codesInd] = unique(CC(:,1));
[years,~,yearsInd] = unique(CC(:,2));

% pivot table
out = accumarray([codesInd yearsInd], CC(:,3), [], @max, NaN)

The result is as expected:

>> out
out =
   559   559    34    34   NaN
   NaN    42    42   NaN   NaN
   NaN   118   118     2     2
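
The @max argument matters here because CC still contains the duplicate row for code 1 in 2007, and accumarray sums values that share a subscript by default. A minimal sketch of that effect on a hand-picked subset of the data (subs, vals, summed and maxed are illustrative names):

```matlab
% Two rows for code 1 / year 2007 and one row for code 4 / year 2007
subs = [1 1; 1 1; 2 1];   % [code index, year index]
vals = [559; 559; 42];

summed = accumarray(subs, vals);             % default function is sum
maxed  = accumarray(subs, vals, [], @max);   % one value kept per cell

% summed is [1118; 42]
% maxed  is [559; 42]
```

If a code could carry two different c4 values within the same year, @max would keep the larger one; the question does not say which value is wanted in that case.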

or pretty-printed as a table:

>> t = array2table(out, ...
    'RowNames',cellstr(num2str(codes,'code_%d')), ...
    'VariableNames',cellstr(num2str(years,'year_%d')));

>> t
t = 
               year_2006    year_2007    year_2008    year_2009    year_2010
               _________    _________    _________    _________    _________
    code_1     559          559           34           34          NaN      
    code_4     NaN           42           42          NaN          NaN      
    code_11    NaN          118          118            2            2      
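
The codes and years vectors returned by unique can also be used to look up a single entry of the pivot matrix without building a table. This is only a sketch, assuming the same three numeric columns as above, typed in directly as a matrix; val and missing are illustrative names.

```matlab
% Numeric columns c1, c2, c4 of the example data
CC = [ 1 2006 559;  1 2007 559;  1 2007 559;  1 2008  34;  1 2009  34
       4 2007  42;  4 2008  42;  4 2008  42
      11 2007 118; 11 2008 118; 11 2009   2; 11 2010   2];

[codes, ~, codesInd] = unique(CC(:,1));
[years, ~, yearsInd] = unique(CC(:,2));
out = accumarray([codesInd yearsInd], CC(:,3), [], @max, NaN);

% Logical indexing: the row where codes == 11, the column where years == 2009
val = out(codes == 11, years == 2009);      % 2

% A combination that never occurs in the data comes back as NaN
missing = out(codes == 4, years == 2006);
```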


Source: https://stackoverflow.com/questions/24841586/create-new-variable-based-on-existing-columns-of-a-cell-in-matlab
