MATLAB: Equal rows of table OR Equal words of strings

Question

I would like to build separate tables from different strings. The strings have different lengths, so the tables will have different numbers of rows. I want to combine these tables at the end, so they all need to have the same number of rows. My plan is to pad with NaNs, but so far without success.

My code attempt is below; the place where I'm stuck is marked "Problem location". Code:

 String = ["Random info in middle one, "+ ...
           "Random info still continues. ",
           "Random info in middle two. "+ ...
           "Random info still continues. ExtraWord1 ExtraWord2 ExtraWord3 "];  % String 2 has three more words than string 1
    
%%%%%% FOCUS AREA BEGINS %%%%%%%%
    for x=1:length(String)
        % Plan to add NaNs
        documents_Overall = tokenizedDocument(String(x,1));
        tdetails = tokenDetails(documents_Overall);
        StringTable = tdetails(:,{'Token','Type'});
        StringHeight(x) = height(StringTable);
    
    MaxHeight=max(StringHeight);
    StringTable(end+1:MaxHeight,1)=NaN; % Problem location.
    
    %Plan to Convert table back to string
    DataCell = table2cell(StringTable);
    String(x,1) = [DataCell{:}];
end

%%%%%% FOCUS AREA ENDS %%%%%%%%


%Plan to combine tables

    documents_Middle = tokenizedDocument(String);
    tdetails = tokenDetails(documents_Middle);
        
    t = table();d = tokenizedDocument(String);
    variableNames = [];variables = [];
    
    for n=1:length(d)
     variableNames = [variableNames {sprintf('Tokens for sentence %d',n)} {sprintf('Type for sentence %d',n)}];
     variables = [variables {d(n).tokenDetails.Token} {d(n).tokenDetails.Type}];
    end
    
    %Table = cell2table(variables);
    table(variables{:},'VariableNames',variableNames)

The goal is to give every table the same number of rows, for any number of strings, by padding the shorter ones until they match the longest. My plan is to use NaNs for this, but so far without success. This is what the result of this example should look like:

All help appreciated. Thank you.

Answer 1:

I've built on top of my answer to your previous question.

The logic below is as follows. First, we find the size of the largest column (14 in this example). Then, we find the indices of the columns that need padding; the columns come in Token/Type pairs, so we only need to consider every other column. Finally, we iterate over the columns that need padding, padding each Token column with <missing> (the string equivalent of NaN) and the Type column that follows it with letters.

s = ["Random info in middle one, "+ ...
           "Random info still continues. ",
           "Random info in middle two. "+ ...
           "Random info still continues. ExtraWord ExtraWord ExtraWord "];

t = table();
d = tokenizedDocument(s);

variableNames = [];
variables = [];
max_column_size = 1;

for n=1:length(d)
 variableNames = [variableNames {sprintf('Tokens for sentence %d',n)} {sprintf('Type for sentence %d',n)}];
 variables = [variables {d(n).tokenDetails.Token} {d(n).tokenDetails.Type}];
 column_size = size(d(n).tokenDetails.Token,1);
 if column_size > max_column_size
    max_column_size = column_size;
 end
end

% Anonymous function that tests whether a column is shorter than the longest one
f = @(x) size(x,1) < max_column_size;

% Find the columns that need padding (cellfun returns a logical array here)
indeces_to_pad = find(cellfun(f,variables));
% Columns come in Token/Type pairs, so keep only the Token column of each pair
indeces_to_pad(2:2:end) = [];

% Loop over the Token columns to be padded and pad each Token/Type pair
for n=1:length(indeces_to_pad)
    index_to_pad = indeces_to_pad(n);
    column_size_diff = max_column_size - length(variables{index_to_pad});
    % Concatenating NaN onto a string array turns it into <missing>
    variables{index_to_pad} = [variables{index_to_pad}; NaN(column_size_diff, 1)];
    % Pad the matching Type column with the "letters" category
    variables{index_to_pad+1} = [variables{index_to_pad+1}; categorical(repmat("letters", column_size_diff, 1))];
end


table(variables{:},'VariableNames',variableNames)

will result in the following table:

ans =

  14×4 table

    Tokens for sentence 1    Type for sentence 1    Tokens for sentence 2    Type for sentence 2
    _____________________    ___________________    _____________________    ___________________

         "Random"                letters                 "Random"                letters        
         "info"                  letters                 "info"                  letters        
         "in"                    letters                 "in"                    letters        
         "middle"                letters                 "middle"                letters        
         "one"                   letters                 "two"                   letters        
         ","                     punctuation             "."                     punctuation    
         "Random"                letters                 "Random"                letters        
         "info"                  letters                 "info"                  letters        
         "still"                 letters                 "still"                 letters        
         "continues"             letters                 "continues"             letters        
         "."                     punctuation             "."                     punctuation    
         <missing>               letters                 "ExtraWord"             letters        
         <missing>               letters                 "ExtraWord"             letters        
         <missing>               letters                 "ExtraWord"             letters

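If you want to try the padding step without the Text Analytics Toolbox, here is a minimal sketch of the same idea. The token and type values are made-up stand-ins for what tokenDetails would return, and the names tokens, types, n_pad and padded are only illustrative, so treat this as an approximation of the approach above rather than a drop-in replacement.

```matlab
% Made-up stand-ins for the Token (string) and Type (categorical) columns
tokens = {["Random"; "info"; "."], ...
          ["Random"; "info"; "."; "ExtraWord"; "ExtraWord"]};
types  = {categorical(["letters"; "letters"; "punctuation"]), ...
          categorical(["letters"; "letters"; "punctuation"; "letters"; "letters"])};

% Height of the longest Token column
max_height = max(cellfun(@numel, tokens));

for k = 1:numel(tokens)
    n_pad = max_height - numel(tokens{k});
    if n_pad > 0
        % Concatenating NaN onto a string array turns it into <missing>
        tokens{k} = [tokens{k}; NaN(n_pad, 1)];
        % Pad the matching Type column with the "letters" category
        types{k} = [types{k}; categorical(repmat("letters", n_pad, 1))];
    end
end

% All columns now have the same height, so they fit in one table
padded = table(tokens{1}, types{1}, tokens{2}, types{2}, ...
    'VariableNames', {'Tokens1', 'Type1', 'Tokens2', 'Type2'});
disp(padded)
```

The padded Token rows can then be picked out with ismissing(padded.Tokens1) if you need to drop or ignore them later.
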
Source: https://stackoverflow.com/questions/64643868/matlab-equal-rows-of-table-or-equal-words-of-strings

Tags

matlab

datatable

row

nan