Generalization of mat2str to cell arrays

I sometimes miss a function to produce a string representation of a (possibly nested) cell array. It would be a generalization of mat2str, which only works for non-cell arrays (of numeric, char or logical type).

Given an array x, how to obtain a string representation y, such that evaluating this string produces x?

For example, the input

x = {[10 20], {'abc'; false; true;}};

should produce an output string like

y = '{[10 20], {''abc''; false; true}}';

(or some variation regarding spacing an separators), such that

isequal(x, eval(y))

is true.

This process, transforming a data structure into a string that later can be evaled, is named serialization.

There is a serialize function for Octave that can be used for this purpose which supports any core data type (not only cell arrays) with any number of dimensions (not only 2d).

Examples:

## Works for normal 2d numeric arrays
octave> a = magic (4);
octave> serialize (a)
ans = double([16 2 3 13;5 11 10 8;9 7 6 12;4 14 15 1])
octave> assert (eval (serialize (a)), a)

## Works for normal 3d numeric arrays with all precision
octave> a = rand (3, 3, 3);
octave> serialize (a)
ans = cat(3,double([0.53837757395682650507 0.41720691649633284692 0.66860079620859769189;0.018390655109800025518 0.56538265981533797344 0.20709955358395887304;0.86811365238275806089 0.18398187533949311723 0.20280927116918162634]),double([0.40869259684132724919 0.96877003954154328191 0.32138458265911834522;0.37357584261201565168 0.69925333907961184643 0.10937000120952171389;0.3804633375950405294 0.32942660641033155722 0.79302478034566603604]),double([0.44879474273802461015 0.78659287316710135851 0.49078191654039543534;0.66470978375890155121 0.87740365914996953922 0.77817214018098579409;0.51361398808500036139 0.75508941052835898411 0.70283088935085502591]))
octave> assert (eval (serialize (a)), a)

## Works for 3 dimensional cell arrays of strings
octave> a = reshape ({'foo', 'bar' 'qux', 'lol', 'baz', 'hello', 'there', 'octave'}, [2 2 2])
a = {2x2x2 Cell Array}
octave> serialize (a)
ans = cat(3,{["foo"],["qux"];["bar"],["lol"]},{["baz"],["there"];["hello"],["octave"]})
octave> assert (eval (serialize (a)), a)

However, a better question is why do you want to do this in the first place? If the reason you're doing this is to send variables between multiple instances of Octave, consider using the parallel and mpi packages which have functions specialized designed for this purpose.

The following function works for arbitrary arrays, with any nesting structure and for any shape of the arrays, as long as they are all 2D arrays. Multidimensional arrays are not supported (same as mat2str).

The function also allows specifying arbitrary row and column separators for cell arrays (for example, to choose between comma and space), and optionally forcing those separators for non-cell arrays too (thus overriding mat2str'behaviour). Default separators in cell arrays are ' ' for columns and '; ' for rows.

function y = array2str(x, col_sep, row_sep, sep_noncell)
% Converts a (possibly cell, nested) array to string representation
%
% Optional inputs col_sep and row_sep specify separators for the cell arrays.
% They can be arbitrary strings (but they should be chosen as per Matlab rules
% so that the output string evaluates to the input). Optional flag sep_noncell
% can be used to force those separators with non-cell arrays too, instead of
% the separators produced by mat2str (space and semicolon)

% Default values
if nargin<4
    sep_noncell = false;
end
if nargin<3
    row_sep = '; ';
end
if nargin<2
    col_sep = ' ';
end

x = {x}; % this is to initiallize processing
y = {[]}; % [] indicates content unknown yet: we need to go on
done = false;
while ~done
    done = true; % tentatively
    for n = 1:numel(y);
        if isempty(y{n}) % we need to go deeper
            done = false;
            if ~iscell(x{1}) % we've reached ground
                s = mat2str(x{1}); % final content
                if sep_noncell % replace mat2str's separators if required
                    s = regexprep(s,'(?<=^[^'']*(''[^'']*'')*[^'']*) ', col_sep);
                    s = regexprep(s,'(?<=^[^'']*(''[^'']*'')*[^'']*);', row_sep);
                end
                y{n} = s; % put final content...
                x(1) = []; % ...and remove from x
            else % advance one level
                str = ['{' repmat([{[]}, col_sep], 1, numel(x{1})) '}'];
                ind_sep = find(cellfun(@(t) isequal(t, col_sep), str));
                if ~isempty(ind_sep)
                    str(ind_sep(end)) = []; % remove last column separator
                    ind_sep(end) = [];
                end
                step_sep = size(x{1}, 2);
                str(ind_sep(step_sep:step_sep:end)) = {row_sep};
                y = [y(1:n-1) str y(n+1:end)]; % mark for further processing...
                x = [reshape(x{1}.', 1, []) x(2:end)]; % ...and unbox x{1},
                    % transposed and linearized
            end
        end
    end
end
y = [y{:}]; % concatenate all strings

The above function uses regular expressions to force the specified separators in non-cell arrays. This works in Matlab but not in Octave, due to limitations in supported lookbehind patterns. The following modified version avoids regular expressions, and thus works in Matlab and in Octave. Only the part between if sep_noncell and the matching end changes with respect to the first version.

function y = array2str(x, col_sep, row_sep, sep_noncell)
% Converts a (possibly cell, nested) array to string representation.
% Octave-friendly version
%
% Optional inputs col_sep and row_sep specify separators for the cell arrays.
% They can be arbitrary strings (but they should be chosen as per Matlab rules
% so that the output string evaluates to the input). Optional flag sep_noncell
% can be used to force those separators with non-cell arrays too, instead of
% the separators produced by mat2str (space and semicolon)

% Default values
if nargin<4
    sep_noncell = false;
end
if nargin<3
    row_sep = '; ';
end
if nargin<2
    col_sep = ' ';
end

x = {x}; % this is to initiallize processing
y = {[]}; % [] indicates content unknown yet: we need to go on
done = false;
while ~done
    done = true; % tentatively
    for n = 1:numel(y);
        if isempty(y{n}) % we need to go deeper
            done = false;
            if ~iscell(x{1}) % we've reached ground
                s = mat2str(x{1}); % final content
                if sep_noncell % replace mat2str's separators if required
                    for k = flip(find(~mod(cumsum(s==''''),2) & s==' ')) % process
                        % backwards, because indices to the right will become invalid
                        s = [s(1:k-1) col_sep s(k+1:end)];
                    end
                    for k = flip(find(~mod(cumsum(s==''''),2) & s==';'))
                        s = [s(1:k-1) row_sep s(k+1:end)];
                    end
                end
                y{n} = s; % put final content...
                x(1) = []; % ...and remove from x
            else % advance one level
                str = ['{' repmat([{[]}, col_sep], 1, numel(x{1})) '}'];
                ind_sep = find(cellfun(@(t) isequal(t, col_sep), str));
                if ~isempty(ind_sep)
                    str(ind_sep(end)) = []; % remove last column separator
                    ind_sep(end) = [];
                end
                step_sep = size(x{1}, 2);
                str(ind_sep(step_sep:step_sep:end)) = {row_sep};
                y = [y(1:n-1) str y(n+1:end)]; % mark for further processing...
                x = [reshape(x{1}.', 1, []) x(2:end)]; % ...and unbox x{1},
                    % transposed and linearized
            end
        end
    end
end
y = [y{:}]; % concatenate all strings

How it works

I chose a non-recursive approach because I'm usually more confortable with iteration than with recursion.

The output is gradually built by keeping substrings or empty arrays ([]) in a cell array (y). An empty array in a cell of y indicates "further processing is needed". Substrings define the "structure", or eventually the numeric, character or logical contents in the deepest level of the cell nesting.

In each iteration, the first empty array found in y is substituted by actual content, or by substrings and other empty arrays to be processed later. When y doesn't contain any empty array the process ends, and all substrings of y are concatenated to obtain the final string output.

For example, given input x = {[10 20], {'abc'; false; true;}}; and calling y = array2str(x) the array y in each step is a cell array containing:

'{'   []   ', '   []   '}'

'{'   '[10 20]'   ', '   []   '}'

'{'   '[10 20]'   ', '   '{'   []   '; '   []   '; '   []   '}'   '}'

'{'   '[10 20]'   ', '   '{'   ''abc''   '; '   []   '; '   []   '}'   '}'

'{'   '[10 20]'   ', '   '{'   ''abc''   '; '   'false'   '; '   []   '}'   '}'

'{'   '[10 20]'   ', '   '{'   ''abc''   '; '   'false'   '; '   'true'   '}'   '}'

and the latter is finally concatenated into the string