Why is a trailing comma in a cell array valid Matlab syntax?

Posted by 泪湿孤枕 on 2019-12-03 11:19:23

Question


I was surprised today to discover that

A = {1,2,3}

and

B = {1,2,3,}

are both valid syntax in MATLAB. I would have expected the second statement to yield an error. As best as I can tell, they produce identical cell arrays (all([A{:}]==[B{:}]) returns true).

Is there a reason the second syntax is allowed? Is this a bug in the parser? Are A and B truly the same?

Intriguingly, the following is not allowed:

C = {1,2,3,,,}

Answer 1:


These are guesses rather than an answer.

One could check the Symbol reference and find that the comma , can be used as

Command or Statement Separator

To enter more than one MATLAB command or statement on the same line, separate each command or statement with a comma:

for k = 1:10, sum(A(k)), end

In the line

B = {1,2,3,}

a statement is therefore expected after 3,. There is just }, which marks the end of the cell array and is a valid statement.


The semicolon ; has three official usages:

Array Row Separator

When used within square brackets to create a new array or concatenate existing arrays, the semicolon creates a new row in the array:

A = [5, 8; 3, 4]

Output Suppression

When placed at the end of a command, the semicolon tells MATLAB not to display any output from that command. In this example, MATLAB does not display the resulting 100-by-100 matrix:

A = ones(100, 100);

Command or Statement Separator

Like the comma operator, you can enter more than one MATLAB command on a line by separating each command with a semicolon. MATLAB suppresses output for those commands terminated with a semicolon, and displays the output for commands terminated with a comma.

In this example, assignments to variables A and C are terminated with a semicolon, and thus do not display. Because the assignment to B is comma-terminated, the output of this one command is displayed:

A = 12.5; B = 42.7, C = 1.25;

So in the line

x = {1,2,3,;5,6,7}

the valid Array Row Separator follows 3,. After it a new statement is expected, which in this case is the double 5. Valid.


Now consider the case

x = {1,2,3,;;;;4,5,6;;;}

As above, the Array Row Separator follows 3, and what comes after it is presumably the null statement (a NOP borrowed from some underlying program core written in C), which basically means: do nothing. So 3,; is followed by "do nothing" three times before the next statement arrives. It makes no sense, and Matlab tells you so (Extra semicolon is unnecessary.), but it is valid.

It also allows you pointless things like:

if true
    ;
end

And this is presumably also the reason why

C = {1,2,3,,,} 

returns an error: the comma , isn't a null statement, so a statement is expected after the first comma.


The bottom line: it looks weird, but it actually seems logical to me. Matlab uses a lot of C code internally and, considering the null statement, everything mentioned is valid syntax.


What about other languages?

Semicolons used like x = [1,2,3,;;;;4,5,6;;;] are invalid in Python, even in numpy, the intended Matlab clone, unless wrapped in the uncommon string syntax a = np.matrix('1,2,3;4,5,6'). Even then,

a = np.matrix('1,2,3,;;;;4,5,6;;;')

would throw an error as well, because ; is interpreted as Array Row Separator in every case, which makes numpy complain about inconsistent row sizes.

However,

x = [1,2,3,]

is also valid syntax in Python and IronPython, as it is in VBScript and Lua, as mentioned in mlepage's answer. What do all these languages have in common? They are all (more or less) scripting languages interpreted at run time. It's not just Matlab, so the OP's surprise is without cause.
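The Python side of this comparison can be checked with the standard library alone. The sketch below is only an illustration: the helper name parses is made up for this example, and it tests syntactic validity with ast.parse rather than saying anything about how Matlab parses.

```python
import ast


def parses(source: str) -> bool:
    """Return True if `source` is syntactically valid Python (hypothetical helper)."""
    try:
        ast.parse(source)
        return True
    except SyntaxError:
        return False


# A single trailing comma does not change the resulting list.
same_list = [1, 2, 3,] == [1, 2, 3]

# One trailing comma parses; repeated commas and semicolons inside a list do not.
one_trailing_comma_ok = parses("x = [1, 2, 3,]")
repeated_commas_ok = parses("x = [1, 2, 3,,,]")
semicolons_ok = parses("x = [1,2,3,;;;;4,5,6;;;]")

print(same_list, one_trailing_comma_ok, repeated_commas_ok, semicolons_ok)
```

So Python accepts exactly one trailing comma in a list and rejects repeated commas, matching what the question observed for Matlab cell arrays.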




Answer 2:


Many languages allow one extra element separator in lists, as already mentioned. But this has nothing to do with run-time parsing. Even C allows it. It has to do with ease of use. This is a feature meant to help the user. For example, in C you can define an enum as follows:

enum E {
   a,
   b,
   c,
};

The comma after c is not required, but it is allowed. It makes it easier to add and remove elements from such a list, and it makes it easier to programmatically generate such a list (mlepage's answer is correct!).

Since allowing one additional comma at the end is common across many programming languages, it makes sense for MATLAB to support it too. The additional comma at the beginning of the list makes less sense, but I guess they support it because it doesn't hurt either.

Multiple commas in a row make no sense: they would imply that there are additional elements that are unspecified.

But what is going on with the multiple semicolons then? As Luis Mendo mentioned, [1;;2] is legal syntax. This is something that does deviate significantly from what other languages do.

However, it is consistent with MATLAB's use of the line break. In MATLAB, line breaks are significant. When defining an array using [], line breaks indicate new rows of data:

M = [ 1, 2, 3,
      4, 5, 6,
      7, 8, 9,
    ];

is the same as

M = [1,2,3; 4,5,6; 7,8,9];

(Note how allowing the commas at the end of each row can be convenient at times.)

(Also note that I use [] here to concatenate; the exact same logic applies to {}.)

But because MATLAB wants to allow as much as possible, as long as it remains unambiguous, the above is the same as:

M = [ 1, 2, 3,
      4, 5, 6,


      7, 8, 9,
    ];

If it doesn't hurt to allow empty lines, why not allow them?

Since each newline corresponds to a semicolon, the above is identical to:

M = [ 1, 2, 3,;...
      4, 5, 6,;...
              ;...
              ;...
      7, 8, 9,;...
    ];

which is identical to

M = [ 1, 2, 3,; 4, 5, 6,; ; ; 7, 8, 9,; ];

and so MATLAB must be able to parse that, whether it makes sense or not.
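The equivalence described above can be modelled in a few lines of Python. This is a toy model written for this answer, not MATLAB's parser: the names toy_rows and parse_row are hypothetical, only comma-separated integers are handled, and a leading comma is not modelled. It treats ; and newline as row separators, drops empty rows, lets ... cancel the following newline, tolerates one trailing comma per row, and rejects repeated commas.

```python
import re


def parse_row(chunk: str) -> list[int]:
    """Parse one row of comma-separated integers; one trailing comma is tolerated."""
    chunk = chunk.strip()
    if not chunk:
        return []  # empty row: contributes nothing
    if chunk.endswith(","):
        chunk = chunk[:-1]  # drop a single trailing comma
    items = [item.strip() for item in chunk.split(",")]
    if any(not item for item in items):
        raise ValueError("empty element between commas")
    return [int(item) for item in items]


def toy_rows(literal: str) -> list[list[int]]:
    """Toy model of a bracketed matrix literal (an assumption, not MATLAB's grammar)."""
    text = literal.strip().rstrip(";").strip()
    if not (text.startswith("[") and text.endswith("]")):
        raise ValueError("expected a literal of the form [ ... ]")
    # A "..." continuation swallows the rest of its line, including the newline.
    inner = re.sub(r"\.\.\.[^\n]*\n", " ", text[1:-1])
    rows = [parse_row(chunk) for chunk in re.split(r"[;\n]", inner)]
    return [row for row in rows if row]  # empty rows are ignored


one_line = toy_rows("[ 1, 2, 3,; 4, 5, 6,; ; ; 7, 8, 9,; ];")
print(one_line)
```

Under this model the multi-line form, the form with blank lines, and the single-line form with extra semicolons all reduce to the same three rows, while [1,,,4] is rejected.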


A rebuttal of thewaywewalk's answer:

The argument is that both , and ; are defined as statement separators, but somehow it is assumed that ;;; is a valid statement whereas ,,, is not. This is simply not true:

disp(0),,,disp(1)
disp(0);;;disp(1)

are both valid MATLAB syntax (R2017a parses both without error).

Furthermore, the answer confuses expressions and statements. disp(0) is a statement. The 0 in that statement is an expression. In M=[1,2,3], the things separated by commas are expressions, not statements. The commas there do not work as statement separators.

In fact, in MATLAB the comma and the semicolon have multiple meanings, depending on context. The comma and semicolon at the end of a statement (including a null statement) are different from the comma and semicolon within a concatenating expression ([1,2;3,4]). And the comma can also separate expressions within the brackets of a function call, where the semicolon is not allowed.

Just to make this point clear:

1,,,4

is valid, whereas

[1,,,4]

is not. The commas have different functions in these two statements.

In short, the logic used in that answer is simply incorrect.




Answer 3:


It's convenient for a language to allow trailing punctuation if the code is ever going to be generated by other code.

For example, Lua allows trailing commas, so it's easy to generate Lua code.

You don't need a special case in the generating code to omit the final comma; you can just print ITEM-THEN-COMMA for each and every item.
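As a rough sketch of that point, here is a small Python generator that emits a MATLAB cell array literal. The function name emit_cell_literal and the restriction to integers are assumptions made for this illustration only.

```python
def emit_cell_literal(name: str, values: list[int]) -> str:
    """Emit a cell array literal by printing ITEM-THEN-COMMA for every item.

    Hypothetical helper: no special case is needed for the last element,
    because the target language accepts one trailing comma.
    """
    parts = [f"{name} = {{"]
    for value in values:
        parts.append(f"{value},")
    parts.append("}")
    return "".join(parts)


generated = emit_cell_literal("B", [1, 2, 3])
print(generated)  # B = {1,2,3,}
```

The output is exactly the B = {1,2,3,} form from the question. A generator targeting a language without trailing commas would have to track whether it is on the last item or join the items with a separator instead.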



Source: https://stackoverflow.com/questions/32849720/why-is-a-trailing-comma-in-a-cell-array-valid-matlab-syntax
