Can anyone explain this behaviour when calling R from the command line with a multi-statement string argument?
$ Rscript -e "dim(mtcars)"
[1] 32 11
$ Rscript -e "df = mtcars; dim(df)"
[1] 32 11
$ Rscript -e "head(rownames(mtcars))"
[1] "Mazda RX4"         "Mazda RX4 Wag"     "Datsun 710"
[4] "Hornet 4 Drive"    "Hornet Sportabout" "Valiant"
$ Rscript -e "df = mtcars; df$car = rownames(mtcars); dim(df)"
NULL
The culprit here is the shell, not R.
You haven't said which shell you're using, but the Bourne-style shells (sh/bash/ash/dash/ksh/zsh) all behave alike for our purposes, so let's assume bash, which lets me quote half the bash man page:
...
DEFINITIONS
The following definitions are used throughout the rest of this document.
...
name A word consisting only of alphanumeric characters and underscores, and beginning with an alphabetic character or an underscore. Also referred to as an identifier.
...
QUOTING
...
Enclosing characters in double quotes preserves the literal value of all characters within the quotes, with the exception of $, `, \, and, when history expansion is enabled, !. The characters $ and ` retain their special meaning within double quotes.
...
PARAMETERS
A parameter is an entity that stores values. It can be a name, a number, or one of the special characters listed below under Special Parameters. A variable is a parameter denoted by a name.
...
Parameter Expansion
The `$' character introduces parameter expansion, command substitution, or arithmetic expansion. The parameter name or symbol to be expanded may be enclosed in braces, which are optional but serve to protect the variable to be expanded from characters immediately following it which could be interpreted as part of the name.
...
${parameter}
The value of parameter is substituted. The braces are required when parameter is a positional parameter with more than one digit, or when parameter is followed by a character which is not to be interpreted as part of its name. The parameter is a shell parameter as described above (PARAMETERS) or an array reference (Arrays).
...
So, parameter expansion takes effect within double-quoted strings, and you have an unescaped dollar followed by a valid variable name in your double-quoted string: $car.
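To make the man page's "name" rule concrete, here is a minimal sketch of how the shell decides where a variable name ends inside double quotes. The value X and the variable car_name are hypothetical, chosen purely for illustration; they are not from the question.

```shell
# nounset must be off to see the default (silent) behaviour.
set +u
unset car car_name
car=X   # hypothetical value, for illustration only

# The name is the longest run of letters, digits and underscores,
# so this expands $car_name (unset, hence empty), not $car.
greedy=$(printf '%s' "df$car_name")

# Braces end the name explicitly, so this expands $car.
braced=$(printf '%s' "df${car}_name")

# A space (or any other non-name character) also ends the name,
# which is exactly the situation in the question's "df$car = ...".
spaced=$(printf '%s' "df$car = 1")

printf '%s\n' "$greedy" "$braced" "$spaced"
# prints:
# df
# dfX_name
# dfX = 1
```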
I infer that the variable $car was unset in the shell session in which you ran the offending command.
The unfortunate default behavior of the shell is to silently expand unset variables to the empty string. Thus your R code ends up being mangled into this:
df = mtcars; df = rownames(mtcars); dim(df)
Thus df gets overwritten with a dimensionless character vector, and dim(df) returns NULL on such vectors.
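You can check the mangling without involving R at all. The sketch below swaps Rscript for printf, which simply echoes the argument it receives; it assumes, as inferred above, that car is unset and that nounset is off.

```shell
# Reproduce the assumed state of the question's shell session.
set +u
unset car

# printf shows the argument exactly as the shell hands it to the program.
mangled=$(printf '%s' "df = mtcars; df$car = rownames(mtcars); dim(df)")
printf '%s\n' "$mangled"
# prints: df = mtcars; df = rownames(mtcars); dim(df)
```

The shell does the substitution before the program starts, so R never sees the $car at all.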
The problem can be solved by backslash-escaping the dollar:
Rscript -e "df <- mtcars; df\$car <- rownames(mtcars); dim(df);";
## [1] 32 12
Or better yet, using single-quotes, which do not permit any interpolation to take place inside the single-quoted string:
Rscript -e 'df <- mtcars; df$car <- rownames(mtcars); dim(df);';
## [1] 32 12
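As a quick sanity check that the two fixes are equivalent, this sketch again uses printf in place of Rscript to show that both quoting styles deliver the same literal text. The variable names escaped and single are only for the demonstration.

```shell
set +u
unset car

# Backslash-escaped dollar inside double quotes.
escaped=$(printf '%s' "df\$car <- rownames(mtcars)")

# No escaping needed inside single quotes.
single=$(printf '%s' 'df$car <- rownames(mtcars)')

printf '%s\n' "$escaped" "$single"
# Both lines read: df$car <- rownames(mtcars)
```

One practical note on the single-quoted form: a single-quoted shell string cannot contain a single quote, so write any R string literals inside it with double quotes, which R accepts equally.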
On a personal note, IMO, very few engineers seem to understand the importance of strictness in software design. It would have been a great service to the computer programming profession had the shell been designed to reject unset variables from day one, but alas, 'twas not the case. And so poor souls like you and me, and basically everyone who ever does any shell programming, run afoul of these silent errors from time to time.
Fortunately, an option called nounset was added to the shell early on:
help set | grep -e nounset -e -u;
## nounset same as -u
## -u Treat unset variables as an error when substituting.
If you turn it on, which I strongly suggest you do, the offending Rscript command fails immediately and clearly:
set -u; Rscript -e "df <- mtcars; df$car <- rownames(mtcars); dim(df);";
## -bash: car: unbound variable
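If you want to try nounset without changing your current session, a subshell keeps both the option and the failure contained. This is a minimal sketch, again with printf standing in for Rscript; the exact error text and exit status vary between shells, so it only checks that the status is non-zero.

```shell
unset car

# The parentheses start a subshell; set -u applies only inside it.
# With nounset on, expanding the unset $car aborts the subshell
# before printf ever runs.
if ( set -u; printf '%s\n' "df$car <- rownames(mtcars)" ) 2>/dev/null; then
    status=0
else
    status=$?
fi

printf 'exit status: %s\n' "$status"   # non-zero when car is unset
```

In a script, putting set -u near the top gives the same protection for everything that follows.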