When I use the read.csv() function in R to load data, I often find that an X has been added to variable names. I think I just about always see it
As Gabor said, read.csv by default converts the names in your header row into valid variable names (use check.names = FALSE to turn this off). This is done with the function make.names, whose help page explains what constitutes a valid variable name:
A syntactically valid name consists of letters, numbers and the dot or underline characters and starts with a letter or the dot not followed by a number. Names such as ".2way" are not valid, and neither are the reserved words.
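To see the rule in action, you can call make.names directly on some header-like strings. This is just a small illustration; the example names are arbitrary.

```r
# Header strings, some of which are not syntactically valid R names
raw_names <- c("id", "1st", ".2way", "my var", "a-b")

# "X" is prepended when a name does not start with a letter
# (or with a dot not followed by a number); invalid characters become "."
make.names(raw_names)
# returns c("id", "X1st", "X.2way", "my.var", "a.b")
```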
The list of reserved words is found on the help page ?reserved.
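Reserved words are handled differently from other invalid names: rather than prepending an X, make.names appends a dot. A quick check with a few reserved words:

```r
# Reserved words cannot be used as names, so make.names adds a trailing "."
make.names(c("if", "TRUE", "NULL", "function"))
# returns c("if.", "TRUE.", "NULL.", "function.")
```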
The other condition is that a variable name must be at most 10000 bytes long, but make.names won't shorten it, so avoid extremely verbose variable names.
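A rough sketch of what that means in practice, assuming a recent R version: the limit (10000 bytes, which equals characters for plain ASCII names) is enforced when a string is turned into a symbol, not by make.names. The exact error text may vary between versions.

```r
long_name <- strrep("a", 10001)   # one byte over the limit
fixed <- make.names(long_name)
nchar(fixed)                      # still 10001: make.names does not truncate

# Converting the string to a symbol, which is what using it as an
# identifier requires, should fail with an error about the length limit
res <- tryCatch(as.name(fixed), error = function(e) conditionMessage(e))
print(res)
```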
You can check for valid variable names using the assertive.code package:
library(assertive.code)
is_valid_variable_name(x)
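If you would rather not add a package dependency, a rough base-R check is to compare a name with what make.names turns it into. The helper name is_syntactic is hypothetical, and it may disagree with a stricter checker on edge cases such as ..1.

```r
# Hypothetical helper (not from any package): treat a name as valid
# when make.names leaves it unchanged
is_syntactic <- function(x) x == make.names(x)

is_syntactic(c("abc", "a.b_c", "1x", "if", "my var"))
# returns c(TRUE, TRUE, FALSE, FALSE, FALSE)
```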
It is surprising behavior, but I think we would need a reproducible example. Perhaps you have some invisible/special characters hiding in your file?
names(read.csv(textConnection(
"abcdefghijkl, a1,2x")))
behaves fine. Can you make an example along these lines that demonstrates your problem?
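As an illustration of the kind of hidden character that could cause this, here is a hypothetical header whose first field starts with a non-breaking space (U+00A0). This assumes a UTF-8 locale; whitespace handling may differ on other platforms.

```r
# Hypothetical CSV text: the header looks like "id,value", but the first
# field starts with an invisible non-breaking space
csv_text <- "\u00a0id,value\n1,2"

# The non-breaking space is not a letter, so the first name gets an "X" prefix
names(read.csv(text = csv_text))

# Inspecting the code points reveals the culprit: 160 is U+00A0
utf8ToInt(csv_text)[1]
```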
As described in the other answer, check.names=FALSE is a possible workaround. You can experiment with make.names to determine the behavior ...
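One way to experiment: with the default check.names = TRUE, read.csv applies make.names with unique = TRUE to the header fields, so calling make.names yourself should reproduce its column names, including the de-duplication of repeated names.

```r
header_fields <- c("a", "1", "b", "a")

# Names as read.csv produces them from a header-only input
from_read_csv <- names(read.csv(text = paste(header_fields, collapse = ",")))

# The same transformation applied directly; unique = TRUE renames duplicates
from_make_names <- make.names(header_fields, unique = TRUE)

identical(from_read_csv, from_make_names)  # TRUE
from_make_names                            # c("a", "X1", "b", "a.1")
```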
read.table and read.csv have a check.names= argument that you can set to FALSE.
For example, try it with this input consisting of just a header:
> read.csv(text = "a,1,b")
[1] a X1 b
<0 rows> (or 0-length row.names)
versus
> read.csv(text = "a,1,b", check.names = FALSE)
[1] a 1 b
<0 rows> (or 0-length row.names)
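Once check.names = FALSE keeps a non-syntactic name such as 1, you need to quote it when accessing the column, either with backticks or as a string. A small example with one data row:

```r
df <- read.csv(text = "a,1,b\n10,20,30", check.names = FALSE)
names(df)   # c("a", "1", "b")

# A non-syntactic column name must be backtick-quoted with $ ...
df$`1`      # 20
# ... or passed as a string with [[
df[["1"]]   # 20
```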