I'm working with a data frame similar to the extract below:
df <- data.frame(A=c("Some messy string to be used",222,0),
B=c("Very important ? indicator from 2001", 888, 44),
C=c("001 This variable / makes no sense", 888, 44),
D=c("Geography", 1, 2))
I would like to use values in first row as column names, I'm using the code below:
names(df) <- make.names(df[1,])
Unfortunately, this code generates names of the form Xn, as illustrated below:
> names(df)
[1] "X3" "X3" "X1" "X3"
I understand that the strings are too messy for make.names to convert meaningfully. How can I get R to use these messy strings more effectively? As a rule of thumb I would like to:
- Keep figures (as they correspond to time)
- Keep at least the first few words of the text
- Ensure that the names are unique
- The whole solution has to be fairly generic, as there is a lot of rubbish in the first row (usually blank spaces or special characters).
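A minimal base-R sketch of those four rules follows. The helper name `clean_names()` and its `n_words` parameter are hypothetical (not from either answer below), and the sketch assumes that any purely numeric token, such as a year, should be kept and moved to the end of the name:

```r
# Hypothetical helper, not part of either answer: turns messy header strings
# into short, unique, syntactically valid column names.
clean_names <- function(x, n_words = 3) {
  x <- as.character(x)
  # Replace every character that is not a letter, digit or space with a space
  x <- gsub("[^[:alnum:] ]", " ", x)
  # Split each string into words on runs of whitespace
  tokens <- strsplit(trimws(x), "[[:space:]]+")
  short <- vapply(tokens, function(tok) {
    is_num <- grepl("^[0-9]+$", tok)
    # Keep the first n_words non-numeric words, then every numeric token (e.g. years)
    paste(c(head(tok[!is_num], n_words), tok[is_num]), collapse = "_")
  }, character(1))
  # make.names() repairs anything still invalid (a blank name becomes "X");
  # unique = TRUE appends .1, .2, ... to repeated names
  make.names(short, unique = TRUE)
}

clean_names(c("Some messy string to be used",
              "Very important ? indicator from 2001",
              "001 This variable / makes no sense",
              "Geography"))
# "Some_messy_string" "Very_important_indicator_2001"
# "This_variable_makes_001" "Geography"
```

Assuming the columns hold character data (not factors), `names(df) <- clean_names(df[1, ])` followed by `df <- df[-1, ]` would set the names and drop the header row.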
You don't need to use make.names at all: you can assign the strings directly, and this works perfectly fine in R. You only need to backtick-quote the names when you use them as R names (e.g. after the $ operator):
names(df) = unlist(df[1,])
df$`Some messy string to be used`
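A runnable version of this answer on the example data is sketched below. It adds `stringsAsFactors = FALSE` so the columns are plain character vectors whatever the R version's default, and it drops the first row after copying it into the names; both steps are additions, not part of the original answer:

```r
df <- data.frame(A = c("Some messy string to be used", 222, 0),
                 B = c("Very important ? indicator from 2001", 888, 44),
                 C = c("001 This variable / makes no sense", 888, 44),
                 D = c("Geography", 1, 2),
                 stringsAsFactors = FALSE)

# Use the first row verbatim as column names; no make.names() involved
names(df) <- unlist(df[1, ])
# The header row is still stored as data, so drop it (added step)
df <- df[-1, ]

# Non-syntactic names need backticks with $, or a quoted string with [[ ]]
df$`Very important ? indicator from 2001`
df[["001 This variable / makes no sense"]]
```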
Use stringsAsFactors = FALSE in data.frame(), which creates the columns as character vectors instead of factors. Then call make.names on the first row:
df <- data.frame(A=c("Some messy string to be used",222,0),
B=c("Very important ? indicator from 2001", 888, 44),
C=c("001 This variable / makes no sense", 888, 44),
D=c("Geography", 1, 2), stringsAsFactors = FALSE)
names(df) <- make.names(df[1,])
names(df)
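The X3/X1 names in the question come from factor columns (the `data.frame()` default before R 4.0.0): the row was converted to the factors' integer level codes before `make.names()` saw it. With character columns, the sketch below shows the names this answer produces. Passing `unique = TRUE` (to cover the asker's uniqueness rule) and dropping the header row are additions, not part of the original answer:

```r
df <- data.frame(A = c("Some messy string to be used", 222, 0),
                 B = c("Very important ? indicator from 2001", 888, 44),
                 C = c("001 This variable / makes no sense", 888, 44),
                 D = c("Geography", 1, 2),
                 stringsAsFactors = FALSE)

# make.names() turns every invalid character into "." and prefixes "X" when a
# name starts with a digit; unique = TRUE appends .1, .2, ... to duplicates
hdr <- make.names(unlist(df[1, ]), unique = TRUE)
# Drop the header row and attach the cleaned names (added step)
df <- setNames(df[-1, ], hdr)
names(df)
# "Some.messy.string.to.be.used" "Very.important...indicator.from.2001"
# "X001.This.variable...makes.no.sense" "Geography"
```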
Source: https://stackoverflow.com/questions/31535146/using-syntactically-difficult-strings-as-column-names-in-a-data-frame