Prevent variable name getting mangled by read.csv/read.table?

暗喜 2021-01-25 13:33

My data set testdata has two variables, named PWGTP and AGEP.

The data are in a .csv file.

When I do:

<         


        
2 Answers
  • 2021-01-25 14:19

    This is a BOM (Byte Order Mark) UTF-8 issue.

    To prevent this from happening, there are two options:

    1. Save your file as UTF-8 without BOM / signature -- or --
    2. Use fileEncoding = "UTF-8-BOM" when using read.table or read.csv

    Example:

    mydata <- read.table(file = "myfile.txt", fileEncoding = "UTF-8-BOM")
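
    A minimal, self-contained sketch of option 2, assuming a throwaway temporary file written with a BOM purely for illustration (the file contents and the names tmp, bom and fixed are made up here). How the first column name gets mangled without fileEncoding depends on the platform and locale, so only the corrected read is shown:

    ```r
    # Write a small CSV that starts with the UTF-8 BOM bytes EF BB BF.
    tmp <- tempfile(fileext = ".csv")
    bom <- as.raw(c(0xEF, 0xBB, 0xBF))
    writeBin(c(bom, charToRaw("PWGTP,AGEP\n87,32\n")), tmp)

    # Declaring the encoding as "UTF-8-BOM" lets R drop the mark
    # before the header line is parsed.
    fixed <- read.csv(tmp, fileEncoding = "UTF-8-BOM")
    print(names(fixed))

    # Clean up the illustrative file.
    unlink(tmp)
    ```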

  • 2021-01-25 14:19

    It is possible that the column name in the file is something like 1 PWGTP, i.e. with a space (or some other character) between a number and the name. Characters that are not valid in an R name are converted to . when the file is read, which is how .. can end up in the name. One way to prevent this is to use check.names = FALSE in read.csv/read.table:

    d1 <- read.csv("yourfile.csv", header=TRUE, stringsAsFactors=FALSE, check.names=FALSE)
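
    To see the effect of check.names in isolation, here is a small sketch that reads inline text instead of a file (the header 1 PWGTP, the values, and the names csv_text, checked and unchecked are invented for illustration):

    ```r
    csv_text <- "1 PWGTP,AGEP\n87,32\n"

    # Default (check.names = TRUE): headers are passed through make.names(),
    # so the leading digit gets an X prefix and the space becomes a dot.
    checked <- read.csv(text = csv_text)
    print(names(checked))    # "X1.PWGTP" "AGEP"

    # check.names = FALSE keeps the header exactly as written in the file.
    unchecked <- read.csv(text = csv_text, check.names = FALSE)
    print(names(unchecked))  # "1 PWGTP" "AGEP"

    # Non-syntactic names then need [[ ]] or backticks to be accessed.
    print(unchecked[["1 PWGTP"]])
    ```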
    

    However, it is better not to have names that start with a number or contain spaces.

    If the data were read with the default options, i.e. with check.names = TRUE, we can use sub to clean up the column names afterwards:

    names(d1) <- sub(".*\\.+", "", names(d1))
    

    As an example

    sub(".*\\.+", "", "ï..PWGTP")
    #[1] "PWGTP"
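
    One caveat worth sketching: the pattern above removes everything up to the last run of dots, so it would also shorten a name that legitimately contains a dot. The example below uses made-up column names (household.size is hypothetical) and, as an alternative that is not part of the original answer, a pattern anchored to the specific ï.. prefix:

    ```r
    # "\u00ef" is the character ï, written as an escape to keep the source ASCII.
    cols <- c("\u00ef..PWGTP", "AGEP", "household.size")

    # Greedy pattern from the answer: also strips "household." from the last name.
    greedy <- sub(".*\\.+", "", cols)
    print(greedy)    # "PWGTP" "AGEP" "size"

    # Anchored pattern: only removes a leading "ï.." and leaves other dots alone.
    anchored <- sub("^\u00ef\\.\\.", "", cols)
    print(anchored)  # "PWGTP" "AGEP" "household.size"
    ```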
    