Prevent variable name getting mangled by read.csv/read.table?

Posted by 喜你入骨 on 2019-12-20 04:58:14

Question


My data set testdata has 2 variables, named PWGTP and AGEP.

The data are in a .csv file.

When I do:

> head(testdata)

The variables show up as

    ï..PWGTP AGEP
          23   55
          26   56
          24   45
          22   51
          25   54
          23   35

So, for some reason, R is reading PWGTP as ï..PWGTP. No biggie.

HOWEVER, when I use some function to refer to the variable ï..PWGTP, I get the message:

Error: id variables not found in data: ï..PWGTP

Similarly, when I use some function to refer to the variable PWGTP, I get the message:

Error: id variables not found in data: PWGTP

2 Questions:

  1. Is there anything I should be doing to the source file to prevent mangling of the variable name PWGTP?

  2. It should be trivial to rename ï..PWGTP to something else -- but R is unable to find a variable named as such. Your thoughts on how one should try to repair the variable name?


Answer 1:


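This is a UTF-8 BOM (Byte Order Mark) issue. The file starts with the three BOM bytes EF BB BF; when they are not decoded as UTF-8 they are read as the characters ï»¿, and check.names then turns the two characters that are not valid in an R name into dots, giving ï..PWGTP.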

To prevent this from happening, 2 options:

  1. Save your file as UTF-8 without BOM / signature -- or --
  2. Use fileEncoding = "UTF-8-BOM" when using read.table or read.csv

Example:

mydata <- read.table(file = "myfile.txt", fileEncoding = "UTF-8-BOM")
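
To see the fix in isolation, here is a minimal sketch that writes a small CSV starting with a UTF-8 BOM to a temporary file and reads it back with fileEncoding = "UTF-8-BOM". The file contents and the names tmp, bom and fixed are made up for illustration; only the column names PWGTP and AGEP come from the question.

```r
# Write a tiny CSV that starts with the UTF-8 BOM bytes EF BB BF.
# The temporary file and its contents are illustrative assumptions.
tmp <- tempfile(fileext = ".csv")
bom <- as.raw(c(0xEF, 0xBB, 0xBF))
writeBin(c(bom, charToRaw("PWGTP,AGEP\n23,55\n26,56\n")), tmp)

# Declaring the encoding lets R drop the BOM before it parses the header.
fixed <- read.csv(tmp, fileEncoding = "UTF-8-BOM")
print(names(fixed))

unlink(tmp)
```

What the same file gives without fileEncoding depends on the platform and locale, which is why the mangled name can look different from one machine to another.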




Answer 2:


It is possible that the column name in the file is something like 1 PWGTP, i.e. with a space (or some other character) between the number and the name, and that these characters are turned into .. when the file is read into R. One way to prevent this is to use check.names = FALSE in read.csv/read.table:

d1 <- read.csv("yourfile.csv", header=TRUE, stringsAsFactors=FALSE, check.names=FALSE)

However, it is better not to have a name that starts with a number or contains spaces.
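
As a rough illustration of what check.names does, the sketch below reads the same header with and without it. The CSV text is a hypothetical example, not the OP's file, and csv_text, mangled and kept are names chosen here.

```r
# Hypothetical file content with a header that is not a valid R name.
csv_text <- "1 PWGTP,AGEP\n23,55\n26,56\n"

# Default: check.names = TRUE passes the headers through make.names().
mangled <- read.csv(text = csv_text)
print(names(mangled))

# check.names = FALSE keeps the header exactly as written in the file.
kept <- read.csv(text = csv_text, check.names = FALSE)
print(names(kept))

# A non-syntactic name must be quoted with backticks when used with $.
print(kept$`1 PWGTP`)
```

With the default, the space becomes a dot and an X is prepended because the name starts with a digit, so the first column comes out as X1.PWGTP.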

So, if the OP read the data with the default options, i.e. with check.names = TRUE, we can use sub to change the column names:

names(d1) <- sub(".*\\.+", "", names(d1))

As an example

sub(".*\\.+", "", "ï..PWGTP")
#[1] "PWGTP"
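
One caveat: the pattern above removes everything up to the last run of dots, so it would also shorten a legitimate name such as a.b to b. For the OP's second question, a simpler route that avoids typing the mangled name at all is to rename the column by position. The sketch below assumes the mangled column is the first one; the data frame is rebuilt by hand here purely for illustration.

```r
# Rebuild a data frame shaped like the OP's, with the mangled first name.
# "\u00ef" is the character ï; the values are taken from the question.
testdata <- data.frame(x = c(23, 26), AGEP = c(55, 56))
names(testdata)[1] <- "\u00ef..PWGTP"

# Rename by position, so the mangled name never has to be matched.
names(testdata)[1] <- "PWGTP"
print(names(testdata))

# The greedy pattern from the answer also rewrites ordinary dotted names.
print(sub(".*\\.+", "", "a.b"))
```

Renaming by position works whatever bytes ended up in the name, which matters because the ï shown on screen may not be the same character R actually stored.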


Source: https://stackoverflow.com/questions/37802797/prevent-variable-name-getting-mangled-by-read-csv-read-table
