I am aware that there are similar questions on this site; however, none of them seems to answer my question sufficiently.
This is what I have done so far:
If you're dealing with large datasets (i.e. datasets with many columns), the solution above can be cumbersome to apply manually, and it requires you to know a priori which columns are numeric.
Try this instead.
# Read every column in as character first
char_data <- read.csv(input_filename, stringsAsFactors = FALSE)
# Coerce a copy of the data to numeric; non-numeric entries become NA
num_data <- data.frame(data.matrix(char_data))
# Treat a column as numeric if fewer than half of its values were coerced to NA
numeric_columns <- sapply(num_data, function(x) mean(is.na(x)) < 0.5)
# Recombine: numeric columns from the coerced copy, the rest from the original
final_data <- data.frame(num_data[, numeric_columns], char_data[, !numeric_columns])
The code does the following: (1) it reads every column in as character; (2) it coerces a copy of the data to numeric, so non-numeric entries become NA; (3) it flags a column as numeric if fewer than half of its values became NA; (4) it recombines the numeric columns with the remaining character columns.
This essentially automates the import of your .csv file while preserving the data types of the original columns (character and numeric).
A data.table version, based on the code from dmanuge:
library(data.table)

convNumValues <- function(ds) {
  ds <- data.table(ds)
  # Coerce every column to numeric; non-numeric entries become NA
  dsnum <- data.table(data.matrix(ds))
  # A column counts as numeric if fewer than half of its values were coerced to NA
  num_cols <- sapply(dsnum, function(x) mean(is.na(x)) < 0.5)
  # Recombine: numeric columns from the coerced copy, the rest from the original
  nds <- data.table(dsnum[, .SD, .SDcols = names(num_cols)[num_cols]],
                    ds[, .SD, .SDcols = names(num_cols)[!num_cols]])
  return(nds)
}
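For example, a hypothetical call (the file name is a placeholder), assuming the data were read with all columns as character:

raw_data <- read.csv("my_data.csv", stringsAsFactors = FALSE)
clean_dt <- convNumValues(raw_data)
str(clean_dt)  # mostly-numeric columns are now numeric; the rest remain character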
Including strip.white = TRUE in the read.csv command worked for me.
(I found this solution here.)
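A minimal sketch of what that call might look like (the file name is a placeholder):

# strip.white = TRUE removes leading/trailing whitespace around unquoted field values,
# which can otherwise prevent entries (including NA codes) from being recognized correctly
my_data <- read.csv("my_data.csv", strip.white = TRUE, stringsAsFactors = FALSE)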
I had a similar problem. Based on Joshua's premise that Excel was the problem, I looked at the file and found that the numbers were formatted with commas as thousands separators (a comma every third digit). Reformatting the numbers without commas fixed the problem.
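If re-exporting from Excel is not convenient, a rough sketch of stripping the separators in R instead (the data frame and column names here are hypothetical):

# "1,234,567" -> "1234567" -> 1234567
mydata$amount <- as.numeric(gsub(",", "", mydata$amount))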
In read.table (and its relatives) it is the na.strings argument that specifies which strings are to be interpreted as missing values (NA). The default value is na.strings = "NA".
If missing values in an otherwise numeric column are coded as something other than "NA", e.g. "." or "N/A", those entries are read as character, and the whole column is then converted to character.
Thus, if your missing values are coded as anything other than "NA", you need to list them in na.strings.
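For example (the file name is a placeholder), if missing values appear as "." or "N/A":

dat <- read.csv("my_data.csv",
                na.strings = c("NA", ".", "N/A"),
                stringsAsFactors = FALSE)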
Whatever algebra you are doing in Excel to create the new column could probably be done more effectively in R.
Please try the following: read the raw file (before any Excel manipulation) into R using read.csv(..., stringsAsFactors = FALSE). [If that does not work, please take a look at ?read.table (which read.csv wraps); however, there may be some other underlying issue.]
For example:
delim = "," # or is it "\t" ?
dec = "." # or is it "," ?
myDataFrame <- read.csv("path/to/file.csv", header=TRUE, sep=delim, dec=dec, stringsAsFactors=FALSE)
Then, let's say your numeric column is column 4:
myDataFrame[, 4] <- as.numeric(myDataFrame[, 4]) # you can also refer to the column by "itsName"
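As a quick check (just a sketch), you can count the NAs in the converted column; a surprisingly large number points at entries that are not actually numeric (stray text, thousands separators, unusual codes for missing values):

sum(is.na(myDataFrame[, 4]))  # as.numeric() turns non-numeric entries into NA (with a warning)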