Right way to convert data.frame to a numeric matrix, when df also contains strings?

蓝咒 提交于 2019-11-29 19:10:44

Edit 2: See @flodel's answer. Much better.

Try:

# assuming SFI is your data.frame
as.matrix(sapply(SFI, as.numeric))  

Edit: or as @ CarlWitthoft suggested in the comments:

matrix(as.numeric(unlist(SFI)),nrow=nrow(SFI))
data.matrix(SFI)

From ?data.matrix:

Description:

 Return the matrix obtained by converting all the variables in a
 data frame to numeric mode and then binding them together as the
 columns of a matrix.  Factors and ordered factors are replaced by
 their internal codes.

Here is an alternative way if the data frame just contains numbers.

apply(as.matrix.noquote(SFI),2,as.numeric)

but the most reliable way of converting a data frame to a matrix is using data.matrix() function.

I had the same problem and I solved it like this, by taking the original data frame without row names and adding them later

SFIo <- as.matrix(apply(SFI[,-1],2,as.numeric))
row.names(SFIo) <- SFI[,1]
Matthew MacLennan

Another way of doing it is by using the read.table() argument colClasses to specify the column type by making colClasses=c(*column class types*). If there are 6 columns whose members you want as numeric, you need to repeat the character string "numeric" six times separated by commas, importing the data frame, and as.matrix() the data frame. P.S. looks like you have headers, so I put header=T.

as.matrix(read.table(SFI.matrix,header=T,
colClasses=c("numeric","numeric","numeric","numeric","numeric","numeric"),
sep=","))
user3315638

I manually filled NAs by exporting the CSV then editing it and reimporting, as below.

Perhaps one of you experts might explain why this procedure worked so well (the first file had columns with data of types char, INT and num (floating point numbers)), which all became char type after STEP 1; but at the end of STEP 3 R correctly recognized the datatype of each column).

# STEP 1:
MainOptionFile <- read.csv("XLUopt_XLUstk_v3.csv",
                            header=T, stringsAsFactors=FALSE)
#... STEP 2:
TestFrame <- subset(MainOptionFile, str_locate(option_symbol,"120616P00034000") > 0)
write.csv(TestFrame, file = "TestFrame2.csv")
# ...
# STEP 3:
# I made various amendments to `TestFrame2.csv`, including replacing all missing data cells with appropriate numbers. I then read that amended data frame back into R as follows:    
XLU_34P_16Jun12 <- read.csv("TestFrame2_v2.csv",
                            header=T,stringsAsFactors=FALSE)

On arrival back in R, all columns had their correct measurement levels automatically recognized by R!

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!