Question
Why does the following MSR code not replace the original column "Var1"?
rxDataStep(inData = input_xdf, outFile = input_xdf, overwrite = TRUE,
           transforms = list(Var1 = as.numeric(Var1)),
           transformVars = c("Var1")
)
Answer 1:
At the moment, RevoScaleR doesn't support changing the type of a variable in an xdf file (even if you write to a different file). The way to do it is to create a new variable, drop the old, and then rename the new variable to the old name.
I would suggest doing this with a transformFunc (see ?rxTransform for more information), so that you can create the new variable and drop the old, all in one step:
rxDataStep(inXdf, outXdf, transformFunc = function(varlst) {
    varlst$Var1b <- as.numeric(varlst$Var1)
    varlst$Var1 <- NULL
    varlst
}, transformVars = "Var1")

# this is fast as it only modifies the xdf metadata, not the data itself
names(outXdf)[names(outXdf) == "Var1b"] <- "Var1"
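The transform function above is an ordinary R function, so its logic can be tried on a plain named list without touching an xdf file. The following is only a minimal sketch: it assumes each chunk reaches the function as a named list of vectors, the name convert_var1 and the sample chunk are hypothetical, and it does not exercise rxDataStep itself.

```r
# Hypothetical named version of the anonymous transformFunc from the answer.
convert_var1 <- function(varlst) {
    varlst$Var1b <- as.numeric(varlst$Var1)  # numeric copy under a new name
    varlst$Var1 <- NULL                      # drop the original character column
    varlst
}

# Assumed stand-in for one chunk of data: a named list of vectors.
chunk <- list(Var1 = c("1", "2", "3"))
result <- convert_var1(chunk)

# Same rename as in the answer, applied here to the plain list.
names(result)[names(result) == "Var1b"] <- "Var1"
str(result)
```

If the rename behaves as expected on the list, the same three steps (create, drop, rename) are what the xdf version relies on.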
Answer 2:
MSR doesn't allow you to overwrite a variable in place with a different variable type.
You have two options: Write to a different variable or write to a different file. I have added a bit of code that shows that both solutions work as stated in MRS 9.0.1. As stated in the comments, there is some point in earlier versions where this might not work. I am not totally sure where that point is, so the code should let you know.
input_xdf <- "test.xdf"
modified_xdf <- "test_out.xdf"

# Build a small test xdf file in which Var1 is stored as character
xdf_data <- data.frame(Var1 = as.character(1:10),
                       Var2 = 2:11,
                       stringsAsFactors = FALSE)

rxDataStep(inData = xdf_data,
           outFile = input_xdf,
           rowsPerRead = 5,
           overwrite = TRUE)

# Option 1: same file, but write the numeric values to a new variable (Var1b)
rxDataStep(inData = input_xdf,
           outFile = input_xdf,
           overwrite = TRUE,
           transforms = list(Var1b = as.numeric(Var1)),
           transformVars = c("Var1")
)

rxGetInfo(input_xdf, getVarInfo = TRUE, numRows = 5)

# Option 2: keep the name Var1, but write to a different file
rxDataStep(inData = input_xdf,
           outFile = modified_xdf,
           transforms = list(Var1 = as.numeric(Var1)),
           transformVars = c("Var1")
)

rxGetInfo(modified_xdf, getVarInfo = TRUE, numRows = 5)
Source: https://stackoverflow.com/questions/41165139/replace-existng-column-in-msr