Rownames for data.table in R for model.matrix

一个人想着一个人 submitted on 2019-12-07 12:30:14

Question


I have a data.table DT and I want to run model.matrix on it. Each row has a string ID, which is stored in the ID column of DT. When I run model.matrix on DT, my formula excludes the ID column. The problem is, model.matrix drops some rows because of NAs. If I set the rownames of DT to the ID column, before calling model.matrix, then the final model matrix has rownames, and I'm all set. Otherwise, I can't figure out what rows I end up with. I'm setting the rownames with rownames(DT) = DT$ID. However, when I try to add a new column to DT, I get a complaint about

"Invalid .internal.selfref detected . . . At an earlier point, this data.table has been copied by R."

So I'm wondering:

  1. Is there a better way to set rownames for a data.table?
  2. Is there a better approach to solving this problem?

Answer 1:


There are a couple of issues here.

Firstly, it is a feature of data.tables that they do not have rownames; instead they have keys, which are far more powerful. See this great vignette.

But it isn't the end of the world: model.matrix returns sensible rownames when you pass it a data.table.

For example:

library(data.table)

A <- data.table(ID = 1:5, x = c(NA, 1:4), y = c(4:2, NA, 3))

mm <- model.matrix( ~ x + y, A)

rownames(mm)

## [1] "2" "3" "5"

So rows 2, 3 and 5 are those included in the model matrix; rows 1 and 4 were dropped because they contain an NA.

Now, you can add this sequence as a column to A. This will be useful if you then set the key to something else (thereby losing the original order).

A[, rowid := seq_len(nrow(A))]

You might consider making it character (like the rownames of mm), but it won't really matter, as you can just as easily convert rownames(mm) to numeric when you need to reference the rows.
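As a minimal sketch of that lookup, the rownames of the model matrix can be converted to integers and used to pull the matching IDs out of the original table. The string IDs "a" to "e" and the variable names kept and kept_ids are made up for illustration, and the conversion assumes A is still in its original row order when it is indexed.

```r
library(data.table)

# Toy data with string IDs (illustrative values, not from the question)
A <- data.table(ID = c("a", "b", "c", "d", "e"),
                x  = c(NA, 1:4),
                y  = c(4:2, NA, 3))

mm <- model.matrix(~ x + y, A)

# rownames(mm) hold the positions of the rows that survived NA removal
kept <- as.integer(rownames(mm))

# Look up the string IDs of those rows in the original table
kept_ids <- A[kept, ID]

# Optionally label the model matrix with the IDs
rownames(mm) <- kept_ids
```

Here kept_ids should contain "b", "c" and "e", since rows 1 and 4 each have an NA.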

As for the warning that data.table gives, read the sentence that follows it:

Avoid key<-, names<- and attr<- which in R currently (and oddly) may copy the whole data.table. Use set* syntax instead to avoid copying: setkey(), setnames() and setattr()

rownames are an attribute, so rownames<- (which internally at some point uses the equivalent of attr<-) may copy the data.table in the same way.

The line from `row.names<-.data.frame` is

attr(x, "row.names") <- value

That being said, data.tables don't have rownames, so there is no point setting them.
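In place of rownames, the ID can stay an ordinary column and serve as the key, with all changes made by reference. The following is only a sketch under assumptions: the table DT and the columns value and flag are hypothetical, and setkey(), setnames() and := are the by-reference tools the warning text recommends.

```r
library(data.table)

# Hypothetical table; ID plays the role the rownames were meant to play
DT <- data.table(ID = c("a", "b", "c"), x = c(1, NA, 3))

setkey(DT, ID)                # key by ID instead of setting rownames
setnames(DT, "x", "value")    # rename a column by reference
DT[, flag := !is.na(value)]   # add a column by reference

res <- DT["c"]                # keyed lookup of the row whose ID is "c"
```

Because none of these steps uses an assignment function such as rownames<- or names<-, the table is not copied by R, which is what the selfref warning complains about.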



Source: https://stackoverflow.com/questions/13977745/rownames-for-data-table-in-r-for-model-matrix
