Combining an ifelse statement with shift data.table function in R

我与影子孤独终老i posted on 2019-12-11 06:39:13

Question


I am trying to work out how I would combine an ifelse statement with the shift function in data.table. My data looks like this:

DF <- structure(list(CHR = c(1, 1, 1, 1, 1,1), 
SNP = c("rs2494631", "rs4648637", "rs2494627", "rs11122119", "rs1844583","rs2292242"), 
BP = c(2399149, 2401364, 2402499, 6768856, 8383469, 8385059), 
KBdist= c(NA, 2215, 1135, 4366357, 1614613, 1590), 
locus = c(1, NA, NA, NA, NA, NA)), 
.Names = c("CHR","SNP","BP","KBdist","locus"), 
row.names = c(NA, 6L), 
class = "data.frame")

> DF

CHR SNP        BP       KBdist   locus
1   rs2494631  2399149  NA       1
1   rs4648637  2401364  2215     NA
1   rs2494627  2402499  1135     NA
1   rs11122119 6768856  4366357  NA
1   rs1844583  8383469  1614613  NA
1   rs2292242  8385059  1590     NA

and what I am trying to achieve is: "If CHR is equal to the line above, and KBdist is less than 500,000, make locus equal to the line above, else add one to the value of the line above". Which would yield an output that looks like this:

CHR SNP        BP       KBdist   locus
1   rs2494631  2399149  NA       1
1   rs4648637  2401364  2215     1
1   rs2494627  2402499  1135     1
1   rs11122119 6768856  4366357  2
1   rs1844583  8383469  1614613  3
1   rs2292242  8385059  1590     3

I know that I can use shift to access the values in the row above, for example:

setDT(DF)[ , KBdist := BP - shift(BP, 1L, type = "lag")]

That is how I created one of the columns (setDT() converts DF to a data.table so that := works). But I don't see how to extend it to include the ifelse conditions above.

Any help would be greatly appreciated.

Thanks in advance.


Answer 1:


Here is a solution in base R; note that it does not use data.table.

# logical vector with our condition tested
ind <- (diff(DF$CHR) == 0 & DF$KBdist[-1] < 5e+5)
# populating the 'locus' column   ---   notice the '<<-'
vapply(2:nrow(DF), function (k) DF$locus[k] <<- DF$locus[k-1] + 1 - ind[k-1], numeric(1)) 
# [1] 1 1 2 3 3
DF
#   CHR        SNP      BP  KBdist locus
# 1   1  rs2494631 2399149      NA     1
# 2   1  rs4648637 2401364    2215     1
# 3   1  rs2494627 2402499    1135     1
# 4   1 rs11122119 6768856 4366357     2
# 5   1  rs1844583 8383469 1614613     3
# 6   1  rs2292242 8385059    1590     3

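vapply(...) returns the computed locus values for rows 2 to 6; the DF$locus column itself is overwritten as a side effect of the <<- assignment.

Remark

Note that I used <<- inside the function in order to overwrite the DF$locus[k] value in the global DF, so that the next call can read it. Simply swapping <<- for <- does not work: each call would then modify only a local copy of DF, and every row after the second would read NA from the unchanged global column. If you don't like the side effect, compute the column without a row-by-row dependency instead, for example DF$locus <- DF$locus[1] + c(0, cumsum(!ind)).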
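The row-by-row dependency described in the remark can also be written as an ordinary for loop, which assigns into DF directly and needs no <<-. This is a minimal sketch, not part of the original answer; it assumes the same DF as in the question, reduced here to the three columns the rule uses.

```r
# Reduced copy of the question's data (only the columns the rule needs)
DF <- data.frame(
  CHR    = c(1, 1, 1, 1, 1, 1),
  KBdist = c(NA, 2215, 1135, 4366357, 1614613, 1590),
  locus  = c(1, NA, NA, NA, NA, NA)
)

# TRUE where a row stays in the same locus as the row above
ind <- diff(DF$CHR) == 0 & DF$KBdist[-1] < 5e5

# Fill locus row by row; each value depends on the one just written
for (k in 2:nrow(DF)) {
  DF$locus[k] <- DF$locus[k - 1] + 1 - ind[k - 1]
}

DF$locus
```

Because the loop body runs in the calling environment rather than inside a function, the plain <- operator updates DF itself and the next iteration sees the new value.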




Answer 2:


Another possibility is using cumsum:

setDT(DF)[, locus := cumsum(c(1L, (CHR!=shift(CHR,1L) | KBdist>=500e3)[-1L]))]

output:

   CHR        SNP      BP  KBdist locus
1:   1  rs2494631 2399149      NA     1
2:   1  rs4648637 2401364    2215     1
3:   1  rs2494627 2402499    1135     1
4:   1 rs11122119 6768856 4366357     2
5:   1  rs1844583 8383469 1614613     3
6:   1  rs2292242 8385059    1590     3
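To connect this back to the ifelse wording of the question, the one-liner can be unpacked into separate steps. The following is a rough sketch rather than the answerer's code: the table name DT and the helper column step are made up for illustration, and the data is a reduced copy of the question's DF.

```r
library(data.table)

# Reduced copy of the question's data; DT is a hypothetical name
DT <- data.table(
  CHR = c(1, 1, 1, 1, 1, 1),
  BP  = c(2399149, 2401364, 2402499, 6768856, 8383469, 8385059)
)

# Distance to the previous row; NA on the first row
DT[, KBdist := BP - shift(BP, 1L, type = "lag")]

# 0 = same locus as the row above, 1 = start a new locus.
# The first row gets NA, because shift(CHR) and KBdist are both NA there.
DT[, step := ifelse(CHR == shift(CHR, 1L) & KBdist < 5e5, 0L, 1L)]

# The first row opens locus 1, so its step is set to 1
DT[1L, step := 1L]

# The running total of the steps is the locus number
DT[, locus := cumsum(step)]

DT$locus
```

The ifelse call only decides, per row, whether a new locus starts; cumsum then carries the count forward, which is what removes the need to read the previous row's locus value.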


Source: https://stackoverflow.com/questions/54486009/combining-an-ifelse-statement-with-shift-data-table-function-in-r
