Question
I am trying to work out how I would combine an ifelse statement with the shift function in data.table. My data looks like this:
DF <- structure(list(CHR = c(1, 1, 1, 1, 1,1),
SNP = c("rs2494631", "rs4648637", "rs2494627", "rs11122119", "rs1844583","rs2292242"),
BP = c(2399149, 2401364, 2402499, 6768856, 8383469, 8385059),
KBdist= c(NA, 2215, 1135, 4366357, 1614613, 1590),
locus = c(1, NA, NA, NA, NA, NA)),
.Names = c("CHR","SNP","BP","KBdist","locus"),
row.names = c(NA, 6L),
class = "data.frame")
> DF
CHR SNP BP KBdist locus
1 rs2494631 2399149 NA 1
1 rs4648637 2401364 2215 NA
1 rs2494627 2402499 1135 NA
1 rs11122119 6768856 4366357 NA
1 rs1844583 8383469 1614613 NA
1 rs2292242 8385059 1590 NA
and what I am trying to achieve is: "If CHR is equal to the line above, and KBdist is less than 500,000, make locus equal to the line above, else add one to the value of the line above". Which would yield an output that looks like this:
CHR SNP BP KBdist locus
1 rs2494631 2399149 NA 1
1 rs4648637 2401364 2215 1
1 rs2494627 2402499 1135 1
1 rs11122119 6768856 4366357 2
1 rs1844583 8383469 1614613 3
1 rs2292242 8385059 1590 3
I know that I can use shift to access the values in the row above, for example:
DF<-DF[ , KBdist := BP - shift(BP, 1L, type="lag")]
That is how I created the KBdist column, but I don't see how to extend it to include the ifelse conditions above.
Any help would be greatly appreciated.
Thanks in advance.
Answer 1:
Here is a solution that solves the task in base R; data.table is not used here.
# logical vector with our condition tested
ind <- (diff(DF$CHR) == 0 & DF$KBdist[-1] < 5e+5)
# populating the 'locus' column --- notice the '<<-'
vapply(2:nrow(DF), function (k) DF$locus[k] <<- DF$locus[k-1] + 1 - ind[k-1], numeric(1))
# [1] 1 1 2 3 3
DF
# CHR SNP BP KBdist locus
# 1 1 rs2494631 2399149 NA 1
# 2 1 rs4648637 2401364 2215 1
# 3 1 rs2494627 2402499 1135 1
# 4 1 rs11122119 6768856 4366357 2
# 5 1 rs1844583 8383469 1614613 3
# 6 1 rs2292242 8385059 1590 3
vapply(...) returns the locus values for rows 2 to 6 and, as a side effect of <<-, writes them into DF$locus.
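Remark
Note that I used <<- inside the function so that each call overwrites DF$locus[k] in the global DF, which the next call then reads as DF$locus[k-1]. Swapping <<- for <- is not enough on its own: the assignment would then change only a local copy of DF, and every row after the second would be computed from NA. If you don't like the side effect, drop the loop and assign a running total instead: DF$locus[-1] <- DF$locus[1] + cumsum(!ind).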
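For readers who would rather avoid both <<- and vapply, the same recurrence can also be written as a plain for loop that follows the wording of the question row by row. This is only a minimal sketch in base R; the data frame is rebuilt from the question's sample data, and the names locus and same_locus are illustrative, not part of the original answer.

```r
# Same sample data as in the question.
DF <- data.frame(
  CHR    = c(1, 1, 1, 1, 1, 1),
  SNP    = c("rs2494631", "rs4648637", "rs2494627",
             "rs11122119", "rs1844583", "rs2292242"),
  BP     = c(2399149, 2401364, 2402499, 6768856, 8383469, 8385059),
  KBdist = c(NA, 2215, 1135, 4366357, 1614613, 1590),
  locus  = c(1, NA, NA, NA, NA, NA),
  stringsAsFactors = FALSE
)

# Work on a plain vector so that no global assignment is needed.
locus <- DF$locus
for (k in seq_len(nrow(DF))[-1]) {
  # TRUE when row k continues the locus of row k - 1.
  same_locus <- DF$CHR[k] == DF$CHR[k - 1] && DF$KBdist[k] < 5e5
  locus[k] <- if (same_locus) locus[k - 1] else locus[k - 1] + 1
}
DF$locus <- locus
DF$locus
```

The loop is slower than a vectorised cumsum on large tables, but it keeps the "same as the line above, else add one" rule visible in the code.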
Answer 2:
Another possibility is using cumsum:
setDT(DF)[, locus := cumsum(c(1L, (CHR!=shift(CHR,1L) | KBdist>=500e3)[-1L]))]
output:
CHR SNP BP KBdist locus
1: 1 rs2494631 2399149 NA 1
2: 1 rs4648637 2401364 2215 1
3: 1 rs2494627 2402499 1135 1
4: 1 rs11122119 6768856 4366357 2
5: 1 rs1844583 8383469 1614613 3
6: 1 rs2292242 8385059 1590 3
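To connect this back to the ifelse-plus-shift wording of the question, the one-liner can be unpacked into separate steps. The following is a sketch under the assumption that the data.table package is installed; DT and new_locus are names introduced here for illustration only.

```r
library(data.table)

# Sample data from the question, without the derived columns.
DT <- data.table(
  CHR = c(1, 1, 1, 1, 1, 1),
  SNP = c("rs2494631", "rs4648637", "rs2494627",
          "rs11122119", "rs1844583", "rs2292242"),
  BP  = c(2399149, 2401364, 2402499, 6768856, 8383469, 8385059)
)

# Distance to the previous SNP, as computed in the question.
DT[, KBdist := BP - shift(BP, 1L, type = "lag")]

# 0 where a row continues the previous locus, 1 where it starts a new one.
# The first row has no predecessor, so the comparison yields NA there.
DT[, new_locus := ifelse(CHR == shift(CHR, 1L) & KBdist < 5e5, 0L, 1L)]

# Treat the first row as the start of locus 1.
DT[1L, new_locus := 1L]

# The running total of the flags is the locus number.
DT[, locus := cumsum(new_locus)]
DT$locus
```

Keeping new_locus as its own column makes it easy to inspect which rows open a new locus before the cumulative sum is taken.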
Source: https://stackoverflow.com/questions/54486009/combining-an-ifelse-statement-with-shift-data-table-function-in-r