Remove a range in data.table


I am trying to exclude some rows from a data.table based on, let's say, day and month: excluding, for example, summer holidays, which always begin on, for example, the 15th of June and en


2 Answers

孤独总比滥情好

2021-02-06 06:54

Based on the answer here, you might try something like

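library(data.table)

# Sample data, keyed by Month then Day
DT <- data.table(Month = sample(c(1,3:12), 100, replace = TRUE),
  Day = sample(1:30, 100, replace = TRUE), key = "Month,Day")

# Dates that you want to exclude: 15 June to 15 July, one row per (month, day)
excl <- as.data.table(rbind(expand.grid(6, 15:30), expand.grid(7, 1:15)))

# Row numbers of DT that match excl (NA where a date is absent), dropped by negative index
DT[-na.omit(DT[excl, which = TRUE])]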

If your data contain at least one entry for each day you want to exclude, na.omit might not be required.
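In more recent versions of data.table the same exclusion can probably be written as an anti-join, which avoids the row-number bookkeeping. The following is only a sketch: the table contents are made up for illustration, and `excl` is given the column names `Month` and `Day` so that `on =` can match them directly.

```r
library(data.table)

# Small illustrative table (values chosen by hand, not from the question)
DT <- data.table(Month = c(6L, 6L, 7L, 7L, 8L),
                 Day   = c(14L, 15L, 15L, 16L, 1L))

# Dates to exclude: 15 June to 15 July, one row per (Month, Day) pair
excl <- rbind(data.table(Month = 6L, Day = 15:30),
              data.table(Month = 7L, Day = 1:15))

# Anti-join: keep the rows of DT that have no match in excl
res <- DT[!excl, on = c("Month", "Day")]
print(res)
```

A side benefit is that an anti-join still returns every row when nothing matches, whereas an empty negative index selects no rows at all.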


爱一瞬间的悲伤

2021-02-06 06:57
Great question. I've edited the question title to match the question.

A simple approach that avoids as.Date and reads nicely:
```
DT[!(Month*100L+Day) %between% c(0615L,0715L)]
```
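To make the encoding concrete, here is a small sketch that wraps the one-liner in a helper. `drop_mmdd_range` is a hypothetical name, not part of data.table, and the sample rows are invented. It assumes the range does not wrap around the end of the year (so `from <= to`) and that the table has `Month` and `Day` columns.

```r
library(data.table)

# Hypothetical helper (not part of data.table): drop rows whose (Month, Day)
# falls inside [from, to], with both bounds encoded as month * 100 + day.
drop_mmdd_range <- function(dt, from, to) {
  # %between% binds tighter than the leading "!", so the whole test is negated
  dt[!(Month * 100L + Day) %between% c(from, to)]
}

DT <- data.table(Month = c(6L, 6L, 7L, 7L, 12L),
                 Day   = c(14L, 15L, 15L, 16L, 25L))

# 15 June encodes to 615 and 15 July to 715; both bounds are inclusive
kept <- drop_mmdd_range(DT, 615L, 715L)
print(kept)
```

Multiplying the month by 100 keeps the encoded values in calendar order, which is why a single numeric interval is enough here.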
That's probably fast enough in many cases. If you have a lot of different ranges, then you may want to step up a gear:
```
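DT[,mmdd:=Month*100L+Day]
setkey(DT,mmdd)   # the binary-search lookups below need DT keyed by mmdd
# assumes both 0615 and 0715 occur in the data; otherwise which=TRUE returns NA
from = DT[J(0615),mult="first",which=TRUE]
to = DT[J(0715),mult="last",which=TRUE]   # "last" so every 15 July row is dropped
DT[-(from:to)]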
```
That's a bit long and error prone because it's DIY. One idea is for a list column in an i table to represent a range query (FR#203, like a binary search %between%). A not-join (also not yet implemented, FR#1384) could then be combined with the list column range query to do exactly what you asked:
```
setkey(DT,mmdd)
DT[-J(list(0615,0715))]
```
That would extend to multiple different ranges, or the same range for many different ids, in the usual way; i.e., more rows added to i.
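For several ranges at once, one option in more recent data.table versions might be `%inrange%`, used here instead of the list-column syntax proposed above. This is a sketch under assumptions: the `ranges` table and its `from`/`to` columns are hypothetical, the sample rows are invented, and bounds are treated as inclusive.

```r
library(data.table)

DT <- data.table(Month = c(4L, 6L, 7L, 8L, 12L, 12L),
                 Day   = c(10L, 20L, 16L, 1L, 19L, 24L))
DT[, mmdd := Month * 100L + Day]

# Hypothetical exclusion ranges, one row per range (inclusive bounds):
# 15 June to 15 July, and 20 December to 31 December
ranges <- data.table(from = c(615L, 1220L), to = c(715L, 1231L))

# %inrange% is TRUE when mmdd falls inside at least one [from, to] interval
kept <- DT[!(mmdd %inrange% list(ranges$from, ranges$to))]
print(kept)
```

Adding another exclusion period is then just another row in `ranges`, which is close in spirit to "more rows added to i".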