Using Countif on Dates in R

Submitted by 帅比萌擦擦 on 2019-12-23 04:57:17

Question


I have the following table

**A**  | **B**  | **C** |**D** |
:----: | :----: | :----:|:----:|
1/1/17 | 3/1/17 |4/1/17 | H    |
1/1/17 | 3/1/17 |4/1/17 | H    |
2/1/17 | 4/1/17 |5/1/17 | V    |
3/1/17 | 5/1/17 |6/1/17 | V    |
4/1/17 | 5/1/17 |7/1/17 | H    |
4/1/17 | 6/1/17 |7/1/17 | H    |

I am looking for R code that produces the result in the table below:

 1. A column with Unique list of dates from columns A,B & C above
 2. A count of dates <= (less than or equal to) the unique 
    dates column value in each of the columns A,B & C from above table. 
 3. Filtered by column D value of 'H' only 

Result

**Unique Dates**  | **Count of A**  | **Count of B** |**Count of C** |
    :----:        |     :----:      |     :----:     |     :----:    |
    1/1/17        |       2         |       0        |       0       |
    2/1/17        |       2         |       0        |       0       |
    3/1/17        |       2         |       2        |       0       |
    4/1/17        |       4         |       2        |       2       |
    5/1/17        |       4         |       3        |       2       |
    6/1/17        |       4         |       4        |       2       |
    7/1/17        |       4         |       4        |       4       |

Answer 1:


Your data as a reproducible example

library(lubridate)
df <- data.frame(A = dmy(c("1/1/17","1/1/17","2/1/17","3/1/17","4/1/17","4/1/17")),
                 B = dmy(c("3/1/17","3/1/17","4/1/17","5/1/17","5/1/17","6/1/17")),
                 C = dmy(c("4/1/17","4/1/17","5/1/17","6/1/17","7/1/17","7/1/17")),
                 D = c("H","H","V","V","H","H"), stringsAsFactors = FALSE)

tidyverse and zoo solution

library(tidyverse)
library(zoo)
df %>% 
  filter(D=="H") %>%             # uses only rows where column D == H
  gather(Date, value, -D) %>%    # gather Dates into long format, ignore column D
  select(-D) %>%                 # unselect column D
  group_by(Date, value) %>%      # group by source column (A, B, C) and date
  summarise(Count = length(value)) %>%    # count occurrences of each date per column
  arrange(Date) %>%                       # sort by source column
  mutate(Count = cumsum(Count)) %>%       # cumulative count per column (dates <= current)
  spread(Date, Count) %>%                 # spread counts into wide format
  mutate_at(vars(A:C), na.locf, na.rm = FALSE) %>%   # carry the last count forward over NAs
  replace(is.na(.), 0)                         # fill remaining (leading) NAs with 0

Output

       value     A     B     C
1 2017-01-01     2     0     0
2 2017-01-03     2     2     0
3 2017-01-04     4     2     2
4 2017-01-05     4     3     2
5 2017-01-06     4     4     2
6 2017-01-07     4     4     4

Note that 2017-01-02 is missing because it appears only in a row where D is "V", and those rows are filtered out before the dates are collected.
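If a row for 2017-01-02 is wanted as well, one option is to collect the unique dates from all rows and count only within the "H" rows. The following is a minimal base R sketch of that idea, not part of the original answer. The helper name `count_leq` and the result column names are assumptions made for illustration, and the dates are assumed to be day/month/year.

```r
# Sample data from the question; dates are read as day/month/year.
to_date <- function(x) as.Date(x, format = "%d/%m/%y")
df <- data.frame(
  A = to_date(c("1/1/17", "1/1/17", "2/1/17", "3/1/17", "4/1/17", "4/1/17")),
  B = to_date(c("3/1/17", "3/1/17", "4/1/17", "5/1/17", "5/1/17", "6/1/17")),
  C = to_date(c("4/1/17", "4/1/17", "5/1/17", "6/1/17", "7/1/17", "7/1/17")),
  D = c("H", "H", "V", "V", "H", "H"),
  stringsAsFactors = FALSE
)

# Unique dates come from ALL rows, so a date that occurs only in a "V" row is kept.
unique_dates <- sort(unique(c(df$A, df$B, df$C)))

# Counts are taken from the "H" rows only.
h_rows <- df[df$D == "H", ]

# count_leq() is a hypothetical helper: for each cutoff date, count the
# values in `column` that fall on or before it (the COUNTIF "<=" logic).
count_leq <- function(column, cutoffs) {
  vapply(seq_along(cutoffs),
         function(i) sum(column <= cutoffs[i]),
         integer(1))
}

result <- data.frame(
  Unique_Dates = unique_dates,
  Count_of_A = count_leq(h_rows$A, unique_dates),
  Count_of_B = count_leq(h_rows$B, unique_dates),
  Count_of_C = count_leq(h_rows$C, unique_dates)
)
result
```

This compares every unique date against every value, which is fine for small tables; the cumulative-sum approaches in the answers avoid the repeated comparisons on larger data.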




Answer 2:


At first glance, the question looks like a simple reshaping task. A closer look shows that the requirements are not easy to implement if we want to follow the OP's specifications to the letter:

  1. A column with Unique list of dates from columns A,B & C above
  2. A count of dates <= (less than or equal to) the unique dates column value in each of the columns A,B & C from above table.
  3. Filtered by column D value of 'H' only

The data.table solution below reshapes the data from wide to long format, does all the aggregation in long format (including filling in missing combinations by group), and finally reshapes the result back to wide format. Further explanations are given in the comments within the code.

library(data.table)   # CRAN version 1.10.4 used
# coerce to data.table
setDT(DT)[
  # reshape from wide to long format, 
  # thereby renaming one column as requested
  , melt(.SD, id.vars = "D", value.name = "Unique_Dates")][
    # convert dates from character to class Date
    , Unique_Dates := lubridate::dmy(Unique_Dates)][
      # count occurrences by variable & date, 
      # set key & order by variable & date for subsequent cumsum & join
      , .N, keyby = .(D, variable, Unique_Dates)][
        # compute cumsum for each variable along unique dates
        , N := cumsum(N), by = .(D, variable)][
          # join with all possible combinations of D, variables and dates
          # use rolling join to fill missing values
          CJ(D, variable, Unique_Dates, unique = TRUE), roll = Inf][
            # replace remaining NAs
            is.na(N), N := 0L][
              # finally, reshape selected rows from long to wide
              D == "H", dcast(.SD, Unique_Dates ~ paste0("Count_of_", variable))]

Output

   Unique_Dates Count_of_A Count_of_B Count_of_C
1:   2017-01-01          2          0          0
2:   2017-01-02          2          0          0
3:   2017-01-03          2          2          0
4:   2017-01-04          4          2          2
5:   2017-01-05          4          3          2
6:   2017-01-06          4          4          2
7:   2017-01-07          4          4          4

  • The columns are named according to OP's expected result.
  • The result includes 2017-01-02 as expected although this date appears only in a row with D == "V" which was supposed to be excluded from the final result.
  • A rolling join is used to fill missing values instead of zoo::na.locf().
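To illustrate the last point in isolation, here is a small sketch of how a rolling join with `roll = Inf` carries the last observation forward. The tables `obs` and `grid` and their columns are made-up toy data, not taken from the question.

```r
library(data.table)

# Toy cumulative counts observed on days 1, 3 and 4 only (hypothetical data).
obs <- data.table(day = c(1L, 3L, 4L), N = c(2L, 4L, 5L), key = "day")

# Every day we want a value for, including days without an observation.
grid <- data.table(day = 0:5)

# Rolling join: for each day in `grid`, take the most recent row of `obs`
# on or before that day. Days before the first observation get NA.
filled <- obs[grid, on = "day", roll = Inf]

# Replace the remaining leading NA with 0, as in the answer above.
filled[is.na(N), N := 0L]
filled
```

Day 2 picks up the value from day 1 and day 5 picks up the value from day 4, which is the same effect `zoo::na.locf()` has on a vector with gaps.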

Data

In the question, the OP provided the sample data in a printed format that was difficult to "scrape":

library(data.table)
DT <- fread(
  "**A**  | **B**  | **C** |**D** |
  1/1/17 | 3/1/17 |4/1/17 | H    |
  1/1/17 | 3/1/17 |4/1/17 | H    |
  2/1/17 | 4/1/17 |5/1/17 | V    |
  3/1/17 | 5/1/17 |6/1/17 | V    |
  4/1/17 | 5/1/17 |7/1/17 | H    |
  4/1/17 | 6/1/17 |7/1/17 | H    |",
  sep ="|", drop = 5L, stringsAsFactors = TRUE)[
    # strip the markdown asterisks; names(.SD) is used because DT does not exist yet
    , setnames(.SD, stringr::str_replace_all(names(.SD), "\\*", ""))][]
DT
        A      B      C D
1: 1/1/17 3/1/17 4/1/17 H
2: 1/1/17 3/1/17 4/1/17 H
3: 2/1/17 4/1/17 5/1/17 V
4: 3/1/17 5/1/17 6/1/17 V
5: 4/1/17 5/1/17 7/1/17 H
6: 4/1/17 6/1/17 7/1/17 H
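Both answers read these strings as day/month/year (via `lubridate::dmy()`). The sketch below shows the same interpretation with base R's `as.Date()`, and how the result would differ under a month/day/year reading. The variable names are chosen for illustration only.

```r
dates <- c("1/1/17", "3/1/17")

# Day/month/year, as assumed by both answers: "3/1/17" is 3 January 2017.
parsed_dmy <- as.Date(dates, format = "%d/%m/%y")
format(parsed_dmy)

# Month/day/year would instead give 1 March 2017 for the same string.
parsed_mdy <- as.Date(dates, format = "%m/%d/%y")
format(parsed_mdy)
```

If the source data actually uses month/day/year, the parsing step in either answer would need to change accordingly (for example `lubridate::mdy()` instead of `dmy()`).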


Source: https://stackoverflow.com/questions/45556705/using-countif-on-dates-in-r
