Question
I have the following table:
**A** | **B** | **C** |**D** |
:----: | :----: | :----:|:----:|
1/1/17 | 3/1/17 |4/1/17 | H |
1/1/17 | 3/1/17 |4/1/17 | H |
2/1/17 | 4/1/17 |5/1/17 | V |
3/1/17 | 5/1/17 |6/1/17 | V |
4/1/17 | 5/1/17 |7/1/17 | H |
4/1/17 | 6/1/17 |7/1/17 | H |
I am looking for R code that produces the result shown in the table below:
1. A column with the unique list of dates from columns A, B and C above.
2. For each of the columns A, B and C from the table above, a count of the dates that are <= (less than or equal to) the value in the unique dates column.
3. The counts are filtered to rows where column D has the value 'H' only.
Result
**Unique Dates** | **Count of A** | **Count of B** |**Count of C** |
:----: | :----: | :----: | :----: |
1/1/17 | 2 | 0 | 0 |
2/1/17 | 2 | 0 | 0 |
3/1/17 | 2 | 2 | 0 |
4/1/17 | 4 | 2 | 2 |
5/1/17 | 4 | 3 | 2 |
6/1/17 | 4 | 4 | 2 |
7/1/17 | 4 | 4 | 4 |
Answer 1:
Your data as a reproducible example:

    library(lubridate)
    df <- data.frame(A = dmy(c("1/1/17", "1/1/17", "2/1/17", "3/1/17", "4/1/17", "4/1/17")),
                     B = dmy(c("3/1/17", "3/1/17", "4/1/17", "5/1/17", "5/1/17", "6/1/17")),
                     C = dmy(c("4/1/17", "4/1/17", "5/1/17", "6/1/17", "7/1/17", "7/1/17")),
                     D = c("H", "H", "V", "V", "H", "H"), stringsAsFactors = FALSE)

tidyverse and zoo solution:

    library(tidyverse)
    library(zoo)
    df %>%
      filter(D == "H") %>%                              # keep only rows where column D == "H"
      gather(Date, value, -D) %>%                       # long format: Date = source column (A/B/C), value = the date
      select(-D) %>%                                    # drop column D
      group_by(Date, value) %>%                         # group by source column and date
      summarise(Count = length(value)) %>%              # count occurrences of each date
      arrange(Date) %>%                                 # sort by source column
      mutate(Count = cumsum(Count)) %>%                 # cumulative count per source column (<=)
      ungroup() %>%                                     # drop the grouping before reshaping
      spread(Date, Count) %>%                           # spread the counts into wide format
      mutate_at(vars(A:C), na.locf, na.rm = FALSE) %>%  # carry the last count forward over NAs
      replace(is.na(.), 0)                              # set the remaining (leading) NAs to 0

Output:

           value A B C
    1 2017-01-01 2 0 0
    2 2017-01-03 2 2 0
    3 2017-01-04 4 2 2
    4 2017-01-05 4 3 2
    5 2017-01-06 4 4 2
    6 2017-01-07 4 4 4

Note that `2017-01-02` is missing because it appears only in a row with `D == "V"`, and those rows are filtered out before the dates are collected.
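
If `2017-01-02` should appear in the result as well, the counting rule from the question can also be written out directly in base R. The following is only a sketch and not part of the original answer: it assumes that the list of unique dates is taken from all rows, while only rows with `D == "H"` are counted. The names `date_cols`, `unique_dates`, `h_rows` and `result` are made up for illustration.

```r
# Sample data from the question, dates parsed as day/month/year
parse_dmy <- function(x) as.Date(x, format = "%d/%m/%y")
df <- data.frame(
  A = parse_dmy(c("1/1/17", "1/1/17", "2/1/17", "3/1/17", "4/1/17", "4/1/17")),
  B = parse_dmy(c("3/1/17", "3/1/17", "4/1/17", "5/1/17", "5/1/17", "6/1/17")),
  C = parse_dmy(c("4/1/17", "4/1/17", "5/1/17", "6/1/17", "7/1/17", "7/1/17")),
  D = c("H", "H", "V", "V", "H", "H"),
  stringsAsFactors = FALSE
)

date_cols <- c("A", "B", "C")

# Assumption: unique dates come from ALL rows, regardless of D
unique_dates <- sort(unique(do.call(c, unname(as.list(df[date_cols])))))

# Only these rows contribute to the counts
h_rows <- df[df$D == "H", ]

# For each date column, count the "H" rows whose date is <= each unique date
counts <- sapply(date_cols, function(col) {
  vapply(seq_along(unique_dates),
         function(i) sum(h_rows[[col]] <= unique_dates[i]),
         integer(1))
})

result <- data.frame(Unique_Dates = unique_dates, counts)
names(result)[-1] <- paste0("Count_of_", date_cols)
result
```

With this approach no forward filling is needed, because every count is computed against the full list of dates; the row for `2017-01-02` shows the counts 2, 0 and 0.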
Answer 2:
At first glance, the question seems to be a simple reshaping task. A closer look shows that the requirements aren't easily implemented if we want to follow the OP's specifications to the letter:
- A column with Unique list of dates from columns A,B & C above
- A count of dates <= (less than or equal to) the unique dates column value in each of the columns A,B & C from above table.
- Filtered by column D value of 'H' only
The `data.table` solution below reshapes the data from wide to long format, performs all aggregations by group (including supplementing missing combinations in the long form), and finally reshapes the result back to wide format. Additional explanations are given in the comments within the code.

    library(data.table)   # CRAN version 1.10.4 used
    # coerce to data.table
    setDT(DT)[
      # reshape from wide to long format,
      # thereby renaming one column as requested
      , melt(.SD, id.vars = "D", value.name = "Unique_Dates")][
        # convert dates from character to class Date
        , Unique_Dates := lubridate::dmy(Unique_Dates)][
          # count occurrences by variable & date,
          # set key & order by variable & date for subsequent cumsum & join
          , .N, keyby = .(D, variable, Unique_Dates)][
            # compute cumsum for each variable along unique dates
            , N := cumsum(N), by = .(D, variable)][
              # join with all possible combinations of D, variables and dates
              # use rolling join to fill missing values
              CJ(D, variable, Unique_Dates, unique = TRUE), roll = Inf][
                # replace remaining NAs
                is.na(N), N := 0L][
                  # finally, reshape selected rows from long to wide
                  D == "H", dcast(.SD, Unique_Dates ~ paste0("Count_of_", variable))]

       Unique_Dates Count_of_A Count_of_B Count_of_C
    1:   2017-01-01          2          0          0
    2:   2017-01-02          2          0          0
    3:   2017-01-03          2          2          0
    4:   2017-01-04          4          2          2
    5:   2017-01-05          4          3          2
    6:   2017-01-06          4          4          2
    7:   2017-01-07          4          4          4

- The columns are named according to the OP's expected result.
- The result includes `2017-01-02` as expected, although this date appears only in a row with `D == "V"`, which was supposed to be excluded from the final result.
- A rolling join is used to fill missing values instead of `zoo::na.locf()`.
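
To illustrate what the rolling join contributes, here is a minimal sketch on made-up data. The tables `counts` and `all_dates` and the column names are hypothetical and only serve to show how `roll = Inf` carries the last known value forward to dates that have no observation of their own.

```r
library(data.table)

# Hypothetical cumulative counts, known for two dates only
counts <- data.table(
  date = as.Date(c("2017-01-01", "2017-01-04")),
  N    = c(2L, 4L)
)

# Every date a value is wanted for
all_dates <- data.table(
  date = seq(as.Date("2017-01-01"), as.Date("2017-01-05"), by = "day")
)

# Rolling join: for each date in all_dates, take the matching row of counts
# or, if there is none, the most recent earlier one
filled <- counts[all_dates, on = "date", roll = Inf]
filled
```

Dates that lie before the first observation have nothing to roll forward from and come back as `NA`, which is why the solution above still replaces the remaining `NA`s with 0 after the join.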
Data
In the question, the OP provided the sample data in a printed format which was difficult to "scrape":

    library(data.table)
    DT <- fread(
    "**A** | **B** | **C** |**D** |
    1/1/17 | 3/1/17 |4/1/17 | H |
    1/1/17 | 3/1/17 |4/1/17 | H |
    2/1/17 | 4/1/17 |5/1/17 | V |
    3/1/17 | 5/1/17 |6/1/17 | V |
    4/1/17 | 5/1/17 |7/1/17 | H |
    4/1/17 | 6/1/17 |7/1/17 | H |",
      sep = "|", drop = 5L, stringsAsFactors = TRUE)
    # remove the markdown asterisks from the column names
    # (done in a separate step so that names(DT) refers to the table just read)
    setnames(DT, stringr::str_replace_all(names(DT), "\\*", ""))
    DT

            A      B      C D
    1: 1/1/17 3/1/17 4/1/17 H
    2: 1/1/17 3/1/17 4/1/17 H
    3: 2/1/17 4/1/17 5/1/17 V
    4: 3/1/17 5/1/17 6/1/17 V
    5: 4/1/17 5/1/17 7/1/17 H
    6: 4/1/17 6/1/17 7/1/17 H

Source: https://stackoverflow.com/questions/45556705/using-countif-on-dates-in-r