Question
I’m analyzing river streamflow data in R and have two nested lists. The first, Flowtest, holds data from several river reaches identified by numbers such as 910, 950, 1012 and 1087. I have hundreds of daily streamflow measurements (Flow), but since I’m preparing yearly statistics, the exact day and month don’t matter. Each measurement (Flow) is associated with a year (Year) in the Flowtest tables.
library(tibble)  # tibble() is used to build Flowtest
Flowtest <- list("910" = tibble(Year = c(2004, 2004, 2005, 2005, 2007, 2008, 2008), Flow=c(123, 170, 187, 245, 679, 870, 820)),
"950" = tibble(Year = c(2004, 2005, 2005, 2005, 2006, 2008, 2008), Flow=c(570, 450, 780, 650, 230, 470, 340)),
"1012" = tibble(Year = c(2005, 2005, 2005, 2005, 2007, 2008, 2008), Flow=c(160, 170, 670, 780, 350, 840, 850)),
"1087" = tibble(Year = c(2004, 2005, 2005, 2007, 2007, 2008, 2008), Flow=c(120, 780, 820, 580, 870, 870, 840)))
The second nested list, RCHtest, serves as a lookup table. I calculated the 75th percentile (Q3, the 0.75 quantile) on a different streamflow dataset than Flowtest, so I don’t want to use a Q3 calculated from Flowtest itself. RCHtest therefore holds one Q3 threshold for each analyzed year (Year). The analyzed years and river reaches are the same in Flowtest and RCHtest.
RCHtest <- list("910" = data.frame(Year = c(2004:2008), Q3=c(650, 720, 550, 580, 800)),
"950" = data.frame(Year = c(2004:2008), Q3=c(550, 770, 520, 540, 790)),
"1012" = data.frame(Year = c(2004:2008), Q3=c(600, 780, 500, 570, 800)),
"1087" = data.frame(Year = c(2004:2008), Q3=c(670, 790, 510, 560, 780)))
What I would like to obtain is, for each river reach and each year, the number of Flow values in Flowtest that fall above that year’s Q3 threshold in RCHtest, as shown in Resulttest below.
Resulttest <- list("910" = data.frame(Year = c(2004:2008), aboveQ3=c(0, 0, 0, 1, 2)),
"950" = data.frame(Year = c(2004:2008), aboveQ3=c(1, 1, 0, 0, 0)),
"1012" = data.frame(Year = c(2004:2008), aboveQ3=c(0, 0, 0, 0, 2)),
"1087" = data.frame(Year = c(2004:2008), aboveQ3=c(0, 1, 0, 2, 2)))
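For a single reach, the lookup described above amounts to fetching each measurement’s threshold by its year with base R’s match() and then counting exceedances per year. A minimal sketch for reach 910, assuming the Flowtest/RCHtest structure shown above (rebuilt here as plain data frames so the snippet runs on its own); the helper name count_above is hypothetical:

```r
# Data for reach "910", copied from Flowtest and RCHtest above
flow_910 <- data.frame(Year = c(2004, 2004, 2005, 2005, 2007, 2008, 2008),
                       Flow = c(123, 170, 187, 245, 679, 870, 820))
rch_910  <- data.frame(Year = 2004:2008, Q3 = c(650, 720, 550, 580, 800))

# Hypothetical helper: count Flow values strictly above that year's Q3
count_above <- function(flow, rch) {
  # Look up the threshold for every measurement via its year
  thr <- rch$Q3[match(flow$Year, rch$Year)]
  above <- flow$Flow > thr
  # One count per lookup year; years without measurements give 0
  counts <- vapply(rch$Year, function(y) sum(above[flow$Year == y]), integer(1))
  data.frame(Year = rch$Year, aboveQ3 = counts)
}

count_above(flow_910, rch_910)
```

Applying the helper to every reach (for example with lapply over names(Flowtest)) would give one such table per reach.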
How should I approach this? Please help.
Answer 1:
You can use a combination of Map and aggregate:
Map(function(x, y) aggregate(Flow > Q3 ~ Year,
                             merge(x, y, all = TRUE),  # full merge keeps years with no Flow
                             sum, na.rm = TRUE, na.action = 'na.pass'),
    Flowtest, RCHtest)
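Map() pairs the elements of Flowtest and RCHtest by position, not by name. Both lists are in the same order here (910, 950, 1012, 1087), but if the lookup list were ordered differently, each reach would silently get another reach’s thresholds. A defensive sketch, assuming both lists use the same reach names, indexes the lookup list by names() before pairing (the small flows/rch lists are stand-ins for the real data):

```r
# Stand-ins for Flowtest and RCHtest; the lookup list is deliberately
# in a different order from the data list
flows <- list("910" = data.frame(Year = c(2007, 2008), Flow = c(679, 870)),
              "950" = data.frame(Year = c(2004, 2005), Flow = c(570, 780)))
rch   <- list("950" = data.frame(Year = 2004:2008, Q3 = c(550, 770, 520, 540, 790)),
              "910" = data.frame(Year = 2004:2008, Q3 = c(650, 720, 550, 580, 800)))

# Fail early if a reach is missing from either list
stopifnot(setequal(names(flows), names(rch)))

# Reorder the lookup list by name so Map() pairs matching reaches
res <- Map(function(x, y) aggregate(Flow > Q3 ~ Year, merge(x, y, all = TRUE),
                                    sum, na.rm = TRUE, na.action = na.pass),
           flows, rch[names(flows)])
res
```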
This returns:
#$`910`
# Year Flow > Q3
#1 2004 0
#2 2005 0
#3 2006 0
#4 2007 1
#5 2008 2
#$`950`
# Year Flow > Q3
#1 2004 1
#2 2005 1
#3 2006 0
#4 2007 0
#5 2008 0
#$`1012`
# Year Flow > Q3
#1 2004 0
#2 2005 0
#3 2006 0
#4 2007 0
#5 2008 2
#$`1087`
# Year Flow > Q3
#1 2004 0
#2 2005 1
#3 2006 0
#4 2007 2
#5 2008 2
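The na.action = 'na.pass' argument is what keeps years without any Flow measurement (such as 2006 for reach 910) in the result. After the full merge those rows have Flow = NA, so Flow > Q3 is NA, and aggregate()’s default na.action (na.omit) would drop them. A small check on reach 910, rebuilt here as plain data frames:

```r
flow_910 <- data.frame(Year = c(2004, 2004, 2005, 2005, 2007, 2008, 2008),
                       Flow = c(123, 170, 187, 245, 679, 870, 820))
rch_910  <- data.frame(Year = 2004:2008, Q3 = c(650, 720, 550, 580, 800))

# Full outer merge: 2006 has a Q3 but no Flow, so its Flow is NA
merged <- merge(flow_910, rch_910, all = TRUE)

# Default na.action (na.omit) drops the NA row, so 2006 disappears
with_omit <- aggregate(Flow > Q3 ~ Year, merged, sum)

# na.pass keeps the row; na.rm = TRUE then turns its count into 0
with_pass <- aggregate(Flow > Q3 ~ Year, merged, sum,
                       na.rm = TRUE, na.action = na.pass)

with_omit$Year  # 2004 2005 2007 2008
with_pass$Year  # 2004 2005 2006 2007 2008
```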
If you prefer tidyverse functions, you can do:
library(dplyr)
library(purrr)
map2(Flowtest, RCHtest, ~ full_join(.x, .y, by = "Year") %>%  # keep every lookup year
       group_by(Year) %>%
       summarise(aboveQ3 = sum(Flow > Q3, na.rm = TRUE)))
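Both answers count values strictly above the threshold (Flow > Q3). This matters at the boundary: for reach 1012 in 2005, one measurement equals Q3 exactly (780), so it is not counted. If values equal to the threshold should count too, swap > for >= in either answer. A base R sketch for that single reach and year, with values copied from Flowtest and RCHtest:

```r
# Reach "1012", year 2005, from Flowtest and RCHtest above
flow_2005 <- c(160, 170, 670, 780)
q3_2005   <- 780

above    <- sum(flow_2005 >  q3_2005)  # strictly above: 0
at_above <- sum(flow_2005 >= q3_2005)  # at or above:    1
c(above = above, at_or_above = at_above)
```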
Source: https://stackoverflow.com/questions/64583475/using-a-nested-lookup-table-to-find-values-above-thresholds-in-second-table-and