Question
I have a dataset as follows:
library(data.table)
DT <- structure(list(State_Ab = c("MD", "MD", "MD", "MD", "MD", "MD",
"MD", "MD", "MD", "MD", "MD", "MD", "MD", "MD"), County = c("Baltimore",
"Baltimore", "Baltimore", "Baltimore", "Baltimore", "Baltimore",
"Baltimore", "Baltimore", "Baltimore", "Baltimore", "Baltimore",
"Baltimore", "Baltimore", "Baltimore"), year = c(1994, 1994,
1998, 1998, 2000, 2000, 2004, 2004, 2006, 2006, 2010, 2010, 2016,
2016), Population = c(140942, 219673, 235413, 146385, 292019,
170419, 336917, 187402, 145623, 268229, 158692, 281834, 381365,
231836)), row.names = c(NA, -14L), class = c("data.table", "data.frame"
))
State_Ab County year Population
1: MD Baltimore 1994 140942
2: MD Baltimore 1994 219673
3: MD Baltimore 1998 235413
4: MD Baltimore 1998 146385
5: MD Baltimore 2000 292019
6: MD Baltimore 2000 170419
7: MD Baltimore 2004 336917
8: MD Baltimore 2004 187402
9: MD Baltimore 2006 145623
10: MD Baltimore 2006 268229
11: MD Baltimore 2010 158692
12: MD Baltimore 2010 281834
13: MD Baltimore 2016 381365
14: MD Baltimore 2016 231836
Some of these values are for Baltimore City, some for Baltimore County. Based on the information I have, within each year the max value should be Baltimore City and the min value Baltimore County. I thought I would do the following, but it fails somehow.
DT <- setDT(DT)[County=="Baltimore" & Population== max(Population, na.rm=TRUE),County:="Baltimore City", by = c("year","State_Ab","County")]
DT <- setDT(DT)[County=="Baltimore" & Population== min(Population, na.rm=TRUE),County:="Baltimore County", by = c("year","State_Ab","County")]
The result is however not really what I was expecting.
State_Ab County year Population
1: MD Baltimore County 1994 140942
2: MD Baltimore 1994 219673
3: MD Baltimore 1998 235413
4: MD Baltimore 1998 146385
5: MD Baltimore 2000 292019
6: MD Baltimore 2000 170419
7: MD Baltimore 2004 336917
8: MD Baltimore 2004 187402
9: MD Baltimore 2006 145623
10: MD Baltimore 2006 268229
11: MD Baltimore 2010 158692
12: MD Baltimore 2010 281834
13: MD Baltimore City 2016 381365
14: MD Baltimore 2016 231836
What am I missing here?
Desired result
State_Ab County year Population
1: MD Baltimore County 1994 140942
2: MD Baltimore City 1994 219673
3: MD Baltimore City 1998 235413
4: MD Baltimore County 1998 146385
5: MD Baltimore City 2000 292019
6: MD Baltimore County 2000 170419
7: MD Baltimore City 2004 336917
8: MD Baltimore County 2004 187402
9: MD Baltimore County 2006 145623
10: MD Baltimore City 2006 268229
11: MD Baltimore County 2010 158692
12: MD Baltimore City 2010 281834
13: MD Baltimore City 2016 381365
14: MD Baltimore County 2016 231836
Answer 1:
You can order the data based on Population and assign c("Baltimore County", "Baltimore City") in each group.
library(data.table)
DT[order(Population), County := c("Baltimore County", "Baltimore City"), .(State_Ab, year)]
DT
# State_Ab County year Population
# 1: MD Baltimore County 1994 140942
# 2: MD Baltimore City 1994 219673
# 3: MD Baltimore City 1998 235413
# 4: MD Baltimore County 1998 146385
# 5: MD Baltimore City 2000 292019
# 6: MD Baltimore County 2000 170419
# 7: MD Baltimore City 2004 336917
# 8: MD Baltimore County 2004 187402
# 9: MD Baltimore County 2006 145623
#10: MD Baltimore City 2006 268229
#11: MD Baltimore County 2010 158692
#12: MD Baltimore City 2010 281834
#13: MD Baltimore City 2016 381365
#14: MD Baltimore County 2016 231836
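As for why the original attempt only changed two rows: in `DT[i, j, by]`, the `i` expression is evaluated against the whole table before any grouping happens, so `max(Population)` and `min(Population)` there are the overall extremes rather than the per-year ones. Below is a minimal sketch of that behaviour, plus one way to keep the max-based logic by moving the comparison into `j`. The four-row table is a cut-down copy of the question's data, and `global_hits` is a name made up for illustration.

```r
library(data.table)

# Small reproduction of the question's data (first two years only)
DT <- data.table(
  State_Ab   = "MD",
  County     = "Baltimore",
  year       = c(1994, 1994, 1998, 1998),
  Population = c(140942, 219673, 235413, 146385)
)

# `i` is evaluated on the whole table before `by` is applied, so this
# compares every row against the overall maximum, not the per-year one
global_hits <- DT[Population == max(Population), which = TRUE]

# Moving the comparison into `j` makes max() run once per group
DT[, County := ifelse(Population == max(Population),
                      "Baltimore City", "Baltimore County"),
   by = .(State_Ab, year)]
```

Unlike the ordering approach, this version does not require exactly two rows per group, but if two rows in a year tie for the maximum, both would be labelled "Baltimore City".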
Answer 2:
For anyone interested in a base R solution:
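dat <- as.data.frame(DT)  # assumption: dat is a plain data.frame copy of DT
# Assumes rows are sorted by year, with exactly two rows per year
dat$County <- c("Baltimore County", "Baltimore City")[
  unlist(lapply(unique(dat$year), function(x)
    order(dat[which(dat$year == x), "Population"])))]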
# dat
# State_Ab County year Population
# 1: MD Baltimore County 1994 140942
# 2: MD Baltimore City 1994 219673
# 3: MD Baltimore City 1998 235413
# 4: MD Baltimore County 1998 146385
# 5: MD Baltimore City 2000 292019
# 6: MD Baltimore County 2000 170419
# 7: MD Baltimore City 2004 336917
# 8: MD Baltimore County 2004 187402
# 9: MD Baltimore County 2006 145623
#10: MD Baltimore City 2006 268229
#11: MD Baltimore County 2010 158692
#12: MD Baltimore City 2010 281834
#13: MD Baltimore City 2016 381365
#14: MD Baltimore County 2016 231836
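The base R code above depends on the rows already being sorted by year. As a possible alternative that does not rely on row order, here is a sketch using `ave()` with `rank()`. The small `dat` below is a hypothetical, deliberately shuffled subset of the question's data, and `pop_rank` is an illustrative name.

```r
# Plain data.frame with the years deliberately interleaved
dat <- data.frame(
  State_Ab   = "MD",
  County     = "Baltimore",
  year       = c(1994, 1998, 1994, 1998),
  Population = c(140942, 235413, 219673, 146385)
)

# ave() applies rank() within each State_Ab/year group and returns the
# results in the original row order: 1 = smaller, 2 = larger population
pop_rank <- ave(dat$Population, dat$State_Ab, dat$year, FUN = rank)

dat$County <- c("Baltimore County", "Baltimore City")[pop_rank]
```

This still assumes two distinct Population values per group. With tied values `rank()` returns 1.5, which truncates to 1 when used as an index, so both rows would be labelled "Baltimore County".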
Source: https://stackoverflow.com/questions/65338125/renaming-a-column-entry-when-it-is-the-maximum-value-by-group