Renaming a column entry when it is the maximum value by group

妖精的绣舞 submitted on 2021-01-07 01:38:14

Question


I have a dataset as follows:

library(data.table)
DT <- structure(list(State_Ab = c("MD", "MD", "MD", "MD", "MD", "MD", 
"MD", "MD", "MD", "MD", "MD", "MD", "MD", "MD"), County = c("Baltimore", 
"Baltimore", "Baltimore", "Baltimore", "Baltimore", "Baltimore", 
"Baltimore", "Baltimore", "Baltimore", "Baltimore", "Baltimore", 
"Baltimore", "Baltimore", "Baltimore"), year = c(1994, 1994, 
1998, 1998, 2000, 2000, 2004, 2004, 2006, 2006, 2010, 2010, 2016, 
2016), Population = c(140942, 219673, 235413, 146385, 292019, 
170419, 336917, 187402, 145623, 268229, 158692, 281834, 381365, 
231836)), row.names = c(NA, -14L), class = c("data.table", "data.frame"
))

    State_Ab    County year Population
 1:       MD Baltimore 1994     140942
 2:       MD Baltimore 1994     219673
 3:       MD Baltimore 1998     235413
 4:       MD Baltimore 1998     146385
 5:       MD Baltimore 2000     292019
 6:       MD Baltimore 2000     170419
 7:       MD Baltimore 2004     336917
 8:       MD Baltimore 2004     187402
 9:       MD Baltimore 2006     145623
10:       MD Baltimore 2006     268229
11:       MD Baltimore 2010     158692
12:       MD Baltimore 2010     281834
13:       MD Baltimore 2016     381365
14:       MD Baltimore 2016     231836

Some of these values are for Baltimore City and some for Baltimore County. Based on the information I have, the maximum value in each year should be Baltimore City and the minimum Baltimore County. I tried the following, but it does not work as intended.

DT <- setDT(DT)[County=="Baltimore" & Population== max(Population, na.rm=TRUE),County:="Baltimore City", by = c("year","State_Ab","County")]
DT <- setDT(DT)[County=="Baltimore" & Population== min(Population, na.rm=TRUE),County:="Baltimore County", by = c("year","State_Ab","County")]

The result, however, is not what I expected.

    State_Ab           County year Population
 1:       MD Baltimore County 1994     140942
 2:       MD        Baltimore 1994     219673
 3:       MD        Baltimore 1998     235413
 4:       MD        Baltimore 1998     146385
 5:       MD        Baltimore 2000     292019
 6:       MD        Baltimore 2000     170419
 7:       MD        Baltimore 2004     336917
 8:       MD        Baltimore 2004     187402
 9:       MD        Baltimore 2006     145623
10:       MD        Baltimore 2006     268229
11:       MD        Baltimore 2010     158692
12:       MD        Baltimore 2010     281834
13:       MD   Baltimore City 2016     381365
14:       MD        Baltimore 2016     231836

What am I missing here?

Desired result

    State_Ab           County year Population
 1:       MD Baltimore County 1994     140942
 2:       MD   Baltimore City 1994     219673
 3:       MD   Baltimore City 1998     235413
 4:       MD Baltimore County 1998     146385
 5:       MD   Baltimore City 2000     292019
 6:       MD Baltimore County 2000     170419
 7:       MD   Baltimore City 2004     336917
 8:       MD Baltimore County 2004     187402
 9:       MD Baltimore County 2006     145623
10:       MD   Baltimore City 2006     268229
11:       MD Baltimore County 2010     158692
12:       MD   Baltimore City 2010     281834
13:       MD   Baltimore City 2016     381365
14:       MD Baltimore County 2016     231836

Answer 1:


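You can order the rows by Population and assign c("Baltimore County", "Baltimore City") within each State_Ab/year group, so the smaller value becomes "Baltimore County" and the larger "Baltimore City". This relies on every group having exactly two rows.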

library(data.table)

DT[order(Population), County := c("Baltimore County", "Baltimore City"), by = .(State_Ab, year)]
DT

#    State_Ab           County year Population
# 1:       MD Baltimore County 1994     140942
# 2:       MD   Baltimore City 1994     219673
# 3:       MD   Baltimore City 1998     235413
# 4:       MD Baltimore County 1998     146385
# 5:       MD   Baltimore City 2000     292019
# 6:       MD Baltimore County 2000     170419
# 7:       MD   Baltimore City 2004     336917
# 8:       MD Baltimore County 2004     187402
# 9:       MD Baltimore County 2006     145623
#10:       MD   Baltimore City 2006     268229
#11:       MD Baltimore County 2010     158692
#12:       MD   Baltimore City 2010     281834
#13:       MD   Baltimore City 2016     381365
#14:       MD Baltimore County 2016     231836
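
As for why the attempt in the question fails: in data.table the `i` expression is evaluated once on the whole table, before `by` splits it into groups. The `max(Population)` and `min(Population)` in `i` are therefore the overall extremes, so only rows 13 and 1 match, which is exactly the unexpected output shown. A minimal sketch that moves the comparison into `j`, where it is evaluated per group, could look like this. The sample data is rebuilt in shortened form so the block runs on its own, and `fifelse` assumes data.table 1.12.4 or later.

```r
library(data.table)

# Sample data from the question, rebuilt so the block is self-contained
DT <- data.table(
  State_Ab   = "MD",
  County     = "Baltimore",
  year       = rep(c(1994, 1998, 2000, 2004, 2006, 2010, 2016), each = 2),
  Population = c(140942, 219673, 235413, 146385, 292019, 170419, 336917,
                 187402, 145623, 268229, 158692, 281834, 381365, 231836)
)

# `i` sees the whole table, so max() here is the global maximum:
# only a single row matches this filter
n_global_max <- DT[Population == max(Population), .N]

# In `j`, max() is evaluated separately within each by-group
DT[County == "Baltimore",
   County := fifelse(Population == max(Population),
                     "Baltimore City", "Baltimore County"),
   by = .(State_Ab, year)]
```

This version does not depend on each group having exactly two rows: every row that is not its group's maximum is labeled "Baltimore County", and tied maxima would all be labeled "Baltimore City".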



Answer 2:


For anyone interested in a base R solution:

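# Assumption: dat is a plain data.frame copy of DT, sorted by year with two rows per year
dat <- as.data.frame(DT)
# With two rows per year, order() coincides with rank: 1 = smaller Population, 2 = larger
dat$County <- c("Baltimore County", "Baltimore City")[
  unlist(lapply(unique(dat$year), function(x)
    order(dat[which(dat$year == x), "Population"])))]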

# dat
#    State_Ab           County year Population
# 1:       MD Baltimore County 1994     140942
# 2:       MD   Baltimore City 1994     219673
# 3:       MD   Baltimore City 1998     235413
# 4:       MD Baltimore County 1998     146385
# 5:       MD   Baltimore City 2000     292019
# 6:       MD Baltimore County 2000     170419
# 7:       MD   Baltimore City 2004     336917
# 8:       MD Baltimore County 2004     187402
# 9:       MD Baltimore County 2006     145623
#10:       MD   Baltimore City 2006     268229
#11:       MD Baltimore County 2010     158692
#12:       MD   Baltimore City 2010     281834
#13:       MD   Baltimore City 2016     381365
#14:       MD Baltimore County 2016     231836
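
The base R snippet above depends on the rows being sorted by year with exactly two rows per year. As a hedged alternative, the following sketch uses `ave()` to compute each group's maximum in place, so row order does not matter. The data frame `dat` and the helper vector `is_max` are illustrative names, and the data is rebuilt from the question so the block is self-contained.

```r
# Sample data from the question as a plain data.frame
dat <- data.frame(
  State_Ab   = "MD",
  County     = "Baltimore",
  year       = rep(c(1994, 1998, 2000, 2004, 2006, 2010, 2016), each = 2),
  Population = c(140942, 219673, 235413, 146385, 292019, 170419, 336917,
                 187402, 145623, 268229, 158692, 281834, 381365, 231836)
)

# ave() returns, for every row, the maximum Population of its State_Ab/year group;
# comparing it with the row's own value flags the group maximum
is_max <- ave(dat$Population, dat$State_Ab, dat$year, FUN = max) == dat$Population

# Label the group maximum as the city and everything else as the county
dat$County <- ifelse(is_max, "Baltimore City", "Baltimore County")
```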


Source: https://stackoverflow.com/questions/65338125/renaming-a-column-entry-when-it-is-the-maximum-value-by-group
