Question
I’m trying to rank hospitals by lowest rate within each state. When multiple hospitals have the same rate, the tie should be broken by hospital name, in alphabetical order. So far I’ve managed to rank by rate within each state, with the rows sorted by hospital name, but I can’t figure out how to break the ties so that the ranks don’t skip numbers.
This is what I’ve got so far, using the following code:
outcome_data <- read.csv("outcome-of-care-measures.csv", na.strings = "Not Available", stringsAsFactors = FALSE) # Read csv file
myData <- outcome_data[, c(2, 7, 11)] # Retrieve only hospital name, state and heart attack rate
names(myData) <- c("Hospital.Name", "State", "rate") # Short column names used below (and in the output)
arr1 <- myData[complete.cases(myData[, 3]), ] # Remove NAs
arr2 <- arr1[order(arr1$State, arr1$rate, arr1$Hospital.Name), ] # Sort by state, then rate, then hospital name
arr3 <- transform(arr2, rank = ave(rate, State, FUN = function(x) rank(x, ties.method = "min"))) # Rank by rate within each state
The output I currently get is:
Hospital.Name State rate rank
SOUTH PENINSULA HOSPITAL AK 10.8 1
YUKON KUSKOKWIM DELTA REG HOSPITAL AK 11.2 2
MAT-SU REGIONAL MEDICAL CENTER AK 11.4 3
PEACEHEALTH KETCHIKAN MEDICAL CENTER AK 11.4 3
ALASKA NATIVE MEDICAL CENTER AK 11.6 5
BARTLETT REGIONAL HOSPITAL AK 11.6 5
CENTRAL PENINSULA GENERAL HOSPITAL AK 11.6 5
PROVIDENCE ALASKA MEDICAL CENTER AK 12.4 8
ALASKA REGIONAL HOSPITAL AK 13.4 9
FAIRBANKS MEMORIAL HOSPITAL AK 15.6 10
GEORGE H. LANIER MEMORIAL HOSPITAL AL 8.8 1
EVERGREEN MEDICAL CENTER AL 9.1 2
BAPTIST MEDICAL CENTER EAST AL 9.6 3
LAWRENCE MEDICAL CENTER AL 9.9 4
ANDALUSIA REGIONAL HOSPITAL AL 10.1 5
JACKSON HOSPITAL & CLINIC INC AL 10.2 6
BIRMINGHAM VA MEDICAL CENTER AL 10.4 7
FLORALA MEMORIAL HOSPITAL AL 10.4 7
GROVE HILL MEMORIAL HOSPITAL AL 10.4 7
SPRINGHILL MEDICAL CENTER AL 10.4 7
WEDOWEE HOSPITAL AL 10.4 7
PARKWAY MEDICAL CENTER AL 10.5 12
ST VINCENT'S BIRMINGHAM AL 10.6 13
WIREGRASS MEDICAL CENTER AL 10.6 13
GADSDEN REGIONAL MEDICAL CENTER AL 10.7 15
HALE COUNTY HOSPITAL AL 10.7 15
MOBILE INFIRMARY AL 10.7 15
But what I’m trying to get is:
Hospital.Name State rate rank
SOUTH PENINSULA HOSPITAL AK 10.8 1
YUKON KUSKOKWIM DELTA REG HOSPITAL AK 11.2 2
MAT-SU REGIONAL MEDICAL CENTER AK 11.4 3
PEACEHEALTH KETCHIKAN MEDICAL CENTER AK 11.4 4
ALASKA NATIVE MEDICAL CENTER AK 11.6 5
BARTLETT REGIONAL HOSPITAL AK 11.6 6
CENTRAL PENINSULA GENERAL HOSPITAL AK 11.6 7
PROVIDENCE ALASKA MEDICAL CENTER AK 12.4 8
ALASKA REGIONAL HOSPITAL AK 13.4 9
FAIRBANKS MEMORIAL HOSPITAL AK 15.6 10
GEORGE H. LANIER MEMORIAL HOSPITAL AL 8.8 1
EVERGREEN MEDICAL CENTER AL 9.1 2
BAPTIST MEDICAL CENTER EAST AL 9.6 3
LAWRENCE MEDICAL CENTER AL 9.9 4
ANDALUSIA REGIONAL HOSPITAL AL 10.1 5
JACKSON HOSPITAL & CLINIC INC AL 10.2 6
BIRMINGHAM VA MEDICAL CENTER AL 10.4 7
FLORALA MEMORIAL HOSPITAL AL 10.4 8
GROVE HILL MEMORIAL HOSPITAL AL 10.4 9
SPRINGHILL MEDICAL CENTER AL 10.4 10
WEDOWEE HOSPITAL AL 10.4 11
PARKWAY MEDICAL CENTER AL 10.5 12
ST VINCENT'S BIRMINGHAM AL 10.6 13
WIREGRASS MEDICAL CENTER AL 10.6 14
GADSDEN REGIONAL MEDICAL CENTER AL 10.7 15
HALE COUNTY HOSPITAL AL 10.7 16
MOBILE INFIRMARY AL 10.7 17
Any ideas?
Answer 1:
Using data.table, this is relatively straightforward:
library(data.table)
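# Read only the relevant columns (2, 7 and 11, as in the question) using data.table::fread
outcome_data <- fread("outcome-of-care-measures.csv",
                      na.strings = "Not Available",
                      select = c(2, 7, 11))
# Give the columns the short names used below
setnames(outcome_data, c("Hospital.Name", "State", "rate"))
# Drop rows with NA values (na.omit has a data.table method)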
outcome_data <- na.omit(outcome_data)
## Use data.table::setkey to sort/index by State, then rate, then hospital name
setkey(outcome_data,State,rate,Hospital.Name)
## Add a rank column by state; the order within each group follows the key order above
## (the .N operator is the number of rows in each State group)
outcome_data[,rank := seq_len(.N),by = .(State)]
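To try the ranking step without the CSV, here is a small sketch on a few rows typed in by hand from the question's table. The toy data is an assumption standing in for the file, and the sketch needs the data.table package installed. It also shows `rowid()`, a data.table helper that gives the same running count per group.

```r
library(data.table)

# Toy rows copied by hand from the question's table, deliberately unsorted;
# they stand in for the CSV file
outcome_data <- data.table(
  Hospital.Name = c("PEACEHEALTH KETCHIKAN MEDICAL CENTER", "WEDOWEE HOSPITAL",
                    "MAT-SU REGIONAL MEDICAL CENTER", "BARTLETT REGIONAL HOSPITAL",
                    "GEORGE H. LANIER MEMORIAL HOSPITAL", "SOUTH PENINSULA HOSPITAL",
                    "BIRMINGHAM VA MEDICAL CENTER", "ALASKA NATIVE MEDICAL CENTER"),
  State = c("AK", "AL", "AK", "AK", "AL", "AK", "AL", "AK"),
  rate  = c(11.4, 10.4, 11.4, 11.6, 8.8, 10.8, 10.4, 11.6)
)

# Sort by State, then rate, then Hospital.Name (the name breaks rate ties)
setkey(outcome_data, State, rate, Hospital.Name)

# Running count within each State: 1, 2, 3, ... with no gaps
outcome_data[, rank := seq_len(.N), by = .(State)]

# rowid() gives the same running count without an explicit by=
outcome_data[, rank2 := rowid(State)]

print(outcome_data)
```

Because the rows are already in key order, a plain row counter per state is all that is needed; tied rates simply get consecutive numbers in name order.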
Answer 2:
We need a sequence number by group after the ordering step:
library(dplyr)
arr2 %>%
group_by(State) %>%
mutate(rank = row_number())
Or, if we are starting from 'arr1':
arr1 %>%
arrange(State, rate, Hospital.Name) %>%
group_by(State) %>%
mutate(rank = row_number())
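A self-contained version of the second pipeline, run on a handful of rows typed in by hand from the question's table (toy data standing in for 'arr1'; assumes dplyr is installed). The final `ungroup()` is an addition so the result does not stay grouped by State.

```r
library(dplyr)

# Toy rows copied by hand from the question's table (stand-in for arr1)
arr1 <- data.frame(
  Hospital.Name = c("PEACEHEALTH KETCHIKAN MEDICAL CENTER", "WEDOWEE HOSPITAL",
                    "MAT-SU REGIONAL MEDICAL CENTER", "BARTLETT REGIONAL HOSPITAL",
                    "GEORGE H. LANIER MEMORIAL HOSPITAL", "SOUTH PENINSULA HOSPITAL",
                    "BIRMINGHAM VA MEDICAL CENTER", "ALASKA NATIVE MEDICAL CENTER"),
  State = c("AK", "AL", "AK", "AK", "AL", "AK", "AL", "AK"),
  rate  = c(11.4, 10.4, 11.4, 11.6, 8.8, 10.8, 10.4, 11.6),
  stringsAsFactors = FALSE
)

ranked <- arr1 %>%
  arrange(State, rate, Hospital.Name) %>%  # hospital name breaks rate ties
  group_by(State) %>%
  mutate(rank = row_number()) %>%          # 1, 2, 3, ... within each state
  ungroup()

print(ranked)
```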
Or using ave from base R:
with(arr2, ave(seq_along(State), State, FUN = seq_along))
#[1] 1 2 3 4 5 6 7 8 9 10 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
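The base R route can be checked end to end on toy rows typed in by hand from the question's table (an assumption standing in for the CSV). The sketch also compares `rank()` tie methods: with `ties.method = "first"`, tied rates are ranked by position, and since the rows are already alphabetical within each tie this gives the same numbers as the running count; `ties.method = "min"` (the question's code) reproduces the repeated ranks.

```r
# Toy rows copied by hand from the question's table (stand-in for arr1)
arr1 <- data.frame(
  Hospital.Name = c("PEACEHEALTH KETCHIKAN MEDICAL CENTER", "WEDOWEE HOSPITAL",
                    "MAT-SU REGIONAL MEDICAL CENTER", "BARTLETT REGIONAL HOSPITAL",
                    "GEORGE H. LANIER MEMORIAL HOSPITAL", "SOUTH PENINSULA HOSPITAL",
                    "BIRMINGHAM VA MEDICAL CENTER", "ALASKA NATIVE MEDICAL CENTER"),
  State = c("AK", "AL", "AK", "AK", "AL", "AK", "AL", "AK"),
  rate  = c(11.4, 10.4, 11.4, 11.6, 8.8, 10.8, 10.4, 11.6),
  stringsAsFactors = FALSE
)

# Sort by state, then rate, then hospital name (the name breaks ties)
arr2 <- arr1[order(arr1$State, arr1$rate, arr1$Hospital.Name), ]

# Running count within each state: 1, 2, 3, ... with no gaps
arr2$rank <- with(arr2, ave(seq_along(State), State, FUN = seq_along))

# ties.method = "first" ranks tied rates by row position, which is
# alphabetical here because of the sort above
arr2$rank_first <- ave(arr2$rate, arr2$State,
                       FUN = function(x) rank(x, ties.method = "first"))

# ties.method = "min" (the question's code) gives tied rows the same rank
arr2$rank_min <- ave(arr2$rate, arr2$State,
                     FUN = function(x) rank(x, ties.method = "min"))

print(arr2, row.names = FALSE)
```

Note that `ties.method = "first"` only breaks ties alphabetically because of the prior sort; the running count makes that dependence on row order explicit.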
Source: https://stackoverflow.com/questions/48018499/using-sort-and-rank-in-r-on-multiple-columns