Ranking rows in R

Submitted by 房东的猫 on 2019-12-25 03:35:31

Question


I have a dataset with multiple columns (one per site) and rows (one per day), and I am trying to rank the values within each day using R. That is, I would like to rank each site against all the other sites on the same day (so ranking within each row). It would be possible to do in Excel, but would obviously take a long time. Below is a [much smaller] example of what I'm trying to achieve:

date - site1 - site2 - site3 - site4
1/1/00 - 24 - 33 - 10 - 13
2/1/00 - 13 - 25 - 6 - 2
~~ leading to:
date - site1 - site2 - site3 - site4
1/1/00 - 2 - 1 - 4 - 3
2/1/00 - 2 - 1 - 3 - 4

Hopefully there's some simple command. Thanks a lot!


Answer 1:


You can use rank to give the ranks of the data.

# your data (strip.white=TRUE trims the spaces around each "-"-separated field)
mydf <- read.table(text="date - site1 - site2 - site3 - site4
1/1/00 - 24 - 33 - 10 - 13
2/1/00 - 13 - 25 - 6 - 2", sep="-", header=TRUE, strip.white=TRUE)

# find ranks
t(apply(-mydf[-1], 1, rank))

# add to your dates
mydf.rank <- cbind(mydf[1], t(apply(-mydf[-1], 1, rank)))

About the code

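mydf[-1] # drops the first column (the dates)

-mydf[-1] # negating the values makes rank give 1 to the largest value, so the ranks go in decreasing order

apply with MARGIN=1 applies rank to each row, so the sites are ranked within each day.

apply returns each row's ranks as a column, so t transposes the result back to one row per date, which is the layout you want.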
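To make the shape change and the behaviour on ties concrete, here is a small sketch. It rebuilds the example data with data.frame instead of parsing text, and the tied vector is a made-up case (not from the question) used only to show the ties.method argument of rank.

```r
# Same example data as above, built directly rather than parsed from text
mydf <- data.frame(
  date  = c("1/1/00", "2/1/00"),
  site1 = c(24, 13),
  site2 = c(33, 25),
  site3 = c(10, 6),
  site4 = c(13, 2)
)

# apply() over rows returns one COLUMN per input row: 4 sites x 2 dates
by_row <- apply(-mydf[-1], 1, rank)
dim(by_row)

# t() flips it back to one row per date: 2 dates x 4 sites
ranks <- t(by_row)
dim(ranks)

# Hypothetical tie: suppose site3 and site4 both recorded 13 on one day
tied <- c(site1 = 24, site2 = 33, site3 = 13, site4 = 13)
rank(-tied)                       # default "average": the tied sites share 3.5
rank(-tied, ties.method = "min")  # "min": the tied sites both get 3
```

If fractional ranks are unwanted, passing the same argument through apply, as in apply(-mydf[-1], 1, rank, ties.method = "min"), should work, since apply forwards extra arguments to the function.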




Answer 2:


This is a tidy way.

Reshape to long format, sort (arrange), group, and spread back to wide. The only tricky part is realising that once the rows are sorted within each group, a row's position in its group is its rank (ascending or descending, depending on the sort order). The function row_number simply returns that position.

library(tidyverse)
library(lubridate)

# Data   
df <- tribble(
  ~date,    ~site1,   ~site2,    ~site3,    ~site4,
  mdy("1/1/2000"),   24,       33,        10,          13,
  mdy("2/1/2000"),   13,       25,         6,           2
) 

df %>% 
  gather(site, days, -date) %>%       #< Make Tidy
  arrange(date, desc(days)) %>%       #< Sort relevant columns
  group_by(date) %>% 
  mutate(ranking = row_number()) %>%  #< Ranking function
  select(-days) %>%                   #< Remove unneeded column. Worth keeping in tidy format!
  spread(site, ranking)

#> # A tibble: 2 x 5
#> # Groups:   date [2]
#>   date       site1 site2 site3 site4
#>   <date>     <int> <int> <int> <int>
#> 1 2000-01-01     2     1     4     3
#> 2 2000-02-01     2     1     3     4

Created on 2018-03-06 by the reprex package (v0.2.0).
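For what it's worth, gather and spread have since been superseded in tidyr (1.0.0 and later) by pivot_longer and pivot_wider. Below is a rough sketch of the same idea with those functions, assuming a recent dplyr and tidyr are installed. It also swaps row_number for min_rank(desc(...)), which is a different choice from the answer above: it ranks without needing a prior sort and gives tied values the same rank. The column names value and ranking are just illustrative.

```r
library(dplyr)
library(tidyr)

# Same example data; dates written in ISO format to avoid needing lubridate
df <- data.frame(
  date  = as.Date(c("2000-01-01", "2000-02-01")),
  site1 = c(24, 13),
  site2 = c(33, 25),
  site3 = c(10, 6),
  site4 = c(13, 2)
)

ranked <- df %>%
  pivot_longer(-date, names_to = "site", values_to = "value") %>%  # wide -> long
  group_by(date) %>%
  mutate(ranking = min_rank(desc(value))) %>%  # 1 = largest value within the date
  ungroup() %>%
  select(-value) %>%
  pivot_wider(names_from = site, values_from = ranking)            # long -> wide

ranked
```

On this data there are no ties, so the ranks should match the output shown above.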


Source: https://stackoverflow.com/questions/23530731/ranking-rows-in-r
