I want to make a simple table that showcases the largest 10 values for a given variable in my dataset, as well as 4 other variables for each observation, so basically a small su
This should do it...
# Sort rows by Score, largest first
data <- data[with(data, order(-Score)), ]
# Keep the first 10 rows
data <- data[1:10, ]
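Since the question also asks for four other variables alongside the ranked one, a minimal base R sketch on the built-in mtcars data might look like the following. Using mpg as the ranking variable and cyl, hp, wt and gear as the companion columns is only an assumption for illustration.

```r
# Rank by mpg (assumed ranking variable) and keep four other columns.
# The column names below are illustrative; substitute your own.
keep <- c("mpg", "cyl", "hp", "wt", "gear")

# order() with decreasing = TRUE sorts largest first; head() keeps 10 rows.
top10 <- head(mtcars[order(mtcars$mpg, decreasing = TRUE), keep], 10)
print(top10)
```

Selecting the columns in the same subsetting call avoids overwriting the original data frame.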
Using sqldf:
library(sqldf)
sqldf("SELECT * FROM mtcars
ORDER BY mpg DESC
LIMIT 10", row.names = TRUE)
Output:
mpg cyl disp hp drat wt qsec vs am gear carb
Toyota Corolla 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1
Fiat 128 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1
Honda Civic 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2
Lotus Europa 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5 2
Fiat X1-9 27.3 4 79.0 66 4.08 1.935 18.90 1 1 4 1
Porsche 914-2 26.0 4 120.3 91 4.43 2.140 16.70 0 1 5 2
Merc 240D 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2
Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
Merc 230 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2
Toyota Corona 21.5 4 120.1 97 3.70 2.465 20.01 1 0 3 1
You can get the highest values of a vector using the code below:
my_vec <- c(1:100)
tail(sort(my_vec),10)
So if you want to use this method as a data frame filter you could do:
data(mtcars)
mtcars[mtcars$mpg %in% tail(sort(mtcars$mpg),4),]
which would produce:
> mtcars[mtcars$mpg %in% tail(sort(mtcars$mpg),4),]
mpg cyl disp hp drat wt qsec vs am gear carb
Fiat 128 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1
Honda Civic 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2
Toyota Corolla 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1
Lotus Europa 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5 2
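One thing to watch with the %in% filter is ties: it keeps every row whose value matches one of the top values, so it can return more rows than requested, and the rows stay in their original order rather than being ranked. A small sketch on mtcars, where two cars share an mpg of 30.4, illustrates this. The order()-based alternative is just one possible way to get exactly n rows.

```r
n <- 3

# %in% keeps every row matching one of the top-n values, including ties.
by_value <- mtcars[mtcars$mpg %in% tail(sort(mtcars$mpg), n), ]
nrow(by_value)   # 4, because Honda Civic and Lotus Europa tie at 30.4

# Ordering the rows first and taking the head returns exactly n rows, ranked.
by_rank <- head(mtcars[order(-mtcars$mpg), ], n)
nrow(by_rank)    # 3
```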
You can do this using arrange from dplyr. This should also work if there are grouping variables: just add group_by before the arrange. We keep the first 10 observations using slice.
library(dplyr)
df1 %>%
  arrange(desc(Score)) %>%
  slice(1:10)
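The grouped variant mentioned above could be sketched as follows. Here mtcars stands in for df1, mpg for Score, and cyl is an assumed grouping variable; the choice of 2 rows per group is only to keep the output short.

```r
library(dplyr)

# Top 2 cars by mpg within each cylinder group (mtcars stands in for df1).
top_by_cyl <- mtcars %>%
  group_by(cyl) %>%
  arrange(desc(mpg), .by_group = TRUE) %>%  # sort within each group
  slice(1:2) %>%                            # first 2 rows of every group
  ungroup()

print(top_by_cyl)
```

Note that dplyr drops the row names here, so the car names would need to be moved into a column first if they matter.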
Or another option is top_n (commented by @docendodiscimus) from dplyr, which is a wrapper that uses filter and min_rank to select the top n (here, 10) entries for 'Score'.
top_n(df1, 10, Score)
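In dplyr 1.0.0 and later, top_n is marked as superseded in favour of slice_max. A rough equivalent, using mtcars and mpg as stand-ins for df1 and Score, could look like this. Setting with_ties = FALSE is an optional choice to cap the result at exactly n rows.

```r
library(dplyr)

# Keeps ties by default, so it may return more than n rows.
top3_ties <- slice_max(mtcars, order_by = mpg, n = 3)

# with_ties = FALSE truncates to exactly n rows.
top3_exact <- slice_max(mtcars, order_by = mpg, n = 3, with_ties = FALSE)

nrow(top3_ties)   # 4: two cars tie at 30.4 mpg
nrow(top3_exact)  # 3
```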
Or we use filter by creating a logical condition with row_number, which is equivalent to rank(ties.method='first') (contributed by @Steven Beaupre).
filter(df1, row_number(desc(Score)) <= 10)
Or a data.table option (by @David Arenburg). We convert the 'data.frame' to 'data.table' with setDT(df1), order the 'Score' variable in decreasing order, and select the first 10 observations. .SD stands for 'Subset of Data'.
library(data.table)
setDT(df1)[order(-Score), .SD[1:10]]
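For a self-contained variant of the data.table approach, the sketch below uses mtcars in place of df1 and mpg in place of Score. The "model" column name and the per-group example with cyl are assumptions added for illustration.

```r
library(data.table)

# as.data.table() makes a copy and keeps the car names in a "model" column.
dt <- as.data.table(mtcars, keep.rownames = "model")

# Overall top 10 by mpg.
top10 <- head(dt[order(-mpg)], 10)

# Top 2 per cylinder group: .SD is the subset of rows for each group.
top2_by_cyl <- dt[order(-mpg), head(.SD, 2), by = cyl]

print(top10)
print(top2_by_cyl)
```

Unlike setDT, which converts the data frame by reference, as.data.table leaves the original object untouched.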