Question
I have a three-part question based on a data frame of goals scored by soccer players in a season (df below shows example rows):
Player            Season  Goals
Teddy Sheringham  1992/3  22
Les Ferdinand     1992/3  20
Dean Holdsworth   1992/3  19
Andy Cole         1993/4  34
Alan Shearer      1993/4  31
Chris Sutton      1993/4  25
If I want to obtain the top scorer for each season, I can use:
ddply(df, "Season", summarise, maxGoals = max(Goals),
Player=Player[which.max(Goals)])
Questions:
1) It does not apply in this case, but does this suffice if there are joint top scorers?
2) I am also interested in extracting the runner-up for each season. I have played around with sorting on Goals in descending order and taking index 2, but have not found a solution.
3) How would I obtain a count for each season based on the number of goals scored? For example, Goals > 20 should give 1 for 1992/3 and 3 for 1993/4 on the above data.
Answer 1:
If there are multiple best players, that expression will report only one of them: which.max returns the index of the first maximum, so you get the player who appears first in the data frame for that season.
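To make the tie behaviour concrete, here is a minimal sketch, assuming plyr is installed. The "Player X" row is hypothetical and is added only to create a tie in 1992/3; it is not real data. Keeping every row whose Goals equals the seasonal maximum is one way to retain joint top scorers.

```r
library(plyr)

# Example rows from the question, plus one hypothetical row ("Player X")
# added only to create a tie at the top of 1992/3.
df <- data.frame(
  Player = c("Teddy Sheringham", "Les Ferdinand", "Dean Holdsworth",
             "Andy Cole", "Alan Shearer", "Chris Sutton", "Player X"),
  Season = c("1992/3", "1992/3", "1992/3",
             "1993/4", "1993/4", "1993/4", "1992/3"),
  Goals  = c(22, 20, 19, 34, 31, 25, 22),
  stringsAsFactors = FALSE
)

# which.max() returns only the first maximum, so "Player X" is not reported.
first_only <- ddply(df, "Season", summarise,
                    maxGoals = max(Goals),
                    Player = Player[which.max(Goals)])

# Keeping every row whose Goals equals the seasonal maximum retains ties.
all_top <- ddply(df, "Season", function(x) x[x$Goals == max(x$Goals), ])
```

Here first_only has one row per season, while all_top has two rows for 1992/3.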
For q2:
d = ddply(df, "Season", summarise, SecondPlayer=Player[order(Goals)[length(Goals)-1]])
For q3:
d = ddply(df, "Season", summarise, Count=sum(Goals > 20))
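As a quick check, the two calls above can be run on the six example rows from the question. This is only a sketch, assuming plyr is installed; the result names runner_up and over_20 are mine.

```r
library(plyr)

# The six example rows from the question.
df <- data.frame(
  Player = c("Teddy Sheringham", "Les Ferdinand", "Dean Holdsworth",
             "Andy Cole", "Alan Shearer", "Chris Sutton"),
  Season = c("1992/3", "1992/3", "1992/3", "1993/4", "1993/4", "1993/4"),
  Goals  = c(22, 20, 19, 34, 31, 25),
  stringsAsFactors = FALSE
)

# q2: order() sorts ascending, so the second-to-last index is the runner-up.
runner_up <- ddply(df, "Season", summarise,
                   SecondPlayer = Player[order(Goals)[length(Goals) - 1]])

# q3: summing a logical vector counts the TRUE values.
over_20 <- ddply(df, "Season", summarise, Count = sum(Goals > 20))
```

On this data the runner-up is Les Ferdinand for 1992/3 and Alan Shearer for 1993/4, and the counts are 1 and 3, matching what the question expects. Note that the q2 expression needs at least two rows per season.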
Answer 2:
1+2) No, it is not sufficient. You might have better luck looking at the unique values under Goals and grabbing the rows corresponding to the appropriate value in that case. Maybe something like:
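library(plyr)  # provides ddply() and .()
myFun <- function(x, k){
  # distinct goal totals, highest first, so k = 1 is the top score
  val <- sort(unique(x$Goals), decreasing = TRUE)
  # the column is named Player (x$Players would return NULL)
  Players <- x$Player[x$Goals == val[k]]
  data.frame(Players = Players, Goals = rep(val[k], length(Players)))
}
ddply(df, .(Season), myFun, k = 1)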
where the parameter k specifies whether you want the players with the most, second most, etc. number of goals. (This is untested, obviously, so some minor modifications may be necessary.)
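To illustrate why ranking by unique goal totals matters for the runner-up, here is a small sketch, assuming plyr is installed and that every season has at least k distinct goal totals. The helper name kth_scorers and the "Player X" tie row are hypothetical and used for illustration only. It contrasts ranking by distinct value with picking the second position after sorting.

```r
library(plyr)

# Example rows from the question, plus a hypothetical "Player X" row
# that ties for the top score in 1992/3.
df <- data.frame(
  Player = c("Teddy Sheringham", "Les Ferdinand", "Dean Holdsworth",
             "Andy Cole", "Alan Shearer", "Chris Sutton", "Player X"),
  Season = c("1992/3", "1992/3", "1992/3",
             "1993/4", "1993/4", "1993/4", "1992/3"),
  Goals  = c(22, 20, 19, 34, 31, 25, 22),
  stringsAsFactors = FALSE
)

# Hypothetical helper: all players holding the k-th highest distinct total.
kth_scorers <- function(x, k) {
  val <- sort(unique(x$Goals), decreasing = TRUE)
  x[x$Goals == val[k], c("Player", "Goals")]
}

# Ranking by distinct value: k = 2 is the true second-highest total.
by_value <- ddply(df, "Season", kth_scorers, k = 2)

# Ranking by position: with a tie at the top, the "second" row is
# just another joint top scorer.
by_position <- ddply(df, "Season", summarise,
                     SecondPlayer = Player[order(Goals)[length(Goals) - 1]])
```

For 1992/3, by_value reports Les Ferdinand (20 goals), whereas by_position reports Teddy Sheringham, one of the two players tied on 22.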
Source: https://stackoverflow.com/questions/9874471/selecting-specific-rows-etc-using-ddply