Question
I have a three-part question based on a data frame of goals scored by soccer players in a season (df below shows example rows):
Player Season Goals
Teddy Sheringham 1992/3 22
Les Ferdinand 1992/3 20
Dean Holdsworth 1992/3 19
Andy Cole 1993/4 34
Alan Shearer 1993/4 31
Chris Sutton 1993/4 25
If I want to obtain the top scorer for each season, I can use:
ddply(df, "Season", summarise, maxGoals = max(Goals),
Player=Player[which.max(Goals)])
Questions:
1) It does not apply in this case, but does this suffice if there are joint top scorers?
2) I would also like to extract the runner-up for each season. I have played around with sorting on Goals in descending order and taking index 2, but have not found a solution.
3) How would I obtain a count for each season based on the number of goals scored? For example, Goals > 20 should give 1 for 1992/3 and 3 for 1993/4 on the data above.
Answer 1:
If there are multiple best players, that expression will report only one of them (specifically, the first in the dataframe in that year).
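To make the tie behaviour concrete, here is a minimal sketch. It assumes the plyr package is installed and adds one made-up row ("Made-up Player", not part of the original data) so that 1992/3 has joint top scorers. The names first_only and all_top are only illustrative.

```r
library(plyr)  # assumes the plyr package is installed

# Example rows from the question, plus one made-up row that creates a tie in 1992/3
df <- data.frame(
  Player = c("Teddy Sheringham", "Les Ferdinand", "Dean Holdsworth",
             "Andy Cole", "Alan Shearer", "Chris Sutton", "Made-up Player"),
  Season = c("1992/3", "1992/3", "1992/3", "1993/4", "1993/4", "1993/4", "1992/3"),
  Goals  = c(22, 20, 19, 34, 31, 25, 22),
  stringsAsFactors = FALSE
)

# which.max() returns only the first position of the maximum,
# so this yields exactly one row per season even when there is a tie
first_only <- ddply(df, "Season", summarise,
                    maxGoals = max(Goals),
                    Player = Player[which.max(Goals)])

# Keeping every row whose Goals equals the season maximum retains joint top scorers
all_top <- ddply(df, "Season", function(x) x[x$Goals == max(x$Goals), ])
```

Here first_only has one row per season, while all_top has two rows for 1992/3 because both players on 22 goals are kept.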
For q2:
d = ddply(df, "Season", summarise, SecondPlayer=Player[order(Goals)[length(Goals)-1]])
For q3:
d = ddply(df, "Season", summarise, Count=sum(Goals > 20))
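As a quick check, the two expressions above can be run on the example rows from the question. This is only a sketch: it assumes the plyr package is installed, and the result names runner_up and over_20 are illustrative.

```r
library(plyr)  # assumes the plyr package is installed

# The six example rows from the question
df <- data.frame(
  Player = c("Teddy Sheringham", "Les Ferdinand", "Dean Holdsworth",
             "Andy Cole", "Alan Shearer", "Chris Sutton"),
  Season = c("1992/3", "1992/3", "1992/3", "1993/4", "1993/4", "1993/4"),
  Goals  = c(22, 20, 19, 34, 31, 25),
  stringsAsFactors = FALSE
)

# order(Goals) sorts ascending, so position length(Goals) - 1 is the second highest
runner_up <- ddply(df, "Season", summarise,
                   SecondPlayer = Player[order(Goals)[length(Goals) - 1]])

# A logical vector sums to the number of TRUE values
over_20 <- ddply(df, "Season", summarise, Count = sum(Goals > 20))
```

On this data runner_up gives Les Ferdinand for 1992/3 and Alan Shearer for 1993/4, and over_20 gives 1 and 3. Note that the runner-up expression assumes every season has at least two players.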
Answer 2:
1+2) No, it is not sufficient. You might have better luck looking at the unique values under Goals, and grabbing the rows corresponding to the appropriate value in that case. Maybe something like,
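library(plyr)
# Return every player holding the k-th highest distinct goal total in one season
myFun <- function(x, k){
  # Distinct goal totals, highest first, so k = 1 is the top score
  val <- sort(unique(x$Goals), decreasing = TRUE)
  # All players on that total (this keeps ties)
  Players <- x$Player[x$Goals == val[k]]
  data.frame(Players = Players, maxGoals = rep(val[k], length(Players)))
}
ddply(df, .(Season), myFun, k = 1)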
where the parameter k specifies whether you want the players with the most (k = 1), second most (k = 2), etc. goals. (This is untested, obviously, so some minor modifications may be necessary.)
Source: https://stackoverflow.com/questions/9874471/selecting-specific-rows-etc-using-ddply