Question
Desperate for help with this.
The raw data comes from https://www.hockey-reference.com/play-index/tiny.fcgi?id=mmDlH
It is a CSV file and looks like this:
# A tibble: 6 x 19
match_no Date Tm Opp Outcome Time G PP SH S PIM GA PPGA SHGA
<dbl> <chr> <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1 6/4/… NYI WSH W REG 3 0 0 24 4 0 0 0
2 2 6/4/… WSH NYI L REG 0 0 0 29 2 3 0 0
3 3 6/4/… STL VAN W SO 3 1 0 36 6 2 2 0
4 4 6/4/… VAN STL L SO 2 2 0 25 6 3 1 0
5 5 6/4/… COL SJS L REG 2 0 0 30 4 5 0 0
6 6 6/4/… SJS COL W REG 5 0 0 30 4 2 0 0
# … with 5 more variables: PPO <dbl>, PPOA <dbl>, SA <dbl>, OppPIM <dbl>, DIFF <dbl>
and I can convert it to this:
# A tibble: 6 x 5
# Groups: Tm [1]
Tm Outcome Time n prob
<chr> <chr> <chr> <int> <dbl>
1 ANA L OT 7 0.09
2 ANA L REG 37 0.45
3 ANA L SO 3 0.04
4 ANA W OT 5 0.06
5 ANA W REG 27 0.33
6 ANA W SO 3 0.04
I used this
team_outcomes_regulation <-
  df %>%
  count(Tm, Outcome, Time) %>%
  group_by(Tm) %>%
  mutate(prob = round(prop.table(n), 2))
Then I try to ggplot with
team_outcomes_regulation %>%
  ggplot(aes(x = Tm, y = prob, fill = Time)) +
  geom_bar(position = "fill", stat = "identity") +
  theme(axis.text.x = element_text(angle = 90))
This is what I get, but I need the bars split into all six categories (wins by SO, REG and OT; losses by SO, REG and OT).
I now want to compare wins to goal difference using the original df.
So I now want to extract the 31 teams (Tm), the number of wins (from Outcome) and the goal difference (sum of DIFF). Some further assistance, please?
Answer 1:
You are nearly there, as you've already produced a plot split among the values in your "Time" column. If you want to plot every combination of the "Time" AND "Outcome" columns, you need to combine those values into one column and map the fill to it. There are a few options here, but perhaps the easiest would be as follows:
team_outcomes_regulation$outcome_time <-
paste(team_outcomes_regulation$Outcome, "by", team_outcomes_regulation$Time)
Then your plot becomes:
team_outcomes_regulation %>%
ggplot(aes(x = Tm, y = prob, fill = outcome_time)) +
geom_bar(position = "fill",stat = "identity") +
theme(axis.text.x = element_text(angle = 90))
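As a variation on the above, the combined column can also be created inside the dplyr pipeline instead of with `$<-`. This is only a minimal sketch: the `toy` data frame and its values are made up for illustration and stand in for the real game log, and `prob` is left unrounded so that each team's shares sum to exactly 1.

```r
library(dplyr)
library(ggplot2)

# Hypothetical toy data standing in for the real game log (not from the question)
toy <- data.frame(
  Tm      = c("ANA", "ANA", "ANA", "ANA", "BOS", "BOS", "BOS", "BOS"),
  Outcome = c("W", "W", "L", "L", "W", "L", "L", "W"),
  Time    = c("REG", "OT", "REG", "SO", "REG", "REG", "OT", "REG")
)

team_outcomes <- toy %>%
  count(Tm, Outcome, Time) %>%
  group_by(Tm) %>%
  mutate(
    prob         = n / sum(n),                 # share of each team's games
    outcome_time = paste(Outcome, "by", Time)  # e.g. "W by REG"
  ) %>%
  ungroup()

# geom_col() is shorthand for geom_bar(stat = "identity")
p <- ggplot(team_outcomes, aes(x = Tm, y = prob, fill = outcome_time)) +
  geom_col() +
  theme(axis.text.x = element_text(angle = 90))
```

Printing `p` draws one stacked bar per team, with up to six segments. Because the shares already sum to 1 within each team, the default stacking gives the same picture as `position = "fill"` here.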
EDIT: Side Question
So I now want to extract the 31 teams (Tm), the number of wins (from Outcome) and the goal difference (sum of DIFF). Some further assistance, please?
For this, I'm creating a dummy dataset similar to your own that should help you visualize one approach you could take. There are a few ways of doing this, though, and what I have here is "sort of clunky" IMHO.
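# dummy data (random, so your numbers will differ)
# note: use `=` inside data.frame(); `<-` would not name the columns properly
df <- data.frame(
  Tm = sample(LETTERS[1:5], 30, replace = TRUE),
  Outcome = sample(c('W','L'), 30, replace = TRUE),
  Diff = sample(1:3, 30, replace = TRUE),
  Time = sample(c('REG', 'SO'), 30, replace = TRUE)
)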
This gives you 5 teams ("A" through "E") with random outcomes and goal differences. I also added an extra column (Time) to show that the summary drops columns that are not needed. The approach here is to summarize the data grouped by team and outcome, and then keep only the wins. CAUTION: this means that the sum of Diff is based only on wins and not on losses. If you want to include losses, there are a few other ways of doing this.
df %>%
group_by(Tm, Outcome) %>%
summarize(Wins=n(), Goal.Diff=sum(Diff)) %>%
dplyr::filter(Outcome=='W')
# A tibble: 4 x 4
# Groups: Tm [4]
Tm Outcome Wins Goal.Diff
<fct> <fct> <int> <int>
1 A W 5 10
2 B W 3 7
3 C W 4 9
4 D W 1 2
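If the goal difference should cover losses as well, one option is to group by team only and count the wins with a logical sum. The following is a rough sketch, not the only way to do it: the `toy` data frame is hypothetical, and it assumes Diff is signed (positive in wins, negative in losses), which may or may not match the DIFF column in the real data.

```r
library(dplyr)

# Hypothetical toy data (not from the question).
# Assumption: Diff is signed, positive in wins and negative in losses.
toy <- data.frame(
  Tm      = c("A", "A", "A", "B", "B", "C"),
  Outcome = c("W", "L", "W", "L", "L", "W"),
  Diff    = c(2, -1, 3, -2, -1, 1)
)

team_summary <- toy %>%
  group_by(Tm) %>%
  summarize(
    Wins      = sum(Outcome == "W"),  # TRUE counts as 1, so this counts wins
    Goal.Diff = sum(Diff)             # summed over every game, wins and losses
  )
```

Unlike filtering on Outcome == 'W', this keeps teams that have no wins at all (team "B" in the toy data gets Wins = 0), so every team appears in the result.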
That's one way to do it. If you have further questions related to that, I would suggest you ask a new question on Stack Overflow. You can link it to this one if you wish, but it's a separate question, so it should be asked separately.
Source: https://stackoverflow.com/questions/61317271/how-to-create-a-bar-with-ggplot-with-probability-with-2-variables-and-3-sub-vari