Question
I want to mutate columns from this...
...into...
When I did the following...
villastats <- villastats %>%
  mutate(HG = ifelse(HomeTeam == "Aston Villa", villastats$FTHG, ifelse(HomeTeam != "Aston Villa", 0, 0)))
villastats <- villastats %>%
  mutate(AG = ifelse(AwayTeam == "Aston Villa", villastats$FTAG, ifelse(AwayTeam != "Aston Villa", 0, 0)))
villastats <- villastats %>%
  mutate(THG = cumsum(villastats$HG))
villastats <- villastats %>%
  mutate(TAG = cumsum(villastats$AG))
villastats <- villastats %>%
  mutate(Tot = THG + TAG)
...it produced the result shown above that I wanted. I want to do all the mutations at once, so I tried
villastats<-villastats%>%
mutate(HG = ifelse(HomeTeam == "Aston Villa", villastats$FTHG, ifelse(HomeTeam != "Aston Villa", 0, 0)))%>%
mutate(AG = ifelse(AwayTeam == "Aston Villa", villastats$FTAG, ifelse(AwayTeam != "Aston Villa", 0, 0)))%>%
mutate(THG=cumsum(villastats$HG))
mutate(TAG=cumsum(villastats$AG))%>%
mutate(Tot=THG+TAG)
This didn't work. The first two lines work fine, but when I add the third line it tells me
Error: Column `THG` must be length 38 (the number of rows) or one, not 0
Where am I going wrong? Why is it doing this?
Answer 1:
When you use villastats$ inside a pipe that is derived from the object villastats (as you are doing), then villastats$FTHG refers to the version of the variable before the first step in your pipeline. For instance,
someframe <- data.frame(a = 1:3, b = 11:13)
someframe %>%
  mutate(a = a + 1) %>%
  mutate(a = a + 2) %>%        # this 'a' refers to the result of the previous mutate
  mutate(a = someframe$a + 3)  # this 'someframe$a' refers to the original data.frame, before any step
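To make the difference concrete, here is a minimal sketch that runs the same kind of pipeline twice, once with the bare column and once with someframe$a. It assumes dplyr is installed; the names piped, stale and new_col are made up for illustration. The last line suggests where the "not 0" in the question's error plausibly comes from: a column created earlier in the pipe does not exist in the original object, so $ returns NULL and cumsum() returns a zero-length vector.

```r
library(dplyr)

someframe <- data.frame(a = 1:3, b = 11:13)

# A bare 'a' is the column as it stands at that step of the pipe.
piped <- someframe %>%
  mutate(a = a + 1) %>%
  mutate(a = a + 2)

# 'someframe$a' skips the earlier steps and reads the untouched original.
stale <- someframe %>%
  mutate(a = a + 1) %>%
  mutate(a = someframe$a + 2)

piped$a  # 4 5 6
stale$a  # 3 4 5

# 'new_col' (hypothetical name) was never stored in 'someframe',
# so this is cumsum(NULL), a vector of length zero.
length(cumsum(someframe$new_col))  # 0
```

Both pipelines run without any warning, which is the "silently wrong" case described in this answer.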
In some simpler magrittr pipes, this is "fine" in that the version of the variable at the beginning is no different from the version at the time of referencing it. However, if there are fewer rows (dplyr::filter), different values (mutate(a = a + 2)), or just a reordering (arrange), then the piped column a can be very different from someframe$a. In the best case, you get an error because the length of the vector you're referencing is incompatible with the operation you're doing. In the worst case, it gives you no warning or error but all of your calculations are silently wrong.
You can place all of your mutate operations in one call, as in
villastats %>%
  mutate(
    HG = ifelse(HomeTeam == "Aston Villa", FTHG, ifelse(HomeTeam != "Aston Villa", 0, 0)),
    AG = ifelse(AwayTeam == "Aston Villa", FTAG, ifelse(AwayTeam != "Aston Villa", 0, 0)),
    THG = cumsum(HG),
    TAG = cumsum(AG),
    Tot = THG + TAG
  )
While chaining a separate mutate call for each column is not wrong in itself, it is slower and perhaps a little harder to read.
Your ifelse calls are unnecessarily nested. The first comparison, HomeTeam == "Aston Villa", and the second comparison, HomeTeam != "Aston Villa", are perfectly complementary, so you can reduce all of those to just
villastats %>%
  mutate(
    HG = ifelse(HomeTeam == "Aston Villa", FTHG, 0),
    AG = ifelse(AwayTeam == "Aston Villa", FTAG, 0),
    THG = cumsum(HG),
    TAG = cumsum(AG),
    Tot = THG + TAG
  )
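As a quick check, the single-call version can be tried on a tiny stand-in table. This is only a sketch: the three rows below are invented, and only the column names (HomeTeam, AwayTeam, FTHG, FTAG) come from the question. The names toy and result are placeholders.

```r
library(dplyr)

# Invented toy data; only the column names are taken from the question.
toy <- data.frame(
  HomeTeam = c("Aston Villa", "Everton", "Aston Villa"),
  AwayTeam = c("Arsenal", "Aston Villa", "Chelsea"),
  FTHG = c(2, 1, 0),
  FTAG = c(1, 3, 2)
)

result <- toy %>%
  mutate(
    HG  = ifelse(HomeTeam == "Aston Villa", FTHG, 0),  # goals scored at home
    AG  = ifelse(AwayTeam == "Aston Villa", FTAG, 0),  # goals scored away
    THG = cumsum(HG),   # later expressions can use columns created above
    TAG = cumsum(AG),
    Tot = THG + TAG
  )

result$Tot  # 2 5 5
```

Within one mutate call, each expression sees the columns created by the expressions before it, so no reference to the original object is needed.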
Not that you asked, but I urge dplyr::if_else in place of base ifelse. The latter drops some classes (try ifelse(TRUE, Sys.time(), Sys.time()) for an example) and allows the programmer to be sloppy by including objects of different classes in the "yes" and "no" options. if_else won't let you do if_else(TRUE, "1", -3.14), since the two are of different types. (In dplyr versions before 1.1.0 it will even complain about if_else(TRUE, 0, 0L); later versions cast the two to a common type instead. It's strict.) Use it and be declarative, meaning use 0L instead of 0 if you expect that your normal operation will produce an integer, etc.
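The class-dropping behaviour can be seen in a short sketch like the one below. It assumes dplyr is installed; the names now and mixed are placeholders, and tryCatch is used only to record that the mixed-type call fails without printing the version-dependent error message.

```r
library(dplyr)

now <- Sys.time()

# Base ifelse() strips the date-time class and returns a bare number.
class(ifelse(TRUE, now, now))   # "numeric"

# dplyr::if_else() keeps the class of its inputs.
class(if_else(TRUE, now, now))  # "POSIXct" "POSIXt"

# Mixing a character with a double is rejected rather than silently coerced.
mixed <- tryCatch(
  if_else(TRUE, "1", -3.14),
  error = function(e) "error"
)
mixed  # "error"
```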
Source: https://stackoverflow.com/questions/64325649/why-wont-pipe-operator-let-me-combine-successive-mutations