sum() with conditions provides incorrect result in dplyr package

既然无缘 2021-01-22 11:11

When I use sum() with a condition inside summarize(), it does not return the correct result.

Make a data frame x:

x = d         


        
1 Answer
  • 2021-01-22 11:28

    The duplicate name is what causes the problem. In this code

    x %>% summarize(val = sum(val), val.2 = sum(val[flag == 2]))
    

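    there are two objects named val: the column in the data frame x, and the summary created by val = sum(val). summarize() evaluates its arguments in order, so once the first argument has run, val no longer refers to the column; it refers to the single value sum(val) = 5. The second argument then evaluates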

    `val[flag == 2]`
    

    which gives the vector c(2, NA), since val = 5 is now a single value and flag == 2 indexes past its end. Hence, when you add 2 + NA you get NA. The solution is not to use the name val twice:

    x %>% summarize(val_sum = sum(val), val.2 = sum(val[flag == 2]))
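
A minimal sketch of both versions side by side. The question's data frame is not shown in full, so the `x` below is a hypothetical stand-in and its numbers are assumptions, not the asker's data; the names `bad` and `good` are also only for illustration.

```r
library(dplyr)

# Hypothetical data: the original data frame is not shown in full.
x <- data.frame(flag = c(1, 2, 2), val = c(1, 2, 2))

# Reusing the name: by the time the second argument runs, `val` is the
# length-1 summary, so val[flag == 2] indexes past its end and yields NA.
bad <- x %>% summarize(val = sum(val), val.2 = sum(val[flag == 2]))

# Distinct name: `val` still refers to the original column.
good <- x %>% summarize(val_sum = sum(val), val.2 = sum(val[flag == 2]))

print(bad)   # val = 5, val.2 = NA
print(good)  # val_sum = 5, val.2 = 4
```

If the name val has to be kept for the total, another option is to compute the conditional sum first, e.g. summarize(val.2 = sum(val[flag == 2]), val = sum(val)), because the column is only overwritten from that point on.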
    