Calculate the percentage for each time series observation per group in R

感动是毒 2020-12-07 03:37

I'm new to R and just getting my head around the data wrangling aspect. I tried looking for a similar question but couldn't find one.

I would like to add an additional c

2 Answers
  • 2020-12-07 03:38

    If df is your data.frame, you can do:

    library(data.table)
    setDT(df)[,percentage:=signif(100*views/sum(views),4),by=date][]
    #   views       date article percentage
    #1:  1578 2015-01-01       A      56.99
    #2:   616 2015-01-01       B      22.25
    #3:   575 2015-01-01       C      20.77
    #4:  1744 2015-01-02       A      59.22
    #5:   541 2015-01-02       B      18.37
    #6:   660 2015-01-02       C      22.41
    #7:  2906 2015-01-03       A      69.55
    #8:   629 2015-01-03       B      15.06
    #9:   643 2015-01-03       C      15.39
    

    Or base R:

    df$percentage = signif(100*with(df, ave(views, date, FUN=function(x) x/sum(x))),4)
    

    Data:

    df = structure(list(views = c(1578L, 616L, 575L, 1744L, 541L, 660L, 
    2906L, 629L, 643L), date = structure(c(1L, 1L, 1L, 2L, 2L, 2L, 
    3L, 3L, 3L), .Label = c("2015-01-01", "2015-01-02", "2015-01-03"
    ), class = "factor"), article = structure(c(1L, 2L, 3L, 1L, 2L, 
    3L, 1L, 2L, 3L), .Label = c("A", "B", "C"), class = "factor"), 
    percentage = c(56.99, 22.25, 20.77, 59.22, 18.37, 22.41, 
    69.55, 15.06, 15.39)), .Names = c("views", "date", "article", 
    "percentage"), class = "data.frame", row.names = c(NA, -9L))
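
    As a quick sanity check on the base R approach, here is a minimal sketch that rebuilds the sample data with data.frame() (an assumption made for brevity; the answer uses structure()) and relies on prop.table(x), which is equivalent to x / sum(x). Rounding is left out so that the per-date totals come out at 100:

    ```r
    # Same sample values as above, without the precomputed percentage column
    df <- data.frame(
      views   = c(1578L, 616L, 575L, 1744L, 541L, 660L, 2906L, 629L, 643L),
      date    = rep(c("2015-01-01", "2015-01-02", "2015-01-03"), each = 3),
      article = rep(c("A", "B", "C"), times = 3)
    )

    # prop.table(x) is x / sum(x); ave() applies it within each date
    df$percentage <- 100 * ave(df$views, df$date, FUN = prop.table)

    # Unrounded percentages should add up to 100 for every date
    totals <- as.vector(tapply(df$percentage, df$date, sum))
    stopifnot(isTRUE(all.equal(totals, rep(100, 3))))
    ```

    Rounding with signif() is best applied last, since rounded values need not sum to exactly 100 within a group.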
    
  • 2020-12-07 03:52
    library(dplyr)
    # Note: this yields proportions in [0, 1]; use 100 * views / sum(views) for percent
    df %>% group_by(date) %>% mutate(percentage = views/sum(views))
    # Source: local data frame [9 x 4]
    # Groups: date
    #
    #   views       date article percentage
    # 1  1578 2015-01-01       A  0.5698808
    # 2   616 2015-01-01       B  0.2224630
    # 3   575 2015-01-01       C  0.2076562
    # 4  1744 2015-01-02       A  0.5921902
    # 5   541 2015-01-02       B  0.1837012
    # 6   660 2015-01-02       C  0.2241087
    # 7  2906 2015-01-03       A  0.6955481
    # 8   629 2015-01-03       B  0.1505505
    # 9   643 2015-01-03       C  0.1539014
    

    Or, if the same article can appear more than once per day:

    df %>% group_by(date) %>% mutate(sum = sum(views)) %>% 
    group_by(date, article) %>% mutate(percentage = views/sum) %>% 
    select(-sum)
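
    The snippet above still divides each row's views by the daily total, so repeated articles remain on separate rows. If the goal is instead one combined share per article per day, a possible sketch is to sum the views per article first. The data frame df2 is hypothetical, and the .groups argument assumes dplyr 1.0.0 or later:

    ```r
    library(dplyr)

    # Hypothetical data: article A appears twice on the same day
    df2 <- data.frame(
      views   = c(100L, 50L, 50L),
      date    = rep("2015-01-01", 3),
      article = c("A", "A", "B")
    )

    res <- df2 %>%
      group_by(date, article) %>%
      summarise(views = sum(views), .groups = "drop") %>%  # one row per article per day
      group_by(date) %>%
      mutate(percentage = 100 * views / sum(views)) %>%    # share of the daily total
      ungroup()
    ```

    Here A's two rows collapse into a single row with 150 views, so the percentages are computed over two rows rather than three.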
    