Plotting binned data using sum instead of count

春和景丽 2021-01-07 09:28

I've tried to search for an answer, but can't seem to find the right one that does the job for me.

I have a dataset (data) with two variables: people\

2 answers
  •  傲寒 (original poster)
    2021-01-07 10:18

    For completeness, I am adding the base R solution to @bouncyball's great answer. I will use their synthetic data, but I will use cut to create the age groups before aggregation.

    # Create synthetic data for plotting
    > set.seed(123)
    > dat <- data.frame(age = sample(20:50, 200, replace = TRUE),
                        awards = rpois(200, 3))
    
    # Add a new column containing the age groups
    > dat[["ageGroups"]] <- cut(dat[["age"]], c(-Inf, 20, 30, 40, Inf),
                                right = FALSE)
    

    cut divides numeric data into groups based on the breaks given in its second argument. right = FALSE makes each interval closed on the left and open on the right, so a group includes its lower break rather than its upper one (i.e. 20 <= x < 30 rather than the default of 20 < x <= 30). The groups do not have to be equally spaced. If you do not want to include data above or below a certain value, simply remove the Inf from the end or the -Inf from the beginning, respectively, and the function will return NA for those values instead. If you would like to give your groups names, you can do so with the labels argument.
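
    As a minimal sketch of the last two points, the snippet below uses a small hypothetical vector of ages (not the data above) with finite breaks only and custom labels. The label names "20s", "30s" and "40s" are made up for illustration.

    ```r
    # Hypothetical ages chosen to sit on and around the break points
    ages <- c(18, 20, 29, 30, 45, 50)

    # Finite breaks only: with right = FALSE the groups are [20,30), [30,40), [40,50),
    # so 18 and 50 fall outside every group
    grouped <- cut(ages, breaks = c(20, 30, 40, 50), right = FALSE,
                   labels = c("20s", "30s", "40s"))

    # Pair each age with its group to inspect the result
    print(data.frame(ages, grouped))
    ```

    Here 18 and 50 come back as NA, because neither lies inside any of the half-open intervals.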

    Now we can aggregate based on the groups we created.

    > (summedGroups <- aggregate(awards ~ ageGroups, dat, FUN = sum))
      ageGroups awards
    1   [20,30)    188
    2   [30,40)    212
    3 [40, Inf)    194
    
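
    To make the difference between counting and summing concrete, here is a small sketch on hypothetical toy data (the toy data frame and its values are invented for illustration). It uses table for the counts and tapply as one possible alternative to aggregate for the sums.

    ```r
    # Hypothetical toy data: three people in two groups
    toy <- data.frame(group  = factor(c("a", "a", "b")),
                      awards = c(2, 5, 1))

    # Number of rows (people) per group
    counts <- table(toy$group)

    # Total awards per group, returned as a named one-dimensional array
    sums <- tapply(toy$awards, toy$group, sum)

    print(counts)
    print(sums)
    ```

    Group "a" has a count of 2 but a sum of 7, which is the distinction the question is after.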

    Finally, we can plot these data using the barplot function. The key here is to pass the age groups as the bar labels through the names.arg argument.

    > barplot(summedGroups[["awards"]], names.arg = summedGroups[["ageGroups"]])
    
