I've tried to search for an answer, but can't seem to find the right one that does the job for me.
I have a dataset (data) with two variables: people\
We can use the aggregate function and then use the ggplot2 package. I don't make too many barplots in base R these days, so I'm not sure of the best way to do it without loading ggplot2:
#data
set.seed(123)
dat <- data.frame(age = sample(20:50, 200, replace = TRUE),
                  awards = rpois(200, 3))
head(dat)
  age awards
1  28      2
2  44      6
3  32      3
4  47      3
5  49      2
6  21      5
#aggregate
sum_by_age <- aggregate(awards ~ age, data = dat, FUN = sum)
library(ggplot2)
ggplot(sum_by_age, aes(x = age, y = awards)) +
  geom_bar(stat = 'identity')
#create groups
dat$age_group <- ifelse(dat$age <= 30, '20-30',
                        ifelse(dat$age <= 40, '31-40',
                               '41 +'))
sum_by_age_group <- aggregate(awards ~ age_group, data = dat, FUN = sum)
ggplot(sum_by_age_group, aes(x = age_group, y = awards)) +
  geom_bar(stat = 'identity')
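To see what the aggregate step is doing, here is a minimal sketch on a tiny hand-made data frame, where the sums are easy to check by hand. The name toy and its values are made up for illustration and are not part of the question's data.

```r
# toy is a hypothetical example data frame, small enough to verify by hand
toy <- data.frame(age = c(20, 20, 21, 21, 21),
                  awards = c(1, 2, 3, 4, 5))

# One row per distinct age, with awards summed within each age
toy_sums <- aggregate(awards ~ age, data = toy, FUN = sum)
print(toy_sums)
#   age awards
# 1  20      3
# 2  21     12
```

The formula awards ~ age reads as "awards, grouped by age", which is why the same call works unchanged once age is swapped for a grouping column such as age_group.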
We could skip the aggregate step altogether and just use:
ggplot(dat, aes(x = age, y = awards)) + geom_bar(stat = 'identity')
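but I prefer not to, because having an intermediate data step can be useful elsewhere in your analytical pipeline, for comparisons other than visualization. The shortcut works because geom_bar(stat = 'identity') stacks the rows that share an age, so each bar's total height is still the sum of awards.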
For completeness, I am adding the base R solution to @bouncyball's great answer. I will use their synthetic data, but I will use cut to create the age groups before aggregation.
# Creates data for plotting
> set.seed(123)
> dat <- data.frame(age = sample(20:50, 200, replace = TRUE),
                    awards = rpois(200, 3))
# Created a new column containing the age groups
> dat[["ageGroups"]] <- cut(dat[["age"]], c(-Inf, 20, 30, 40, Inf),
                            right = FALSE)
cut divides a set of numeric data into groups based on the breaks defined in the second argument. right = FALSE closes the intervals on the left instead of the right, so each group includes its lower bound rather than its upper one (i.e. 20 <= x < 30 rather than the default of 20 < x <= 30). The groups do not have to be equally spaced. If you do not want to include data above or below a certain value, simply remove the Inf from the end or the -Inf from the beginning, respectively, and the function will return <NA> for those values instead. If you would like to give your groups names, you can do so with the labels argument.
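As a small illustration of the last two points, the sketch below drops -Inf and Inf from the breaks and names the groups with labels. The ages vector and the label strings are made up for this example.

```r
# Hypothetical ages, chosen to hit every case: below, inside, and at the top break
ages <- c(18, 20, 29, 30, 45, 50)

# No -Inf/Inf in the breaks, so values outside [20, 50) are returned as NA.
# labels needs one name per interval (three intervals from four breaks).
groups <- cut(ages, breaks = c(20, 30, 40, 50), right = FALSE,
              labels = c("20-29", "30-39", "40-49"))

print(as.character(groups))
# [1] NA      "20-29" "20-29" "30-39" "40-49" NA
```

Note that 50 becomes NA as well: with right = FALSE the last interval is 40 <= x < 50, so the top break itself is excluded unless include.lowest = TRUE is also set.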
Now we can aggregate based on the groups we created.
> (summedGroups <- aggregate(awards ~ ageGroups, dat, FUN = sum))
  ageGroups awards
1   [20,30)    188
2   [30,40)    212
3 [40, Inf)    194
Finally, we can plot these data using the barplot function. The key here is to use names (which partially matches barplot's names.arg argument) to label the bars with the age groups.
> barplot(summedGroups[["awards"]], names = summedGroups[["ageGroups"]])
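For reference, here is the whole base R route as one self-contained sketch, with the bar labels passed through the full argument name names.arg. The axis labels and title are my own illustrative choices and are not part of the original answer.

```r
# Same synthetic data and grouping as above
set.seed(123)
dat <- data.frame(age = sample(20:50, 200, replace = TRUE),
                  awards = rpois(200, 3))
dat$ageGroups <- cut(dat$age, c(-Inf, 20, 30, 40, Inf), right = FALSE)
summedGroups <- aggregate(awards ~ ageGroups, dat, FUN = sum)

# barplot() draws the chart and returns the x positions of the bar midpoints
mids <- barplot(summedGroups$awards,
                names.arg = as.character(summedGroups$ageGroups),
                xlab = "Age group", ylab = "Total awards",
                main = "Awards by age group")
```

The returned midpoints (stored here in mids) can be handy if you later want to add text or error bars above each bar with text() or arrows().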