ggplot2: how to align the bars of a histogram with the x axis?

攒了一身酷 2020-12-03 14:46

Consider this simple example:

library(ggplot2)
dat <- data.frame(number = c(5, 10, 11, 12, 12, 12, 13, 15, 15))
ggplot(dat, aes(x = number)) + geom_histogram()
         


        
3 Answers
  • 2020-12-03 15:06

    This worked for me:

    + scale_x_continuous(limits = c(0, NA)) 
    

    From ?scale_x_continuous, limits is:

    One of:

    • NULL to use the default scale range

    • A numeric vector of length two providing limits of the scale. Use NA to refer to the existing minimum or maximum

    • A function that accepts the existing (automatic) limits and returns new limits

    Note that setting limits on positional scales will remove data outside of the limits. If the purpose is to zoom, use the limit argument in the coordinate system (see coord_cartesian()).
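The difference between scale limits and coordinate limits described in that note can be checked with a small sketch. The limits c(10, 16), the binwidth and the object names below are assumptions chosen only for illustration; they are not part of the answer above.

```r
library(ggplot2)

dat <- data.frame(number = c(5, 10, 11, 12, 12, 12, 13, 15, 15))
base_plot <- ggplot(dat, aes(x = number)) + geom_histogram(binwidth = 1)

# Scale limits: values outside [10, 16] are dropped before binning
# (ggplot2 emits a warning about removed rows).
p_scale <- base_plot + scale_x_continuous(limits = c(10, 16))

# Coordinate limits: every value is binned, only the visible region changes.
p_zoom <- base_plot + coord_cartesian(xlim = c(10, 16))

# Total number of observations that ended up in the bins of each plot.
n_scale <- sum(ggplot_build(p_scale)$data[[1]]$count)
n_zoom <- sum(ggplot_build(p_zoom)$data[[1]]$count)
print(c(scale_limits = n_scale, coord_limits = n_zoom))
```

With limits = c(0, NA), as suggested above, no value of this data set lies outside the limits, so nothing is dropped.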

  • 2020-12-03 15:17

    Why are the bars "weirdly aligned"?

    Let me start by explaining why your code leads to weirdly aligned bars. This has to do with the way a histogram is constructed: first, the x-axis is split up into intervals, and then the number of values in each interval is counted.
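The two steps can be reproduced by hand in base R. This is only a sketch of the idea, not ggplot2's internal code, and the breaks (width 2, starting at 4) are an arbitrary assumption made for illustration.

```r
number <- c(5, 10, 11, 12, 12, 12, 13, 15, 15)

# Step 1: split the x-axis into intervals (hypothetical breaks of width 2).
breaks <- seq(4, 16, by = 2)

# Step 2: count how many values fall into each interval.
# By default cut() uses right-closed intervals: (4,6], (6,8], ...
counts <- table(cut(number, breaks = breaks))
print(counts)
```

Each entry of counts corresponds to the height of one bar; changing breaks changes which values share a bar.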

    By default, ggplot splits the data up into 30 bins. It even spits out a message that says so:

    stat_bin() using bins = 30. Pick better value with binwidth.

    The default number of bins is not always a good choice. In your case, where all the data points are integers, one might want to choose the bin boundaries as 5, 6, 7, 8, ... or 4.5, 5.5, 6.5, ..., so that each bin contains exactly one integer value. You can obtain the boundaries of the bins that were used in the plot as follows:

    data <- data.frame(number = c(5, 10, 11, 12, 12, 12, 13, 15, 15))
    p <- ggplot(data, aes(x = number)) + geom_histogram()
    ggplot_build(p)$data[[1]]$xmin
    ##  [1]  4.655172  5.000000  5.344828  5.689655  6.034483  6.379310  6.724138  7.068966  7.413793
    ## [10]  7.758621  8.103448  8.448276  8.793103  9.137931  9.482759  9.827586 10.172414 10.517241
    ## [19] 10.862069 11.206897 11.551724 11.896552 12.241379 12.586207 12.931034 13.275862 13.620690
    ## [28] 13.965517 14.310345 14.655172
    

    As you can see, the boundaries of the bins are not chosen in a way that would lead to a nice alignment of the bars with integers.

    So, in short, the reason for the weird alignment is that ggplot simply uses a default number of 30 bins, which is not suitable, in your case, to have bars that are nicely aligned with integers.

    There are (at least) two ways to get nicely aligned bars, which I discuss below.

    Use a bar plot instead

    Since you have integer data, a histogram may just not be the appropriate choice of visualisation. You could instead use geom_bar(), which will lead to bars that are centered on integers:

    ggplot(data, aes(x = number)) + geom_bar() + scale_x_continuous(breaks = 1:16)
    

    You could move the bars to the right of the integers by adding 0.5 to number:

    ggplot(data, aes(x = number + 0.5)) + geom_bar() + scale_x_continuous(breaks = 1:16)
    

    Create a histogram with appropriate bins

    If you nevertheless want to use a histogram, you can make ggplot use more reasonable bins as follows:

    ggplot(data, aes(x = number)) +
      geom_histogram(binwidth = 1, boundary = 0, closed = "left") +
      scale_x_continuous(breaks = 1:16)
    

    With binwidth = 1, you override the default choice of 30 bins and explicitly require that bins have a width of 1. boundary = 0 ensures that the bin boundaries fall on integer values, which is what you need if you want the integers to be at the left edge of the bars. (If you omit it, bins are chosen such that the bars are centered on integers.)

    The argument closed = "left" is a bit trickier to explain. As described above, the boundaries of the bins are now 5, 6, 7, .... The question is which bin a value that lies on a boundary, e.g. 6, should fall into: it could be either the first or the second one. This choice is controlled by closed: if you set it to "right" (the default), the bins are closed on the right, meaning that the right boundary belongs to the bin, while the left boundary belongs to the bin to its left. So 6 would be in the first bin. If you choose "left" instead, the left boundary is part of the bin, and 6 would be in the second bin.

    Since you want the bars to be to the right of the integers, you need to pick closed = "left".
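The effect of closed can be imitated in base R with cut(), whose right argument plays a similar role. This is a sketch of the counting logic only, assuming integer breaks from 5 to 15; it does not call ggplot2.

```r
number <- c(5, 10, 11, 12, 12, 12, 13, 15, 15)
breaks <- 5:15

# Right-closed bins: [5,6], (6,7], ..., (14,15]
right_closed <- table(cut(number, breaks, right = TRUE, include.lowest = TRUE))

# Left-closed bins: [5,6), [6,7), ..., [14,15]
left_closed <- table(cut(number, breaks, right = FALSE, include.lowest = TRUE))

print(right_closed)
print(left_closed)
```

With right-closed bins the three 12s are counted in the bin from 11 to 12; with left-closed bins they are counted in the bin from 12 to 13, so that bar starts at 12.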

    Comparison of the two solutions

    If you compare the histogram with the bar plot, you will notice two differences:

    • There is a little gap between the bars in the bar plot, while they touch in the histogram. You could make the bars touch in the former by using geom_bar(width = 1).
    • The rightmost bar lies between 15 and 16 in the (shifted) bar plot, but between 14 and 15 in the histogram. The reason is that every bin includes only its left boundary, except the rightmost bin, which includes both boundaries, so the value 15 is counted in the bin from 14 to 15.
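The bins behind the histogram can be inspected with ggplot_build(), in the same way as the default bins earlier in this answer. The snippet is a sketch; the selection of the xmin, xmax and count columns is my own choice for readability.

```r
library(ggplot2)

data <- data.frame(number = c(5, 10, 11, 12, 12, 12, 13, 15, 15))
p <- ggplot(data, aes(x = number)) +
  geom_histogram(binwidth = 1, boundary = 0, closed = "left")

# One row per bin: left edge, right edge and number of values in the bin.
bins <- ggplot_build(p)$data[[1]][, c("xmin", "xmax", "count")]
print(bins)
```

If the explanation above holds, all edges should be integers and the last row should be the bin from 14 to 15, which also holds the two values equal to 15.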
  • 2020-12-03 15:26

    This will center each bar on its value:

    data <- data.frame(number = c(5, 10, 11, 12, 12, 12, 13, 15, 15))
    ggplot(data, aes(x = number)) + geom_histogram(binwidth = 0.5)
    

    Here is a trick with the tick labels that makes each bar start at the tick of its value. Note that if you add other data to the plot, you need to shift it as well:

    ggplot(data, aes(x = number)) +
      geom_histogram(binwidth = 0.5) +
      scale_x_continuous(
        breaks = seq(0.75, 15.75, 1), # put the ticks at the left edge of the bars (0.25, half the binwidth, below the value)
        labels = 1:16                 # relabel the ticks with the values the bars stand for
      )
    

    Another option: binwidth = 1 with breaks = seq(0.5, 15.5, 1) (this might make more sense for integer data).
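Written out in full, that variant might look as follows. The answer gives only the binwidth and the breaks; reusing labels = 1:16 from the previous snippet and the names tick_positions and tick_labels are my assumptions.

```r
library(ggplot2)

data <- data.frame(number = c(5, 10, 11, 12, 12, 12, 13, 15, 15))

# Ticks sit half a binwidth (0.5) below each integer, i.e. at the left edge
# of bars that are centered on the integers.
tick_positions <- seq(0.5, 15.5, 1)
tick_labels <- 1:16 # assumed: relabel each tick with the value of its bar

p <- ggplot(data, aes(x = number)) +
  geom_histogram(binwidth = 1) +
  scale_x_continuous(breaks = tick_positions, labels = tick_labels)
print(p)
```

As with the 0.5-wide bins, the axis labels are shifted relative to the true x positions, so any additional layer has to be shifted in the same way.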
