Question
I've got a set of more than 10,000 integers with values between 1 and 500. I want to plot the values as a histogram; however, since only a few integers have values greater than 200, I want to use a logarithmic scale for the y-axis.
A problem emerges when a bin has a count of zero, since the logarithm of zero goes to -infinity.
To avoid this, I want to add a pseudocount of 1 to each bin. In a standard hist() plot I can do this as follows:
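# compute the histogram without drawing it
hist.data = hist(data, plot = FALSE, breaks = 30)
# add a pseudocount of 1 so empty bins give log10(1) = 0
hist.data$counts = log10(hist.data$counts + 1)
plot(hist.data, ...)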
However, I struggle to find a way to access the counts in ggplot.
Is there a simple way to do this, or are there other recommended ways to deal with this problem?
Answer 1:
One way to achieve this is to write your own transformation function for the y scale. The transformation functions used by ggplot2 (when using scale_y_log10(), for instance) are defined in the scales package.
Short answer
library(ggplot2)
library(scales)
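# log(x + 1) transformation: empty bins (count 0) map to 0 instead of -Inf
mylog10_trans <- function (base = 10)
{
trans <- function(x) log(x + 1, base)
# the inverse has to undo the + 1 shift as well
inv <- function(x) base^x - 1
trans_new(paste0("log-", format(base)), trans, inv, log_breaks(base = base),
domain = c(1e-100, Inf))
}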
ggplot(df, aes(x=x)) +
geom_histogram() +
scale_y_continuous(trans = "mylog10")
Data used for the output figure (df has to be defined before running the plot code above):
df <- data.frame(x=sample(1:100, 10000, replace = TRUE))
df$x[sample(1:10000, 50)] <- sample(101:500, 50)
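To see what the + 1 shift does to empty bins, here is a minimal base-R sketch. The counts vector is made up for illustration (it is not taken from df), and the variable names are hypothetical:

```r
# Hypothetical bin counts, including two empty bins
counts <- c(5000, 120, 0, 3, 0, 1)

# Plain log10: empty bins become -Inf and cannot be drawn
plain_log <- log10(counts)

# log10 with a pseudocount of 1: empty bins become log10(1) = 0
shifted_log <- log10(counts + 1)

print(is.finite(plain_log))
print(shifted_log)
```

Only the two empty bins are non-finite under the plain logarithm, while every shifted value is finite, which is what lets the histogram bars be drawn from a baseline of 0.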
Explaining the trans function
Let's examine scales::log10_trans: it calls scales::log_trans(), and scales::log_trans prints as:
function (base = exp(1))
{
trans <- function(x) log(x, base)
inv <- function(x) base^x
trans_new(paste0("log-", format(base)), trans, inv, log_breaks(base = base),
domain = c(1e-100, Inf))
}
<environment: namespace:scales>
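In the answer above, I replaced:
trans <- function(x) log(x, base)
with:
trans <- function(x) log(x + 1, base)
For the inverse to undo this shift, inv has to subtract the 1 again, that is, base^x - 1 rather than base^x.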
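As a quick sanity check of the shifted transformation, the following base-R sketch confirms that zero maps to zero and that an inverse of the form base^y - 1 recovers the original counts. The helper names shifted_log and shifted_log_inv are hypothetical stand-ins for the trans and inv closures, not functions from scales:

```r
# Hypothetical helpers mirroring the trans and inv closures
shifted_log     <- function(x, base = 10) log(x + 1, base)
shifted_log_inv <- function(y, base = 10) base^y - 1

counts <- c(0, 1, 9, 99, 5000)

# Applying the inverse after the transformation should give back the counts
round_trip <- shifted_log_inv(shifted_log(counts))

print(shifted_log(0))                          # a count of 0 stays at 0
print(isTRUE(all.equal(round_trip, counts)))   # TRUE up to floating-point tolerance
```

Using base^y without the - 1 would return counts + 1 here, so anything the scale computes through the inverse would be off by one.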
Source: https://stackoverflow.com/questions/41849951/using-ggplot-geo-geom-histogram-with-y-log-scale-with-zero-bins