Find local minimum in bimodal distribution with R

一个人的身影 2021-01-01 04:17

My data are pre-processed image data and I want to separate two classes. In theory (and hopefully in practice) the best threshold is the local minimum between the two peaks.

1 answer
  • 2021-01-01 05:10

    There are a few ways to do this.

    First, using d for the density as in your question, d$x and d$y contain the x and y values of the density plot. A local minimum occurs where the derivative dy/dx = 0. Since the x-values are equally spaced, diff(d$y) is proportional to the slope, so we can look for the d$x where abs(diff(d$y)) is smallest:

    d$x[which.min(abs(diff(d$y)))]
    # [1] 2.415785
    

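    The problem is that dy/dx = 0 at the maxima of the density curve as well, and the slope is also close to zero in the flat tails. In this case the minimum is shallow and the maxima are sharply peaked, so the grid point with the smallest slope happens to be the minimum, but you can't count on that.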
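
    One way around this is to look at where the slope changes sign instead of where it is smallest. The following is only a sketch: the `data` vector is a made-up two-component mixture standing in for the original data, which is not shown here.

    ```r
    set.seed(1)
    # Hypothetical stand-in data: a mixture of two normal distributions
    data <- c(rnorm(500, mean = 1, sd = 0.5), rnorm(500, mean = 4, sd = 0.5))
    d <- density(data)

    dy <- diff(d$y)   # dy[i] = y[i+1] - y[i]
    n  <- length(dy)

    # Local minimum: the slope goes from negative to positive
    min_idx <- which(dy[-n] < 0 & dy[-1] > 0) + 1
    # Local maximum: the slope goes from positive to negative
    max_idx <- which(dy[-n] > 0 & dy[-1] < 0) + 1

    d$x[min_idx]   # candidate thresholds
    d$x[max_idx]   # peak locations
    ```

    On a noisy density this can return several minima, so you would still need to pick the one that lies between the two main peaks.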

    So a second way uses optimize(...), which searches for a minimum within a given interval. optimize(...) needs a function as its argument, so we use approxfun(d$x,d$y) to create an interpolation function.

    optimize(approxfun(d$x,d$y),interval=c(1,4))$minimum
    # [1] 2.415791
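
    The interval c(1,4) above is typed in by hand. If you would rather derive it from the density, one possibility is to use the two tallest peaks as the interval bounds. This is a sketch under assumptions: `data` is a made-up mixture, and `top2` and `threshold` are names introduced only for illustration.

    ```r
    set.seed(1)
    # Hypothetical stand-in data: a mixture of two normal distributions
    data <- c(rnorm(500, mean = 1, sd = 0.5), rnorm(500, mean = 4, sd = 0.5))
    d <- density(data)
    f <- approxfun(d$x, d$y)   # interpolation function for the density curve

    # Peaks: grid points where the slope goes from positive to negative
    dy    <- diff(d$y)
    n     <- length(dy)
    peaks <- which(dy[-n] > 0 & dy[-1] < 0) + 1

    # Keep the two tallest peaks, ordered from left to right
    top2 <- sort(peaks[order(d$y[peaks], decreasing = TRUE)][1:2])

    # Search for the minimum between those two peaks
    threshold <- optimize(f, interval = d$x[top2])$minimum
    threshold
    ```

    This assumes the density has at least two peaks. If a smaller third peak sits between the two tallest ones, optimize(...) may settle on either of the dips, so it is worth checking the result on a plot.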
    

    Finally, we show that this is indeed the minimum:

    hist(data,prob=TRUE)
    lines(d, col="red", lty=2)
    v <- optimize(approxfun(d$x,d$y),interval=c(1,4))$minimum
    abline(v=v, col="blue")
    

    Another approach, which is actually preferable, uses k-means clustering.

    df <- read.csv("data.txt", header=FALSE)
    colnames(df) <- "X"
    
    # bimodal
    km <- kmeans(df$X,centers=2)
    df$clust <- as.factor(km$cluster)
    library(ggplot2)
    ggplot(df, aes(x=X)) + 
      geom_histogram(aes(fill=clust,y=after_stat(count/sum(count))),
                     binwidth=0.5, color="grey50")+
      stat_density(geom="line", color="red")
    

    The data actually looks more trimodal than bimodal.

    # trimodal
    km <- kmeans(df$X,centers=3)  # cluster on X only; df may already hold a clust column
    df$clust <- as.factor(km$cluster)
    library(ggplot2)
    ggplot(df, aes(x=X)) + 
      geom_histogram(aes(fill=clust,y=after_stat(count/sum(count))),
                     binwidth=0.5, color="grey50")+
      stat_density(geom="line", color="red")
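
    If you need an actual threshold value from the clustering, note that for one-dimensional data a converged k-means solution assigns each point to its nearest center, so the boundary between two neighbouring clusters is the midpoint of their centers. A minimal sketch, again with made-up data in place of data.txt (the names `x`, `centers` and `thresholds` are illustrative):

    ```r
    set.seed(1)
    # Hypothetical stand-in data: a mixture of two normal distributions
    x <- c(rnorm(500, mean = 1, sd = 0.5), rnorm(500, mean = 4, sd = 0.5))

    # nstart = 10 tries several random starts and keeps the best solution
    km <- kmeans(x, centers = 2, nstart = 10)

    # Sort the cluster centers, then take midpoints between neighbours
    centers    <- sort(as.numeric(km$centers))
    thresholds <- (head(centers, -1) + tail(centers, -1)) / 2
    thresholds
    ```

    With centers = 3 the same code gives two thresholds. These midpoints need not coincide with the local minimum of the density, so treat them as a different definition of the cut point, not as an estimate of the same quantity.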
    
