Normal distribution appears too dense when plotted in matplotlib

無奈伤痛 2021-01-16 22:41

I am trying to estimate the probability density function (PDF) of my data. In my case, the data is a satellite image with shape 8200 x 8100. Below is the code I use for the PDF.

1 Answer
  • 2021-01-16 23:24

    The issue is that the x values in the PDF plot are not sorted. `plt.plot` connects consecutive points in array order, so the line jumps back and forth between random points, creating the mess you see.
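    To confirm this diagnosis without plotting, you can check whether the x values are monotonically increasing. If the PDF has already been computed, `np.argsort` gives a single permutation that reorders both arrays together, so the values do not need to be recomputed. A minimal sketch, with random normal samples standing in for the filtered image data (the seed and sample size are arbitrary):

```python
import numpy as np
import scipy.stats as stats

# Stand-in data: random samples arrive in no particular order, like image pixels.
rng = np.random.default_rng(0)
lst_flat_filtered = rng.normal(7, 5, 1000)
fit = stats.norm.pdf(lst_flat_filtered, np.mean(lst_flat_filtered), np.std(lst_flat_filtered))

# A clean line plot needs increasing x; any negative step means the line doubles back.
print("x sorted:", bool(np.all(np.diff(lst_flat_filtered) >= 0)))

# Reorder x and the already-computed PDF values with the same permutation,
# so every y value stays paired with its own x value.
order = np.argsort(lst_flat_filtered)
x_sorted = lst_flat_filtered[order]
fit_sorted = fit[order]
print("x sorted after argsort:", bool(np.all(np.diff(x_sorted) >= 0)))
```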

    Two options:

    1. Don't plot the line, just plot points. This is not great if you have lots of points, but it will confirm whether my explanation above is correct:

      plt.plot(lst_flat_filtered, fit, 'bo')
      
    2. Sort the lst_flat_filtered array before calculating the PDF and plotting it:

      lst_flat = np.r_[lst_flat]
      lst_flat_filtered = np.sort(lst_flat[~is_outlier(lst_flat)])  # Changed this line
      fit = stats.norm.pdf(lst_flat_filtered, np.mean(lst_flat_filtered), np.std(lst_flat_filtered))
      
      plt.plot(lst_flat_filtered, fit)
      
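    With 8200 x 8100 (about 66 million) pixels, sorting every value and drawing one line vertex per pixel can be slow. An alternative, not part of the original answer, is to evaluate the fitted PDF on an evenly spaced grid built with `np.linspace`, which is sorted by construction. A sketch, assuming random normal data in place of the image and an arbitrarily chosen grid of 500 points; note that `density=True` is the current `plt.hist` argument for a normalised histogram:

```python
import numpy as np
import scipy.stats as stats
import matplotlib.pyplot as plt

# Stand-in for the filtered image pixels (the real array would be far larger).
rng = np.random.default_rng(0)
lst_flat_filtered = rng.normal(7, 5, 100_000)

mu = np.mean(lst_flat_filtered)
sigma = np.std(lst_flat_filtered)

# Evaluate the fitted normal PDF on a sorted, evenly spaced grid spanning the data,
# instead of at every data point. 500 points is an arbitrary but smooth-looking choice.
x_grid = np.linspace(lst_flat_filtered.min(), lst_flat_filtered.max(), 500)
fit_grid = stats.norm.pdf(x_grid, mu, sigma)

plt.hist(lst_flat_filtered, bins=30, density=True)
plt.plot(x_grid, fit_grid)
plt.show()
```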

    Here are some minimal examples showing these behaviours:

    Reproducing your problem:

    import numpy as np
    import scipy.stats as stats
    import matplotlib.pyplot as plt
    
    lst_flat_filtered = np.random.normal(7, 5, 1000)
    
    fit = stats.norm.pdf(lst_flat_filtered, np.mean(lst_flat_filtered), np.std(lst_flat_filtered))
    
    plt.hist(lst_flat_filtered, bins=30, density=True)  # 'normed' was removed in Matplotlib 3.1
    
    plt.plot(lst_flat_filtered, fit)
    
    plt.show()
    

    Plotting points

    import numpy as np
    import scipy.stats as stats
    import matplotlib.pyplot as plt
    
    lst_flat_filtered = np.random.normal(7, 5, 1000)
    
    fit = stats.norm.pdf(lst_flat_filtered, np.mean(lst_flat_filtered), np.std(lst_flat_filtered))
    
    plt.hist(lst_flat_filtered, bins=30, density=True)
    
    plt.plot(lst_flat_filtered, fit, 'bo')
    
    plt.show()
    

    Sorting the data

    import numpy as np
    import scipy.stats as stats
    import matplotlib.pyplot as plt
    
    lst_flat_filtered = np.sort(np.random.normal(7, 5, 1000))
    
    fit = stats.norm.pdf(lst_flat_filtered, np.mean(lst_flat_filtered), np.std(lst_flat_filtered))
    
    plt.hist(lst_flat_filtered, bins=30, density=True)
    
    plt.plot(lst_flat_filtered, fit)
    
    plt.show()
    
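    As a side note, the mean and standard deviation passed to `stats.norm.pdf` in these examples are the maximum-likelihood estimates for a normal distribution, so `scipy.stats.norm.fit` should return the same pair. A short sketch checking this on synthetic data (the seed and sample size are arbitrary):

```python
import numpy as np
import scipy.stats as stats

rng = np.random.default_rng(0)
lst_flat_filtered = rng.normal(7, 5, 1000)

# norm.fit returns the maximum-likelihood (loc, scale) of a normal distribution.
loc, scale = stats.norm.fit(lst_flat_filtered)

# These match the sample mean and np.std with its default ddof=0.
print(np.isclose(loc, np.mean(lst_flat_filtered)), np.isclose(scale, np.std(lst_flat_filtered)))

# Sort before evaluating so the PDF can be drawn as a clean line.
x = np.sort(lst_flat_filtered)
fit = stats.norm.pdf(x, loc, scale)
```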
