I am trying to estimate the probability density function of my data. IN my case, the data is a satellite image with a shape 8200 x 8100. Below, I present you the code of PDF
The issue is that the x values in the PDF plot are not sorted, so the plotted line is going back and forwards between random points, creating the mess you see.
Two options:
Don't plot the line, just plot points (not great if you have lots of points, but will confirm if what I said above is right or not):
plt.plot(lst_flat_filtered, fit, 'bo')
Sort the lst_flat_filtered
array before calculating the PDF and plotting it:
lst_flat = np.r_[lst_flat]
lst_flat_filtered = np.sort(lst_flat[~is_outlier(lst_flat)]) # Changed this line
fit = stats.norm.pdf(lst_flat_filtered, np.mean(lst_flat_filtered), np.std(lst_flat_filtered))
plt.plot(lst_flat_filtered, fit)
Here's some minimal examples showing these behaviours:
import numpy as np
import scipy.stats as stats
import matplotlib.pyplot as plt
lst_flat_filtered = np.random.normal(7, 5, 1000)
fit = stats.norm.pdf(lst_flat_filtered, np.mean(lst_flat_filtered), np.std(lst_flat_filtered))
plt.hist(lst_flat_filtered, bins=30, normed=True)
plt.plot(lst_flat_filtered, fit)
plt.show()
import numpy as np
import scipy.stats as stats
import matplotlib.pyplot as plt
lst_flat_filtered = np.random.normal(7, 5, 1000)
fit = stats.norm.pdf(lst_flat_filtered, np.mean(lst_flat_filtered), np.std(lst_flat_filtered))
plt.hist(lst_flat_filtered, bins=30, normed=True)
plt.plot(lst_flat_filtered, fit, 'bo')
plt.show()
import numpy as np
import scipy.stats as stats
import matplotlib.pyplot as plt
lst_flat_filtered = np.sort(np.random.normal(7, 5, 1000))
fit = stats.norm.pdf(lst_flat_filtered, np.mean(lst_flat_filtered), np.std(lst_flat_filtered))
plt.hist(lst_flat_filtered, bins=30, normed=True)
plt.plot(lst_flat_filtered, fit)
plt.show()