问题

I need to reproduce the normalized density p(x) below, but the code given does not generate a normalized PDF.

clc, clear
% Create three distribution objects with different parameters
pd1 = makedist('Uniform','lower',2,'upper',6);
pd2 = makedist('Uniform','lower',2,'upper',4);
pd3 = makedist('Uniform','lower',5,'upper',6);
% Compute the pdfs
x = -1:.01:9;
pdf1 = pdf(pd1,x); 
pdf2 = pdf(pd2,x); 
pdf3 = pdf(pd3,x); 
% Sum of uniforms
pdf = (pdf1 + pdf2 + pdf3);
% Plot the pdfs
figure;
stairs(x,pdf,'r','LineWidth',2);

If I calculate the normalized mixture PDF by simply scaling them by their total sum, I have different normalized probability comparing with the original figure above.

pdf = pdf/sum(pdf);

回答1:

Mixture

A mixture of two random variables means with probability p use Distribution 1, and with probability 1-p use Distribution 2.

Based on your graph, it appears you are mixing the distributions rather than adding (convolving) them. The precise results matter very much upon the mixing probabilities. As an example, I've chosen a = 0.25, b = 0.35, and c = 1-a-b.

For a mixture, the probability density function (PDF) is analytically available:
pdfMix =@(x) a.*pdf(pd1,x) + b.*pdf(pd2,x) + c.*pdf(pd3,x).

% MATLAB R2018b
pd1 = makedist('Uniform',2,6);
pd2 = makedist('Uniform',2,4);
pd3 = makedist('Uniform',5,6);
a = 0.25;
b = 0.35;
c = 1 - a - b;    % a + b + c = 1

pdfMix =@(x) a.*pdf(pd1,x) + b.*pdf(pd2,x) + c.*pdf(pd3,x);

Xrng = 0:.01:8;
plot(Xrng,pdfMix(Xrng))
xlabel('X')
ylabel('Probability Density Function')

Since the distributions being mixed are uniform you could also use the stairs command: stairs(Xrng,pdfMix(Xrng)).

We can verify this is a valid PDF by ensuring the total area is 1.
integral(pdfMix,0,9)

ans = 1.0000

Convolution: Adding Random Variables

Adding the random variables together yields a different result. Again, this can be done empirically easily. It is possible to this analytically. For example, convolving two Uniform(0,1) distributions yields a Triangular(0,1,2) distribution. The convolution of random variables is just a fancy way of saying we add them up and there is a way to obtain the resulting PDF using integration if you're interested in analytical results.

N = 80000;                  % Number of samples
X1 = random(pd1,N,1);       % Generate samples     
X2 = random(pd2,N,1);
X3 = random(pd3,N,1);

X = X1 + X2 + X3;           % Convolution

Notice the change of scale for the x-axis (Xrng = 0:.01:16;).

To obtain this, I generated 80k samples from each distribution with random then added them up to obtain 80k samples of the desired convolution. Notice when I used histogram I used the 'Normalization', 'pdf' option.

Xrng = 0:.01:16;
figure, hold on, box on
p(1) = plot(Xrng,pdf(pd1,Xrng),'DisplayName','X1 \sim U(2,6)')
p(2) = plot(Xrng,pdf(pd2,Xrng),'DisplayName','X2 \sim U(2,4)')
p(3) = plot(Xrng,pdf(pd3,Xrng),'DisplayName','X3 \sim U(5,6)')
h = histogram(X,'Normalization','pdf','DisplayName','X = X1 + X2 + X3')

% Cosmetics
legend('show','Location','northeast')
for k = 1:3
    p(k).LineWidth = 2.0;
end
title('X = X1 + X2 + X3 (50k samples)')
xlabel('X')
ylabel('Probability Density Function (PDF)')

You can obtain an estimate of the PDF using the fitdist and the Kernel distribution object then calling the pdf command on the resulting Kernel distribution object.

pd_kernel = fitdist(X,'Kernel')

figure, hold on, box on
h = histogram(X,'Normalization','pdf','DisplayName','X = X1 + X2 + X3')
pk = plot(Xrng,pdf(pd_kernel,Xrng),'b-')           % Notice use of pdf command
legend('Empirical','Kernel Distribution','Location','northwest')

If you do this, you'll notice the resulting kernel is unbounded but you can correct this since you know the bounds using truncate. You could also use the ksdensity function, though the probability distribution object approach is probably more user friendly due to all the functions you have direct access to. You should be aware that the kernel is an approximation (you can see that clearly in the kernel plot). In this case, the integration to convolve 3 uniform distributions isn't too bad, so finding the PDF analytically is probably the preferred choice if the PDF is desired. Otherwise, empirical approaches (especially for generation), are probably sufficient though this depends on your application.