ok no worries
To visualize the mean, sigma and 3 sigma on the curve of the probability distribution that approximates your data, you have to consider that the bins of the histogram are too wide to directly get accurate marks on the curve. Let me explain:
If you try
find(y==my)
=
Empty matrix: 1-by-0
there is no match: no sample of y is exactly equal to my.
But you can find the histogram bins between which the mean lies:
find(y>my)
=
Columns 1 through 14
3 4 5 6 7 8 9 10 11 12 13 14 15 16
Columns 15 through 28
17 18 19 20 21 22 23 24 25 26 27 28 29 30
Columns 29 through 31
31 32 33
find(y<=my)
=
Columns 1 through 14
1 2 34 35 36 37 38 39 40 41 42 43 44 45
Columns 15 through 28
46 47 48 49 50 51 52 53 54 55 56 57 58 59
Columns 29 through 42
60 61 62 63 64 65 66 67 68 69 70 71 72 73
Columns 43 through 56
74 75 76 77 78 79 80 81 82 83 84 85 86 87
Columns 57 through 69
88 89 90 91 92 93 94 95 96 97 98 99 100
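Let ny be the reference (index) vector of y:
from the two results above, y is above my only for ny = 3 to 33, so on the falling side of the curve the mean value is somewhere within the interval ny = [33 34]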
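A minimal sketch of the same bracketing done in one step, by looking for sign changes of y-my. The data here is a stand-in I made up (a lognormal-shaped curve with hypothetical parameters mu_a and sigma_a), not your data, so the indices will differ from the ones above:

```matlab
% Stand-in data (assumption): lognormal-shaped curve on 100 equally spaced points.
mu_a = -1.5; sigma_a = 0.5;                 % hypothetical parameters, not the asker's
x = linspace(0.04, 1.2, 100);
y = exp(-(log(x)-mu_a).^2/(2*sigma_a^2)) ./ (x*sigma_a*sqrt(2*pi));
my = mean(y);                               % average height of the curve

d = sign(y - my);                           % +1 above my, -1 below, 0 exactly on it
k = find(d(1:end-1) ~= d(2:end));           % my is crossed between samples k and k+1
brackets = [k; k+1]                         % each column is one bracketing index pair
```

For a single-peaked curve this should return two columns, one bracket on the rising side and one on the falling side.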
So, to accurately plot my and 3*sy you first have to decide how accurate you need to be.
The y step varies along the curve; between samples 33 and 34 it is
abs(y(33)-y(34))
ans =
0.258998840336178
the x step is constant
abs(x(33)-x(34))
=
0.011562155305573
mean(diff(x))
= 0.011562155305573
max(diff(x))
= 0.011562155305573
min(diff(x))
= 0.011562155305573
So if you decide that 2 decimals of precision (on y) is enough, x has to be refined finely enough that at least one interpolated point of y matches my once truncated after the 2nd decimal:
for the 0.2589 step to go down to 0.0001, the y step has to be divided into at least
0.2589/dy=0.0001 hence dy=2589 parts, let's take dy=2590
the angle
alpha=atand(abs(y(33)-y(34))/abs(x(33)-x(34)))
=
87.443914593184061
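since the slope is tand(alpha)=dy/dx, a 0.0001 step on y needs dx=.0001/tand(alpha), about 4.46e-06. The run shown here used atand(alpha) instead, and the numbers that follow come from it:
dx=.0001/atand(alpha)
=
1.119259323130124e-06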
x_step=abs(x(33)-x(34))/dx
=
1.033018449490165e+04
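make it a whole number of points for linspace (used this way x_step is the total count over the whole x range, so the spacing of x2 comes out near 1.1e-04 rather than dx),
then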
x2=linspace(x(1),x(end),x_step);
y2=interp1(x,y,x2);
overlay both curves to check that y and y2 coincide:
figure(1);plot(x,y,x2,y2);grid on;grid minor
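A hedged sketch of how the refinement could be sized so that no step of the interpolated curve exceeds a chosen dy_target. It is not the exact computation above: it takes the steepest segment of the whole curve instead of the slope at samples 33-34, and it converts dx into a total point count over the full x range. The data and the names mu_a, sigma_a, dy_target, slope_max and n_pts are my assumptions:

```matlab
% Stand-in data (assumption): lognormal-shaped curve, NOT the asker's data.
mu_a = -1.5; sigma_a = 0.5;                 % hypothetical parameters
x = linspace(0.04, 1.2, 100);
y = exp(-(log(x)-mu_a).^2/(2*sigma_a^2)) ./ (x*sigma_a*sqrt(2*pi));

dy_target = 1e-4;                           % largest y step wanted between refined points
slope_max = max(abs(diff(y) ./ diff(x)));   % steepest segment of the sampled curve
dx = dy_target / slope_max;                 % x spacing that keeps every y step <= dy_target
n_pts = ceil((x(end) - x(1)) / dx) + 1;     % total points over the whole x range
x2 = linspace(x(1), x(end), n_pts);
y2 = interp1(x, y, x2);                     % linear interpolation (interp1 default)
max_step = max(abs(diff(y2)))               % should not exceed dy_target
```

This grid is much denser than needed away from the steep part; a coarser dy_target keeps n_pts manageable.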
where in y2 is my located?
find(y2>my)
=
..
..
3396 3397 3398 3399 3400 3401 3402
Columns 3228 through 3234
3403 3404 3405 3406 3407 3408 3409
Columns 3235 through 3238
3410 3411 3412 3413
x_mean=max(find(y2>my))
the actual mean value is going to be approximated with y2(3413) = 3.021925056586602
and this error is acceptable
abs(my-y2(3413))
=
1.358549030059386e-04
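If the only goal is to put a marker where the curve crosses my, a lighter alternative is to interpolate the crossing directly inside its bracket instead of refining the whole grid. This is a sketch under my own assumptions (stand-in data, hypothetical names mu_a, sigma_a, x_cross, y_cross) and it only handles the falling-side crossing:

```matlab
% Stand-in data (assumption): lognormal-shaped curve, NOT the asker's data.
mu_a = -1.5; sigma_a = 0.5;                 % hypothetical parameters
x = linspace(0.04, 1.2, 100);
y = exp(-(log(x)-mu_a).^2/(2*sigma_a^2)) ./ (x*sigma_a*sqrt(2*pi));
my = mean(y);

% last bracket [k k+1] where the curve falls through my
k = find(y(1:end-1) > my & y(2:end) <= my, 1, 'last');
t = (my - y(k)) / (y(k+1) - y(k));          % fraction of the way from sample k to k+1
x_cross = x(k) + t*(x(k+1) - x(k));         % x where the straight segment equals my
y_cross = interp1(x, y, x_cross);           % should reproduce my up to round-off
err = abs(y_cross - my)
```

The marker can then be drawn with hold on; plot(x_cross,y_cross,'bo').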
put what looks like the mean on the curve:
hold on;plot(x2(x_mean),y2(x_mean),'bo')
. . but when checking
sum(y)
=
3.021789201683596e+02
length(y)
=
100
mean(y)
=
3.021789201683596
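and the mean was already found to lie between samples [33 34], but
the tail beyond sample 33 cannot even hold the 34.1% that a single sigma on one side should cover, so mean(y), the average height of the curve, does not mark the centre of the distribution: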
sum(y([33:end]))/sum(y)*100
=
13.483260209259832
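To make the difference concrete, here is a small sketch comparing mean(y), which is just the average height of the curve, with the mean of the distribution along x, computed as the area-weighted average of x. The data and the parameters mu_a, sigma_a are stand-ins I chose; for a true lognormal the distribution mean is exp(mu_a+sigma_a^2/2):

```matlab
% Stand-in data (assumption): lognormal-shaped curve, NOT the asker's data.
mu_a = -1.5; sigma_a = 0.5;                 % hypothetical parameters
x = linspace(0.04, 1.2, 100);
y = exp(-(log(x)-mu_a).^2/(2*sigma_a^2)) ./ (x*sigma_a*sqrt(2*pi));

height_mean = mean(y)                       % a value on the y axis
area = trapz(x, y);                         % close to 1 if y is a density
x_mean = trapz(x, x.*y) / area              % a location on the x axis
x_mean_theory = exp(mu_a + sigma_a^2/2)     % lognormal mean, for comparison
```

The first number lives on the y axis and the second on the x axis, which is why marking mean(y) on the curve does not show where the distribution is centred.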
Let's find mu by summing samples, taking mu as the point that splits the area 50/50:
pc_target=50
n=2
pc=sum(y([1:n]))/sum(y)*100
while pc<pc_target
n=n+1
pc=sum(y([1:n]))/sum(y)*100
end
..
..
n = 14
pc = 42.855348381237192
n = 15
pc = 46.872419796584815
n = 16
pc = 50.701800361715335
So it turns out mu is within [15 16]
the x boundaries for y~mu
x([15 16])
=
0.201607447590281 0.213169602895854
Using y2 for more detail
pc_target=50
n=2
pc=sum(y2([1:n]))/sum(y2)*100
while pc<pc_target
n=n+1
pc=sum(y2([1:n]))/sum(y2)*100
end
..
..
n = 1600
pc = 49.982072607170089
n = 1601
pc = 50.018166984408886
for y2, mu is within [1600 1601]
n_mu=1600
and x2 boundaries
x2([1600 1601])
ans =
0.216920307874458 0.217031116526467
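The while loop can also be written as a cumulative sum followed by a single find, which avoids re-summing at every step. A sketch on stand-in data (the curve, mu_a, sigma_a and the 10001-point grid are my assumptions, so n_mu will not be 1600 here):

```matlab
% Stand-in data (assumption): lognormal-shaped curve, NOT the asker's data.
mu_a = -1.5; sigma_a = 0.5;                 % hypothetical parameters
x = linspace(0.04, 1.2, 100);
y = exp(-(log(x)-mu_a).^2/(2*sigma_a^2)) ./ (x*sigma_a*sqrt(2*pi));
x2 = linspace(x(1), x(end), 10001);         % refined grid, size chosen arbitrarily
y2 = interp1(x, y, x2);

pc_cum = cumsum(y2) / sum(y2) * 100;        % percent of the total accumulated up to each sample
n_mu = find(pc_cum >= 50, 1)                % first sample at or past 50%
x2([n_mu-1 n_mu])                           % x2 boundaries of the 50% point
```

On an equally spaced grid the plain sum is proportional to the area, so no trapz is needed for the percentages.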
the upper y2 boundary for 1 sigma (+34.1%)
pc_target=34.1
n=1
pc=sum(y2([1600:1600+n]))/sum(y2)*100
while pc<pc_target
n=n+1
pc=sum(y2([1600:1600+n]))/sum(y2)*100
end
..
..
n =
1475
pc =
34.089262560230956
n =
1476
pc =
34.101853576106748
the index distance from n_mu to +1sigma is
n_up_1sigma=1475
And the location on x2 of +1sigma is:
x2_up_1sigma=x2(1600+n_up_1sigma)
x2_up_1sigma =
0.380363069587555
the lower y2 boundary for 1 sigma (-34.1%)
pc_target=34.1
n=1
pc=sum(y2([1600-n:1600]))/sum(y2)*100
while pc<pc_target
n=n+1
pc=sum(y2([1600-n:1600]))/sum(y2)*100
end
..
..
n=
842
pc =
34.078560325462846
n =
843
pc =
34.117107246027864
the index distance from n_mu to -1sigma is
n_down_1sigma=842
And the location on x2 of -1sigma is:
x2_down_1sigma=x2(1600-n_down_1sigma)
=
0.123619422882982
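Repeating for 3 sigma, taking the whole +-3 sigma interval to cover 89% (the Chebyshev lower bound 1-1/3^2, valid for any distribution; a normal distribution would give 99.7%), that is 44.5% on each side. Going up: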
pc_target=44.5
n=1
pc=sum(y2([1600:1600+n]))/sum(y2)*100
while pc<pc_target
n=n+1
pc=sum(y2([1600:1600+n]))/sum(y2)*100
end
..
..
n =
2836
pc =
44.496938162892050
n =
2837
pc =
44.501155111466957
n_up_3sigma=2836;
x2(n_mu+n_up_3sigma)
=
0.5312 (approx.)
and down we go:
pc_target=44.5
n=1
pc=sum(y2([1600-n:1600]))/sum(y2)*100
while pc<pc_target
n=n+1
pc=sum(y2([1600-n:1600]))/sum(y2)*100
end
..
..
n =
1161
pc =
44.484883787669325
n =
1162
pc =
44.509871734900422
n_down_3sigma=1161;
x2(n_mu-n_down_3sigma)
ans =
0.0883 (approx.)
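All five marks can also be read off one cumulative curve in a single pass. This sketch is a variant, not the loops above: instead of summing outward from n_mu it looks up the cumulative levels 50-44.5, 50-34.1, 50, 50+34.1 and 50+44.5 percent, which should land within a sample or two of the outward sums. The data, mu_a, sigma_a, the grid size and the names levels, idx and x_marks are my assumptions:

```matlab
% Stand-in data (assumption): lognormal-shaped curve, NOT the asker's data.
mu_a = -1.5; sigma_a = 0.5;                 % hypothetical parameters
x = linspace(0.04, 1.2, 100);
y = exp(-(log(x)-mu_a).^2/(2*sigma_a^2)) ./ (x*sigma_a*sqrt(2*pi));
x2 = linspace(x(1), x(end), 10001);         % refined grid, size chosen arbitrarily
y2 = interp1(x, y, x2);

pc_cum = cumsum(y2) / sum(y2) * 100;        % cumulative percent of the total
levels = [50-44.5, 50-34.1, 50, 50+34.1, 50+44.5];
idx = arrayfun(@(p) find(pc_cum >= p, 1), levels);  % first sample reaching each level
x_marks = x2(idx)                           % [-3s, -1s, mu, +1s, +3s] locations on x2
```

The markers can then go on the curve with plot(x2,y2,x_marks,y2(idx),'rd').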
and the plot you asked for:
plot(x2,y2,x2(1600),y2(1600),'ro',...
x2(n_mu+n_up_1sigma),y2(n_mu+n_up_1sigma),'rd',x2(n_mu-n_down_1sigma),y2(n_mu-n_down_1sigma),'rd',...
x2(n_mu+n_up_3sigma),y2(n_mu+n_up_3sigma),'gd',x2(n_mu-n_down_3sigma),y2(n_mu-n_down_3sigma),'gd')
grid on;grid minor;
additional comments:
1. do not mix lognormal mu sigma and normal mu sigma parameters: mind the notation gap. Your x is a reference vector and your y is the actual distribution.
However, the MATLAB function lognpdf calculates the lognormal distribution Y out of the normal distribution X, where X has mean mu and standard deviation sigma. In the lognpdf help page, both Y and X are random variables, data, each with their respective reference vectors.
As explained in the lognpdf help page, the mean m and variance v of Y are related to the mu and sigma of X by:
m=exp(mu+sigma^2/2)
v=exp(2*mu+sigma^2)*(exp(sigma^2)-1)
and
mu=log(m^2/(v+m^2)^.5)
sigma=(log(v/m^2+1))^.5
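A quick round-trip check of these conversions, with the +1 placed inside the logarithm for sigma. The starting values of mu and sigma are arbitrary numbers I picked for the check:

```matlab
mu = -1.5; sigma = 0.5;                     % arbitrary parameters of the normal X (assumption)

% normal parameters -> mean and variance of the lognormal Y
m = exp(mu + sigma^2/2);
v = exp(2*mu + sigma^2) * (exp(sigma^2) - 1);

% and back again
mu_back = log(m^2 / sqrt(v + m^2));
sigma_back = sqrt(log(v/m^2 + 1));

round_trip_error = [abs(mu_back - mu), abs(sigma_back - sigma)]
```

Both errors should be at round-off level; with the +1 outside the logarithm the sigma round trip fails.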
bankers seem to have a lot of interest on lognormal distributions
would you mind to both vote on the thumbs-up and
mark my answer as accepted?
thanks in advance
John