Calculating mean squared error or maybe MISE

Neuropragmatist on 25 Jul 2019
Commented: Neuropragmatist on 8 Aug 2019
Hi all,
I'm interested in comparing different bivariate histograms to an underlying 2D probability density function.
Additional info that you can skip for time:
My aim is to find the bin size and smoothing that make the histogram best represent the known density function. In my field this is a common problem without a clear solution: there are many ways to estimate an optimal bin size, but I can't find any that also take smoothing into account. Furthermore, the histogram I want to compare is actually calculated as the ratio of two histograms generated with the same parameters but over very different underlying distributions, and I have not found any method for optimising parameters in that situation either. My ultimate aim is to generate histograms using a variety of approaches and smoothing settings to find the 'best' one, or at least the best for different scenarios.
My first approach was to generate the histogram and then correlate the result with the PDF sampled at the same points (i.e. the histogram bin centers). Having read the literature a bit more, I think I want to use the mean squared error (MSE) instead, but I'm not sure if this is a) appropriate or b) meaningful. Also, the Wikipedia page for MSE lists two equations and I'm not sure which one applies in this situation. I'm also worried that I should be calculating the mean integrated squared error (MISE) instead, but I don't know how to do that for a discrete histogram versus a continuous PDF, both of which are 2D. I have MATLAB R2018b and all the toolboxes.
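For reference, the quantities mentioned above can be written out as follows. This is a sketch in assumed notation that does not appear in the original post: f is the known density, \hat f is the histogram-based density estimate, n_{ij} is the count in bin (i,j), N is the number of points, and \Delta_x, \Delta_y are the bin widths. The second line is only a Riemann-sum approximation on the bin centres, not an exact integral.

```latex
\[
\mathrm{ISE}(\hat f) = \iint \bigl(\hat f(x,y) - f(x,y)\bigr)^2 \, dx \, dy,
\qquad
\mathrm{MISE}(\hat f) = \mathbb{E}\bigl[\mathrm{ISE}(\hat f)\bigr]
\]
\[
\mathrm{ISE}(\hat f) \approx \Delta_x \Delta_y \sum_{i,j} \bigl(\hat f_{ij} - f(x_i, y_j)\bigr)^2,
\qquad
\hat f_{ij} = \frac{n_{ij}}{N \, \Delta_x \Delta_y}
\]
```

The expectation in MISE is taken over repeated samples of the data, so a single histogram gives one ISE value rather than the MISE itself.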
Here is the code I have so far:
% generate distribution of points, make histogram of these and get actual PDF underlying this
mu = [100 100];
sigma = [60 50;50 80];
num = 100;
pos1 = mvnrnd(mu,sigma,num); % the points
% in this example we will just have one distribution, but in the real data there are multiple such distributions all summed together
% which makes fitting a continuous function to the real data nearly impossible
bcx = 0:5:200;
bcy = 0:5:200;
[x,y] = meshgrid(bcx,bcy); % the grid over which to generate histogram or evaluate PDF
bcents = [x(:) y(:)];
map1 = mvnpdf(bcents,mu,sigma); % the PDF
map1 = reshape(map1,size(x));
map2 = hist3(pos1,'Ctrs',{bcx(:) bcy(:)})'; % the histogram; hist3 returns rows = x, so transpose to match the meshgrid layout (rows = y)
% plot all three
figure
subplot(1,3,1)
plot(pos1(:,1),pos1(:,2),'ko')
axis([0 200 0 200])
axis square xy
title('Points')
subplot(1,3,2)
imagesc(map1)
axis square xy
title('PDF')
subplot(1,3,3)
imagesc(map2)
axis square xy
title('Histogram')
% calculate MSE
map_pdf = map1 .* 25; % scale so sum is approximately unity (i.e. probability - multiply by the 5 x 5 bin area to approximate a Riemann sum)
map_hist = map2 ./ sum(map2(:)); % scale so sum is unity (i.e. probability)
mse = sum((map_pdf(:)-map_hist(:)).^2) .* (1/numel(map_pdf))
cor = corr(map_pdf(:),map_hist(:),'rows','pairwise')
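
One possible way to approximate the integrated squared error (ISE) for a single histogram is sketched below. It is only an illustration under stated assumptions: the histogram is converted to a density estimate (counts divided by the number of points times the bin area), the known PDF is evaluated at the bin centres only, and the integral is replaced by a Riemann sum. The variable names pts, binw, binArea, f, fhat and ise are introduced here for illustration, and the fixed seed is only there to make the run repeatable.

```matlab
% Sketch: approximate the integrated squared error (ISE) on the bin grid.
% Assumes the Statistics and Machine Learning Toolbox (mvnrnd, mvnpdf, hist3).
rng(1);                                  % fixed seed, for repeatability only
mu = [100 100];
sigma = [60 50; 50 80];
num = 100;
pts = mvnrnd(mu, sigma, num);            % sample points

binw = 5;                                % bin width (same in x and y)
bcx = 0:binw:200;                        % bin centres in x
bcy = 0:binw:200;                        % bin centres in y
binArea = binw^2;

[x, y] = meshgrid(bcx, bcy);
f = reshape(mvnpdf([x(:) y(:)], mu, sigma), size(x));  % known density at bin centres

counts = hist3(pts, 'Ctrs', {bcx(:) bcy(:)})';  % transpose so rows = y, as in meshgrid
fhat = counts ./ (sum(counts(:)) * binArea);    % histogram as a density estimate

ise = sum((fhat(:) - f(:)).^2) * binArea;       % Riemann-sum approximation of the ISE
mse = mean((fhat(:) - f(:)).^2);                % per-bin mean squared error (density scale)
```

On a fixed grid the two numbers differ only by a constant, since ise equals mse times numel(f) times binArea. If both maps are instead scaled to probabilities, as in the code above, the resulting MSE equals the ISE times binArea/numel(f), a factor that changes with the bin size, so that version may not be directly comparable across different bin sizes.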

Answers (1)

Ganesh Regoti on 8 Aug 2019
Refer to ksdensity, which might serve your purpose. Here is the link
  Comments: 1
Neuropragmatist on 8 Aug 2019
I don't think that's really relevant, I already have a PDF generated by mvnpdf and I have a histogram generated by histcounts2, the question is about how to compare the two distributions.
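
As one possible way to compare the histogram with the known PDF across bin sizes, the sketch below estimates the mean integrated squared error (MISE) by Monte Carlo: it draws repeated samples from the known distribution, computes a Riemann-sum ISE for each, and averages. It is an illustration, not a definitive method. The number of repetitions (nrep) and the candidate bin widths (binws) are hypothetical values chosen for the example, and no smoothing is applied.

```matlab
% Sketch: Monte Carlo estimate of MISE for several candidate bin widths.
% Assumes the Statistics and Machine Learning Toolbox (mvnrnd, mvnpdf, hist3).
rng(1);                                   % fixed seed, for repeatability only
mu = [100 100];
sigma = [60 50; 50 80];
num = 100;                                % points per simulated data set
nrep = 50;                                % hypothetical number of repetitions
binws = [2 5 10 20];                      % hypothetical candidate bin widths

mise = zeros(size(binws));
for k = 1:numel(binws)
    binw = binws(k);
    bc = 0:binw:200;                      % bin centres, same in x and y
    [x, y] = meshgrid(bc, bc);
    f = reshape(mvnpdf([x(:) y(:)], mu, sigma), size(x));  % known density
    ise = zeros(nrep, 1);
    for r = 1:nrep
        pts = mvnrnd(mu, sigma, num);                      % new sample
        counts = hist3(pts, 'Ctrs', {bc(:) bc(:)})';       % rows = y, as in meshgrid
        fhat = counts ./ (num * binw^2);                   % density estimate
        ise(r) = sum((fhat(:) - f(:)).^2) * binw^2;        % Riemann-sum ISE
    end
    mise(k) = mean(ise);                  % average ISE over repetitions
end
[~, best] = min(mise);
fprintf('Bin width with lowest estimated MISE: %g\n', binws(best));
```

Because the PDF is evaluated only at the bin centres, the Riemann sum becomes coarse for wide bins; evaluating both the PDF and the piecewise-constant histogram on a finer common grid would likely give a more accurate integral. A smoothing step could be added inside the inner loop to compare smoothing settings in the same way.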
