Fit Gaussian mixture model with weighted observations

Wolfgang Schwanghart on 23 Nov 2019
Answered: Omkar Mulekar on 5 Jun 2020
Hi everyone, looking at the documentation for fitgmdist, I cannot see any way to weight observations. Is there a reason for this? Many functions in the Statistics and Machine Learning Toolbox support weights. Does anyone have an idea how to include weights, or can anyone point me to an alternative?
  Comments: 3
Wolfgang Schwanghart on 26 Nov 2019
This could be an option...
Adam Danz on 26 Nov 2019
If you end up giving that a try, keep in mind that the weights must be converted to integers and, depending on how that's carried out, this could vastly increase the number of data points. Feel free to pull me in if you decide to go down this route and get stuck.
In a sense, by duplicating the values of the data being fit, you are strengthening their representation in the fit and that's kind of like weighting.


Answers (3)

Kaashyap Pappu on 26 Nov 2019
The function fitgmdist fits a distribution to a given data set. This data set generally contains points belonging to the same class, so a 'weight' parameter is not needed: you are essentially just fitting a distribution model to the given data.
Functions such as fitcknn and fitcsvm accept weights because they fit classification models. Weights become essential when the training data come from multiple classes and there is a class imbalance, that is, the classes are not represented in equal proportion. Weights are the input arguments used to account for this imbalance.
Hope this helps!

Jeff Miller on 26 Nov 2019
It's not exactly clear (to me either) what it means to weight the different observations in this context, but maybe you have something like this in mind:
You have observations X(1:n) with weights W(1:n). Let sumW = sum(W).
Make a new dataset Y with (say) 10000 observations consisting of
round(W(1)/sumW*10000) copies of X(1)
round(W(2)/sumW*10000) copies of X(2)
etc--that is, round(W(i)/sumW*10000) copies of X(i)
Now use fitgmdist with Y. Every Y value will be weighted equally, but the different X's will have weights approximately proportional to their original W values--because their numbers will be in those proportions.
I hope that is clear.
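A minimal sketch of this replication scheme, under assumptions: the data `X`, the weights `W`, and the target size `nTarget` are made up for illustration, and the `'Replicates'` option is added only to make the random initialization of fitgmdist less fragile.

```matlab
% Sketch: approximate observation weights by replicating rows before fitting.
% X, W and nTarget are illustrative; fitgmdist requires the
% Statistics and Machine Learning Toolbox.
rng(1);                                     % reproducible example
X = [randn(200,1) - 3; randn(200,1) + 3];   % 400 observations, two clusters
W = [ones(200,1); 3*ones(200,1)];           % second cluster weighted 3x

nTarget = 10000;                            % approximate size of the new data set
counts  = round(W / sum(W) * nTarget);      % integer number of copies per observation
Y       = repelem(X, counts, 1);            % row i of X appears counts(i) times

gm = fitgmdist(Y, 2, 'Replicates', 5);      % every row of Y is weighted equally
disp(gm.ComponentProportion)                % close to 0.25 and 0.75, in some order
```

Because of the rounding, observations with very small weights can end up with zero copies, and the size of `Y` is artificial, so likelihood-based criteria computed from this fit should not be compared with fits to the original data.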
  Comments: 3
Jeff Miller on 27 Nov 2019
Edited: Jeff Miller on 27 Nov 2019
What about generating a lot of pseudo-observations from the risk normalized kernel densities and then fitting the gmm to those?
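One way to read this suggestion, sketched under assumptions: treat the weighted kernel density as Gaussian kernels centred on the observations, draw pseudo-observations from it, and fit the mixture to the draws. The data `X`, weights `W`, sample size `m`, and bandwidth `h` are all hypothetical and chosen by hand.

```matlab
% Sketch: draw pseudo-observations from a weighted Gaussian-kernel density,
% then fit a mixture to them. X, W, m and h are illustrative assumptions.
rng(2);
X = [randn(300,1) - 3; randn(300,1) + 3];   % observations (n-by-1)
W = [ones(300,1); 3*ones(300,1)];           % observation weights
m = 5000;                                   % number of pseudo-observations
h = 0.3;                                    % kernel bandwidth, chosen by hand

% Sampling from the kernel density: pick a kernel centre with probability
% proportional to its weight, then add Gaussian kernel noise.
idx = randsample(numel(X), m, true, W);     % weighted draw with replacement
Y   = X(idx) + h * randn(m, 1);             % m-by-1 pseudo-observations

gm = fitgmdist(Y, 2, 'Replicates', 5);
disp(gm.ComponentProportion)                % close to 0.25 and 0.75, in some order
```

Unlike row replication, this needs no integer conversion of the weights, but the kernel noise inflates each component's variance by roughly `h^2`.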
Wolfgang Schwanghart on 27 Nov 2019
Yes, this was my initial thought, and Adam Danz (see above) also came up with the idea. However, after giving the whole approach some thought, I think that the weighting scheme may not lead to the desired results. Rather, I think we should normalize the probability density function that we obtain from pdf(gmm,distance) by, let's say, a kernel density estimate of the distance values. I guess this will become increasingly difficult for models with many variables.
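A rough one-variable sketch of that normalization, with heavy assumptions: `distAll` stands for all distance values, `distEvents` for the subset the mixture is fitted to, and the ratio of the two densities is only one possible interpretation of "normalize". All names and data here are hypothetical.

```matlab
% Sketch: divide a fitted mixture density by a kernel density estimate
% on a common grid. distAll, distEvents and the grid are hypothetical.
rng(3);
distAll    = 10 * rand(5000, 1);                      % all distance values
distEvents = [1 + 0.5*randn(200,1); 6 + randn(300,1)];
distEvents = distEvents(distEvents > 0 & distEvents < 10);

gmm  = fitgmdist(distEvents, 2, 'Replicates', 5, 'RegularizationValue', 1e-6);
xg   = linspace(0.5, 9.5, 200)';                      % evaluation grid
fGmm = pdf(gmm, xg);                                  % mixture density on the grid
fKde = ksdensity(distAll, xg);                        % kernel density of all distances

ratio = fGmm(:) ./ max(fKde(:), eps);                 % guard against division by zero
```

With several variables the kernel density estimate in the denominator becomes much harder to obtain reliably, which is the difficulty mentioned above.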



Omkar Mulekar on 5 Jun 2020
There seems to be an answer in this paper:
They talk about a couple of methods for EM using weighted data. See if it's useful for you!
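For orientation, here is a sketch of one common weighted variant of EM. It is not necessarily either of the paper's methods. It maximizes the weighted log-likelihood sum(w(i) * log f(x(i))): the E-step is unchanged, and every sum over observations in the M-step uses w(i) times the responsibility. The data, weights, starting values, and tolerances are assumptions made for illustration.

```matlab
% Sketch: EM for a Gaussian mixture with observation weights.
% Data, weights, k and tolerances are illustrative assumptions.
% mvnpdf requires the Statistics and Machine Learning Toolbox.
rng(4);
X = [randn(300,2) - 2; randn(300,2) + 2];   % n-by-d observations
w = [ones(300,1); 3*ones(300,1)];           % nonnegative observation weights
k = 2;                                      % number of components
[n, d] = size(X);
w = w / sum(w);                             % normalize so that sum(w) == 1

% Deterministic start: means spread along the first coordinate,
% shared weighted covariance, equal mixing proportions.
[~, order] = sort(X(:, 1));
mu = X(order(round(linspace(1, n, k))), :); % k-by-d
Xc = X - w' * X;                            % centre on the weighted mean
S0 = Xc' * (Xc .* w);
Sigma = repmat((S0 + S0') / 2 + 1e-6 * eye(d), 1, 1, k);
p = ones(1, k) / k;
llOld = -Inf;

for iter = 1:500
    % E-step: responsibilities have the usual form, no weights involved.
    R = zeros(n, k);
    for j = 1:k
        R(:, j) = p(j) * mvnpdf(X, mu(j, :), Sigma(:, :, j));
    end
    dens = sum(R, 2);                       % mixture density at each point
    R = R ./ dens;
    ll = sum(w .* log(dens));               % weighted log-likelihood
    if abs(ll - llOld) < 1e-10, break; end
    llOld = ll;

    % M-step: every sum over observations uses w(i) * R(i,j).
    G  = R .* w;
    Nj = sum(G, 1);                         % weighted mass per component
    p  = Nj;                                % sums to 1 because sum(w) == 1
    for j = 1:k
        mu(j, :) = (G(:, j)' * X) / Nj(j);
        Xc = X - mu(j, :);
        S  = (Xc' * (Xc .* G(:, j))) / Nj(j);
        Sigma(:, :, j) = (S + S') / 2 + 1e-6 * eye(d);  % keep symmetric, add small ridge
    end
end
```

For integer weights this gives the same estimates as replicating rows, without enlarging the data set. The resulting `mu`, `Sigma`, and `p` can be passed to gmdistribution(mu, Sigma, p) if an object for pdf or cluster is needed.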

Release: R2019a
