probability distribution from a simple vector

Question

Florian 2018년 8월 17일

0
링크

이 질문에 대한 바로 가기 링크

https://kr.mathworks.com/matlabcentral/answers/415210-probability-distribution-from-a-simple-vector

답변: Steven Lord 2022년 6월 8일

Assume a vector like... 1 3 2 3 1 4 2 3 1 3 4 2 1 1 2 3 4

How can I calculate the likelyhood that nr 2 follows after nr 3 or nr 1 follows after nr 2? Ideally I would like to display this relationship for all numbers in a probability distribution.

댓글 수: 0
이전 댓글 -2개 표시이전 댓글 -2개 숨기기

댓글을 달려면 로그인하십시오.

이 질문에 답변하려면 로그인하십시오.

Answer 1

Rik 2018년 8월 17일

0
링크

이 답변에 대한 바로 가기 링크

https://kr.mathworks.com/matlabcentral/answers/415210-probability-distribution-from-a-simple-vector#answer_333102

편집: Rik 2018년 8월 20일

MATLAB Online에서 열기

Use meshgrid to generate all combinations and loop through them to count all occurrences. To convert to probability, divide by the total number of elements.

data=[1 3 2 3 1 4 2 3 1 3 4 2 1 1 2 3 5];
[first,second]=meshgrid(unique(data));
out=zeros(size(first));
for k=1:numel(first)
    out(k)=sum(...
        data(1:(end-1))==first(k) &...
        data(2:end)==second(k));
end
P=out/numel(data);
figure(1),clf(1)
x=1:size(out,1);
y=size(out,2):-1:1;%flip y-direction
y_label=cellfun(@(x) num2str(x),num2cell(y),'UniformOutput',0);
image(x,y,P,'CDataMapping','scaled')
colormap(gray)
set(gca,'XTick',x)
set(gca,'YTick',y(end:-1:1))
set(gca,'YTickLabel',y_label)
xlabel('First value')
ylabel('Second value')

댓글 수: 3
이전 댓글 1개 표시이전 댓글 1개 숨기기

Florian 2018년 8월 20일

편집: Rik 2018년 8월 20일

Thanks for your answer, Rik. What is still unclear to me is how I need to read the figure when reading off the vaules of probability for one number following another: is it from the left to right or from bottom to top?

Rik 2018년 8월 20일

To more easily add a scale, I've changed the previous code from imshow to image. I've also flipped the y-direction to have the (0,0) position in the lower left corner.

댓글을 달려면 로그인하십시오.

Answer 2

John D'Errico 2022년 6월 8일

0
링크

이 답변에 대한 바로 가기 링크

https://kr.mathworks.com/matlabcentral/answers/415210-probability-distribution-from-a-simple-vector#answer_981610

MATLAB Online에서 열기

Note that the use of meshgrid is wildly inefficient, if all you want to know is count the frequency of one number following another. For example, suppose the vector had a length of 1e6? Then you would be generating matrices with meshgrid of size 1e7 by 1e7. Do you really want that? Do you have enough memory?

M = 1e7*1e7;
disp("At a minimum, approximately " + M/1e9/8 + " gigabytes of RAM will be required to perform your computation.")
At a minimum, approximately 12500 gigabytes of RAM will be required to perform your computation.

That seems like a lot, so unless you have god's computer on your desktop, you might consider alternatives. :)

For example:

n = 1e7;
datavector = randi(7,[1,n]);
% counts(i,j) gives the number of events where i fell directly before j in the vector
ind = (1:n-1);
counts = accumarray([datavector(ind);datavector(ind+1)]',1)
counts = 7×7
      204176      203419      204076      203722      204089      204579      204456
      204407      204309      203879      203265      203700      204747      203963
      203924      203798      203312      203958      203727      204337      204412
      204126      203904      204150      204138      204430      204159      203838
      204168      204210      204208      204192      204488      203748      203433
      204174      204300      204399      204592      204150      204157      204233
      203542      204330      203444      204878      203862      204279      204212

And for the actual frequency of those events in this sample, we have:

freq = counts/n
freq = 7×7
0204    0.0203    0.0204    0.0204    0.0204    0.0205    0.0204
0204    0.0204    0.0204    0.0203    0.0204    0.0205    0.0204
0204    0.0204    0.0203    0.0204    0.0204    0.0204    0.0204
0204    0.0204    0.0204    0.0204    0.0204    0.0204    0.0204
0204    0.0204    0.0204    0.0204    0.0204    0.0204    0.0203
0204    0.0204    0.0204    0.0205    0.0204    0.0204    0.0204
0204    0.0204    0.0203    0.0205    0.0204    0.0204    0.0204

So we see a remarkably uniform distribution, as would be expected in this specific case, since randi will indeed be a uniform random genertor of integers.

Our expectation for the true frequency would be (as the sample size approaches infinity) is of course:

format long
1/(7*7)
ans = 
   0.020408163265306

댓글 수: 0
이전 댓글 -2개 표시이전 댓글 -2개 숨기기

댓글을 달려면 로그인하십시오.

Answer 3

Steven Lord 2022년 6월 8일

0
링크

이 답변에 대한 바로 가기 링크

https://kr.mathworks.com/matlabcentral/answers/415210-probability-distribution-from-a-simple-vector#answer_981620

MATLAB Online에서 열기

If you only have a small number of potential states (and they're all integer values) you could try histcounts2.

rng default
A = randi(6, 100, 1);
histcounts2(A(1:end-1), A(2:end), 'BinMethod', 'integers')
ans = 6×6
     4     3     4     2     1     3
     1     1     1     3     5     3
     1     1     3     4     3     2
     5     2     2     1     3     2
     2     5     2     2     6     4
     4     2     3     3     2     4

Let's validate the 5 that is in element (5, 2).

locationOfFirst5 = find(A(1:end-1) == 5 & A(2:end) == 2)
locationOfFirst5 = 5×1
    37
    44
    50
    61
    73

There are five (5, 2) pairs.

A([locationOfFirst5, locationOfFirst5+1])
ans = 5×2
   2
   2
   2
   2
   2

The other 5s are followed by other values.

other5s = find(A(1:end-1) == 5 & A(2:end) ~= 2);
A([other5s, other5s+1])
ans = 16×2
     5     6
     5     1
     5     6
     5     5
     5     5
     5     3
     5     1
     5     5
     5     5
     5     5
any(A(other5s+1) == 2) % false
ans = logical
   0

If you want probabilities use a different Normalization.

histcounts2(A(1:end-1), A(2:end), 'BinMethod', 'integers', 'Normalization', 'probability')
ans = 6×6
0404    0.0303    0.0404    0.0202    0.0101    0.0303
0101    0.0101    0.0101    0.0303    0.0505    0.0303
0101    0.0101    0.0303    0.0404    0.0303    0.0202
0505    0.0202    0.0202    0.0101    0.0303    0.0202
0202    0.0505    0.0202    0.0202    0.0606    0.0404
0404    0.0202    0.0303    0.0303    0.0202    0.0404

댓글 수: 0
이전 댓글 -2개 표시이전 댓글 -2개 숨기기

댓글을 달려면 로그인하십시오.

probability distribution from a simple vector

댓글 수: 0
이전 댓글 -2개 표시이전 댓글 -2개 숨기기

채택된 답변

댓글 수: 3
이전 댓글 1개 표시이전 댓글 1개 숨기기

추가 답변 (2개)

댓글 수: 0
이전 댓글 -2개 표시이전 댓글 -2개 숨기기

댓글 수: 0
이전 댓글 -2개 표시이전 댓글 -2개 숨기기

참고 항목

카테고리

태그

Community Treasure Hunt

probability distribution from a simple vector

댓글 수: 0 이전 댓글 -2개 표시이전 댓글 -2개 숨기기

채택된 답변

댓글 수: 3 이전 댓글 1개 표시이전 댓글 1개 숨기기

추가 답변 (2개)

댓글 수: 0 이전 댓글 -2개 표시이전 댓글 -2개 숨기기

댓글 수: 0 이전 댓글 -2개 표시이전 댓글 -2개 숨기기

참고 항목

카테고리

태그

Community Treasure Hunt

댓글 수: 0
이전 댓글 -2개 표시이전 댓글 -2개 숨기기

댓글 수: 3
이전 댓글 1개 표시이전 댓글 1개 숨기기

댓글 수: 0
이전 댓글 -2개 표시이전 댓글 -2개 숨기기

댓글 수: 0
이전 댓글 -2개 표시이전 댓글 -2개 숨기기