probability distribution from a simple vector
Assume a vector like... 1 3 2 3 1 4 2 3 1 3 4 2 1 1 2 3 4
How can I calculate the likelihood that the number 2 follows a 3, or that 1 follows a 2? Ideally I would like to display this relationship for all numbers as a probability distribution.
Accepted Answer
Rik on 17 Aug 2018
Edited: Rik on 20 Aug 2018
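Use meshgrid to generate all combinations of the unique values and loop through them to count the occurrences of each consecutive pair. To convert the counts to probabilities, divide by the number of consecutive pairs, which is one less than the number of elements.
data=[1 3 2 3 1 4 2 3 1 3 4 2 1 1 2 3 5];
[first,second]=meshgrid(unique(data));
out=zeros(size(first));
for k=1:numel(first)
    %count the positions where first(k) is directly followed by second(k)
    out(k)=sum(...
        data(1:(end-1))==first(k) &...
        data(2:end)==second(k));
end
P=out/(numel(data)-1);%a vector of n elements has n-1 consecutive pairs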
figure(1),clf(1)
x=1:size(out,1);
y=size(out,2):-1:1;%flip y-direction
y_label=cellfun(@(x) num2str(x),num2cell(y),'UniformOutput',0);
image(x,y,P,'CDataMapping','scaled')
colormap(gray)
set(gca,'XTick',x)
set(gca,'YTick',y(end:-1:1))
set(gca,'YTickLabel',y_label)
xlabel('First value')
ylabel('Second value')
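The matrix P holds the joint probability of each consecutive pair. The question asks how likely one value is to follow another, which is a conditional probability. A minimal sketch of that normalization, assuming the vector from the question and R2016b or later (for implicit expansion); the names counts, colTotals, condP and pTwoAfterThree are illustrative and not part of the original answer:

```matlab
% Sketch: conditional probability that 'second' directly follows 'first'.
data = [1 3 2 3 1 4 2 3 1 3 4 2 1 1 2 3 4];   % vector from the question
vals = unique(data);
[first, second] = meshgrid(vals);
counts = zeros(size(first));
for k = 1:numel(first)
    counts(k) = sum(data(1:end-1) == first(k) & data(2:end) == second(k));
end
% meshgrid places 'first' along the columns and 'second' along the rows,
% so each column holds the counts for one preceding value.
colTotals = sum(counts, 1);
condP = counts ./ colTotals;                   % each column now sums to 1
pTwoAfterThree = condP(vals == 2, vals == 3);  % P(next is 2 | current is 3)
```

A value that occurs only as the last element never precedes anything, so its column total is zero and its column in condP becomes NaN. That does not happen for this vector.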
Comments
Rik on 20 Aug 2018
To more easily add a scale, I've changed the previous code from imshow to image. I've also flipped the y-direction to put the (0,0) position in the lower left corner.
More Answers (2)
John D'Errico on 8 Jun 2022
Note that the use of meshgrid is wildly inefficient if all you want is to count how often one number follows another. For example, suppose the vector contained 1e7 distinct values? Then meshgrid would generate matrices of size 1e7 by 1e7. Do you really want that? Do you have enough memory?
M = 1e7*1e7; % number of elements in one such matrix
disp("At a minimum, approximately " + M*8/1e9 + " gigabytes of RAM will be required to perform your computation.") % 8 bytes per double
That seems like a lot, so unless you have god's computer on your desktop, you might consider alternatives. :)
For example:
n = 1e7;
datavector = randi(7,[1,n]);
% counts(i,j) gives the number of events where i fell directly before j in the vector
ind = (1:n-1);
counts = accumarray([datavector(ind);datavector(ind+1)]',1)
And for the actual frequency of those events in this sample, we have:
freq = counts/(n-1) % a vector of length n has n-1 consecutive pairs
So we see a remarkably uniform distribution, as would be expected in this specific case, since randi is a uniform random generator of integers.
Our expectation for the true frequency (as the sample size approaches infinity) is of course:
format long
1/(7*7)
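accumarray needs positive integer subscripts, so the call above relies on the data already being positive integers. For other values, one option is to map each value to an index with the third output of unique first. This is only a sketch: the data vector below is made up for illustration, and the names vals, idx, k and counts are not from the original answer.

```matlab
% Sketch: pair counts for data that are not positive integers.
data = [0.5 -2 0.5 7 -2 0.5 0.5 7];   % hypothetical example data
[vals, ~, idx] = unique(data);         % vals(idx) reproduces data
idx = idx(:);                          % force a column of indices 1..numel(vals)
k = numel(vals);
% counts(i,j) is the number of times vals(i) is directly followed by vals(j)
counts = accumarray([idx(1:end-1), idx(2:end)], 1, [k k]);
```

Passing the size [k k] explicitly keeps the result square even when the largest value never appears as the first or second element of a pair.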
Steven Lord on 8 Jun 2022
        If you only have a small number of potential states (and they're all integer values) you could try histcounts2.
rng default
A = randi(6, 100, 1);
histcounts2(A(1:end-1), A(2:end), 'BinMethod', 'integers')
Let's validate the 5 that is in element (5, 2).
locationOfFirst5 = find(A(1:end-1) == 5 & A(2:end) == 2)
There are five (5, 2) pairs.
A([locationOfFirst5, locationOfFirst5+1])
The other 5s are followed by other values.
other5s = find(A(1:end-1) == 5 & A(2:end) ~= 2);
A([other5s, other5s+1])
any(A(other5s+1) == 2) % false
If you want probabilities use a different Normalization.
histcounts2(A(1:end-1), A(2:end), 'BinMethod', 'integers', 'Normalization', 'probability')
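With 'BinMethod', 'integers', the bins in each dimension are chosen from the range of that input, so the result can come out non-square when the smallest or largest value occurs only at the very start or very end of the vector. A hedged sketch of one way around that, passing the same explicit edges for both dimensions; it uses the data vector from the accepted answer, and the names edges and N are illustrative:

```matlab
% Sketch: identical bin edges for both dimensions, so that rows and
% columns of the count matrix refer to the same set of states.
data = [1 3 2 3 1 4 2 3 1 3 4 2 1 1 2 3 5];     % vector from the accepted answer
edges = (min(data) - 0.5):(max(data) + 0.5);    % unit-width bins centered on integers
% N(i,j) is the number of times i is directly followed by j
N = histcounts2(data(1:end-1), data(2:end), edges, edges);
```

Here the value 5 appears only as the last element, so row 5 of N is all zeros while column 5 still records the single (3, 5) pair.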