Distribution sampling
I have 2 million samples of three parameters (a, b, c). The parameters are correlated with each other, and each has a different distribution (not Gaussian or log-normal). Now I need to draw 60,000 samples from them that keep the same correlation and the same distributions. Is there a particular method anyone can suggest? Can anyone help me?
Answers (1)
Doug Eastman
8 Jul 2011
I'm not a statistics expert, but I believe that randomly sampling a set of data without replacement should approximately preserve both the distribution and the correlation of the original data, and the match gets closer as the subset gets larger. Here's a way to take a random subset of n elements from an array A:
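idx = randperm(numel(A));   % random permutation of all linear indices of A
subset = A(idx(1:n));       % keep the first n: n elements drawn without replacement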
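A minimal sketch of the same idea, assuming a MATLAB release that supports the two-argument form randperm(N, k), which returns k unique random integers from 1:N without building a permutation of all N indices. The data A and the sizes below are placeholders for illustration only.

```matlab
% Placeholder data: any vector works here.
A = rand(100000, 1);
n = 10000;

% randperm(N, k) returns k distinct integers drawn at random from 1:N,
% i.e. sampling without replacement.
idx = randperm(numel(A), n);
subset = A(idx);   % n-by-1, because A is a column vector
```

The result is the same kind of subset as above; the two-argument call simply avoids generating and discarding the unused part of the permutation.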
Here's an example showing the preserved distribution:
N = 100000;
n = 10000;
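x = randn(N,1)*3+12;   % N draws from a normal with mean 12, std 3
y = randn(N,1)*2+2;    % N draws from a normal with mean 2, std 2
A = [x;y];             % stack into one 2N-by-1 vector with two peaks
idx = randperm(numel(A));
subset = A(idx(1:n));  % n elements drawn without replacement
hist(A,100);           % histogram of the full data, 100 bins
figure
hist(subset,100);      % subset histogram: same shape, fewer counts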
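Histograms give a visual check. As a sketch of a numeric complement, the code below rebuilds the same example data and compares the mean, standard deviation and median, plus the largest gap between the two empirical CDFs evaluated on a grid (a Kolmogorov-Smirnov-style distance computed with core functions only). The names statsFull, statsSub, pts and ksGap are illustrative, not part of any API.

```matlab
% Rebuild the example data: two normal components stacked into one
% column vector, giving a two-peaked distribution.
N = 100000;
n = 10000;
x = randn(N,1)*3 + 12;
y = randn(N,1)*2 + 2;
A = [x; y];

idx = randperm(numel(A), n);   % n distinct linear indices
subset = A(idx);

% Summary statistics of the full data (row 1) and the subset (row 2).
statsFull = [mean(A), std(A), median(A)];
statsSub  = [mean(subset), std(subset), median(subset)];
disp([statsFull; statsSub]);

% Empirical CDFs of both sets at 200 evenly spaced points.
pts = linspace(min(A), max(A), 200);
cdfFull = zeros(size(pts));
cdfSub  = zeros(size(pts));
for k = 1:numel(pts)
    cdfFull(k) = mean(A <= pts(k));
    cdfSub(k)  = mean(subset <= pts(k));
end

% Largest vertical gap between the two CDFs.
ksGap = max(abs(cdfFull - cdfSub));
disp(ksGap);
```

Because the subset is drawn from A itself, any differences here are sampling noise, which shrinks roughly like 1/sqrt(n).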
Comments: 2
Doug Eastman
11 Jul 2011
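Sorry, I fixed a typo above. But yes, this works for an A of any size, because it uses linear indexing (a single number per index). Keep in mind that linear indexing picks individual elements, so on a matrix it mixes values from different columns and splits up each (a,b,c) sample.
If you have something like a 1000x3 array and want a 100x3 subset (100 of the 1000 original samples), sample whole rows instead, which keeps every row intact: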
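idx = randperm(size(A,1));   % random permutation of the row indices
subset = A(idx(1:n),:);      % first n rows, all columns: each sample stays together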
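For the case in the question, here is a sketch under stated assumptions: the data sit in a 2,000,000-by-3 matrix A with one column per parameter, and the synthetic columns below (built from exponential and uniform draws) are hypothetical stand-ins for the real a, b, c. It draws 60,000 whole rows with randperm(N, n) and compares the correlation matrices (corrcoef) and column means of the full data and the subset. If the Statistics and Machine Learning Toolbox is available, datasample(A, n, 'Replace', false) should also return whole rows of a matrix.

```matlab
% Hypothetical stand-in for the real data: correlated, non-Gaussian columns.
N = 2e6;   % rows in the full data set
n = 6e4;   % rows to keep
a = -log(rand(N,1));           % exponential(1)
b = a + rand(N,1);             % skewed, strongly correlated with a
c = -0.5*a + 2*rand(N,1);      % negatively correlated with a and b
A = [a b c];

% Pick n distinct rows and keep all columns of each, so every
% (a,b,c) sample stays together.
idx = randperm(N, n);
subset = A(idx, :);

% Compare correlations and column means of the full data and the subset.
Rfull = corrcoef(A);
Rsub  = corrcoef(subset);
maxCorrDiff = max(abs(Rfull(:) - Rsub(:)));
maxMeanDiff = max(abs(mean(A) - mean(subset)));
disp(Rfull);
disp(Rsub);
```

Each marginal distribution can be checked column by column in the same way as the single-column example, for instance by comparing histograms of A(:,k) and subset(:,k).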