resample data based on a particular variable

Boram Lim on 4 May 2018
Commented: Boram Lim on 4 May 2018
I have a large dataset like the one below. I want to randomly resample it by 'id' and produce a dataset of similar size. Since the data has 5 ids, I would like to sample 5 ids with replacement and build a dataset from all rows belonging to each sampled id.
id value var1 var2
1 1
1 2
1 3
1 4
2 5
2 6
2 7
3 8
3 9
3 10
4 11
4 12
4 13
5 14
5 15
5 16
With this data, the desired output could look like the following (because I want to sample ids with replacement, some ids may appear more than once):
id value var1 var2
2 5
2 6
2 7
4 11
4 12
4 13
3 8
3 9
3 10
2 5
2 6
2 7
1 1
1 2
1 3
1 4
Comments: 2
KSSV on 4 May 2018
What is the difference between the two datasets? They are the same, except that id 2 is repeated in the second one.
Boram Lim on 4 May 2018
I want to randomly resample the data based on the id variable.

Answers (1)

KSSV on 4 May 2018
A = [1 1
1 2
1 3
1 4
2 5
2 6
2 7
3 8
3 9
3 10
4 11
4 12
4 13
5 14
5 15
5 16 ];
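id = A(:,1) ;                    % grouping variable (assumed to be the integers 1..N)
N = max(id) ;                    % number of ids
idx = randi(N, N, 1) ;           % draw N ids with replacement (randperm would only shuffle them)
iwant = cell(N,1) ;
for i = 1:N
    iwant{i} = A(id==idx(i),:) ; % all rows belonging to the i-th sampled id
end
iwant = cell2mat(iwant)          % stack the sampled blocks into one matrix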
Comments: 1
Boram Lim on 4 May 2018
Thank you for your comment. However, is there a simple way to do this without a for-loop? My data size is around 10 million, and this needs to be done several times.
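One possible loop-free approach, offered as a sketch rather than a tested solution for the full dataset: group the rows by id once, then build the row indices for each random draw with index arithmetic. It assumes the id is in the first column of a numeric matrix `A` and that `repelem` is available (R2015a or later). The variable names (`order`, `counts`, `first`, `pick`, `resampled`) are illustrative and not from the thread.

```matlab
% Small example matrix: column 1 is the id, column 2 is the value
A = [1 1; 1 2; 1 3; 1 4; 2 5; 2 6; 2 7; 3 8; 3 9; 3 10; ...
     4 11; 4 12; 4 13; 5 14; 5 15; 5 16];

% --- One-time setup: group the rows by id ---
[ids, ~, g] = unique(A(:,1));           % g maps each row to a group number 1..K
K      = numel(ids);                    % number of distinct ids
[~, order] = sort(g);                   % row numbers arranged group by group
counts = accumarray(g, 1);              % number of rows in each group
first  = cumsum([1; counts(1:end-1)]);  % where each group starts within 'order'

% --- One resampling draw (repeat this part for every replicate) ---
pick     = randi(K, K, 1);              % sample K groups with replacement
len      = counts(pick);                % rows contributed by each sampled group
outStart = cumsum([1; len(1:end-1)]);   % where each sampled block starts in the output
offset   = (1:sum(len))' - repelem(outStart, len);  % 0-based position inside its block
rowIdx   = order(repelem(first(pick), len) + offset);
resampled = A(rowIdx, :);               % all rows of every sampled id, in draw order
```

Only the second part needs to be rerun for each replicate, so the sorting cost is paid once. Because the groups can have different sizes, the number of rows in `resampled` can differ slightly from the number of rows in `A`.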
