Efficient script to isolate one sub-dataset k-times.

Vic on 3 Mar 2024
Commented: Vic on 7 Mar 2024
Hi everyone,
The idea is to divide the main dataset into k bins, then delete one bin at a time and re-merge the remaining bins. In a nutshell, k bins produce k different sub-datasets. Since the number of rows in the matrix may not be a multiple of the number of bins (bin k often has fewer rows), I had to use cell arrays.
Here is an illustration of the general idea for k = 2.
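The k = 2 case can also be sketched in code. This is a minimal sketch on a toy 4-row matrix; the names Data and Sub and the sizes are assumptions chosen for illustration, not part of the original script.

```matlab
% Toy data: 4 rows split into k = 2 bins of 2 rows each (sizes are assumptions).
Data = [1 10; 2 20; 3 30; 4 40];

Sub = cell(1, 2);          % one cell per sub-dataset
Sub{1} = Data(3:4, :);     % bin 1 (rows 1-2) removed, bin 2 kept
Sub{2} = Data(1:2, :);     % bin 2 (rows 3-4) removed, bin 1 kept

disp(Sub{1})
disp(Sub{2})
```

Each cell holds the data with exactly one bin left out, so k bins give k cells.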
Question:
How can I remove the loop or make this code more efficient?
Here is my script.
------------------------------------------------------
Variables = rand(245,57);
Bin_numb = 11;
Bin_size = [1:floor(length(Variables)/Bin_numb):length(Variables) length(Variables)];
for i = 1:length(Bin_size)-1
    if i == 1
        Bin_Variables2{1} = Variables(Bin_size(2):Bin_size(end),:);
    else
        Bin_Variables2{i} = [Variables(Bin_size(1):Bin_size(i)-1,:); Variables(Bin_size(i+1):Bin_size(end),:)];
    end
end
Thanks for your input.
  2 Comments
Voss on 5 Mar 2024
Edited: Voss on 5 Mar 2024
Two observations:
  1. The last row of Variables is included as the last row of every element of Bin_Variables2 (because Bin_size(end) is always included).
  2. When size(Variables,1) is a multiple of Bin_numb, I expect you'd want each element of Bin_Variables2 to be the same size, but that's not what happens.
To illustrate:
Variables = rand(242,7);
Bin_numb = 11;
Bin_size = [1:floor(length(Variables)/Bin_numb):length(Variables) length(Variables)];
for i = 1:length(Bin_size)-1
    if i == 1
        Bin_Variables2{1} = Variables(Bin_size(2):Bin_size(end),:);
    else
        Bin_Variables2{i} = [Variables(Bin_size(1):Bin_size(i)-1,:); Variables(Bin_size(i+1):Bin_size(end),:)];
    end
end
Observation 1: last row always the same:
fprintf('%36s%s\n','Last row of Variables: ',sprintf('%6.4g ',Variables(end,:)));
Last row of Variables: 0.02797 0.5595 0.2128 0.4162 0.0364 0.1367 0.6156
for ii = 1:numel(Bin_Variables2)
    fprintf('%36s%s\n',sprintf('Last row of Bin_Variables2{%d}: ',ii),sprintf('%6.4g ',Bin_Variables2{ii}(end,:)));
end
Last row of Bin_Variables2{1}: 0.02797 0.5595 0.2128 0.4162 0.0364 0.1367 0.6156
Last row of Bin_Variables2{2}: 0.02797 0.5595 0.2128 0.4162 0.0364 0.1367 0.6156
Last row of Bin_Variables2{3}: 0.02797 0.5595 0.2128 0.4162 0.0364 0.1367 0.6156
Last row of Bin_Variables2{4}: 0.02797 0.5595 0.2128 0.4162 0.0364 0.1367 0.6156
Last row of Bin_Variables2{5}: 0.02797 0.5595 0.2128 0.4162 0.0364 0.1367 0.6156
Last row of Bin_Variables2{6}: 0.02797 0.5595 0.2128 0.4162 0.0364 0.1367 0.6156
Last row of Bin_Variables2{7}: 0.02797 0.5595 0.2128 0.4162 0.0364 0.1367 0.6156
Last row of Bin_Variables2{8}: 0.02797 0.5595 0.2128 0.4162 0.0364 0.1367 0.6156
Last row of Bin_Variables2{9}: 0.02797 0.5595 0.2128 0.4162 0.0364 0.1367 0.6156
Last row of Bin_Variables2{10}: 0.02797 0.5595 0.2128 0.4162 0.0364 0.1367 0.6156
Last row of Bin_Variables2{11}: 0.02797 0.5595 0.2128 0.4162 0.0364 0.1367 0.6156
Observation 2: unequally sized result matrices even though 242 is a multiple of 11:
bin_sizes = cellfun(@(x)size(x,1),Bin_Variables2)
bin_sizes = 1×11
220 220 220 220 220 220 220 220 220 220 221
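One possible way to address both observations, assuming the goal is contiguous bins whose sizes differ by at most one row, is to derive the bin edges from linspace and treat each bin as rows edges(b)+1 through edges(b+1). The names nRows, edges, Folds and heldOut are hypothetical and not from the original script.

```matlab
Variables = rand(242, 7);
Bin_numb  = 11;
nRows     = size(Variables, 1);            % row count, independent of the column count

% Bin b covers rows edges(b)+1 : edges(b+1), so no row belongs to two bins.
edges = round(linspace(0, nRows, Bin_numb + 1));

Folds   = cell(1, Bin_numb);               % preallocate the output
heldOut = zeros(1, Bin_numb);              % rows removed per bin
for b = 1:Bin_numb
    idx = true(nRows, 1);
    idx(edges(b)+1 : edges(b+1)) = false;  % drop only the rows of bin b
    Folds{b}   = Variables(idx, :);
    heldOut(b) = nRows - size(Folds{b}, 1);
end
disp(heldOut)
```

With 242 rows and 11 bins every bin holds 22 rows, so every element of Folds has 220 rows, and the last row of Variables is left out of Folds{11}.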
Vic on 7 Mar 2024
@Voss Thanks for these observations. @Manikanta Aditya & @Dyuman Joshi Thanks for your help. I hadn't thought of using a logical array. It is an elegant way to solve this.
Here is my current script.
Variables = rand(245,7);
Bin_numb = 11;
Bin_size = 1:floor(length(Variables)/Bin_numb):length(Variables);
if length(Variables)-Bin_size(end) <= 12
    Bin_size(end) = length(Variables);
end
Bin_Variables2 = cell(1, length(Bin_size)-1);
for i = 1:length(Bin_size)-1
    idx = true(length(Variables), 1);
    idx(Bin_size(i):Bin_size(i+1)) = false;
    Bin_Variables2{i} = Variables(idx, :);
end
for ii = 1:numel(Bin_Variables2)
    fprintf('%1s%s\n',sprintf('Last row {%d}: ',ii),sprintf('%6.4g ',Bin_Variables2{ii}(end,:)));
end
bin_sizes = cellfun(@(x)size(x,1),Bin_Variables2)
length(Variables)-bin_sizes
Bin_size
I added an if condition that sets Bin_size(end) = length(Variables) when size(Variables,1) is not a multiple of Bin_numb. The last bin therefore has floor(length(Variables)/Bin_numb) + mod(length(Variables),Bin_numb) rows (22+3), and I get this:
bin_sizes =
222 222 222 222 222 222 222 222 222 222 220
length(Variables)-bin_sizes =
23 23 23 23 23 23 23 23 23 23 25
It works.
As for the last row always being the same: it seems to be fine now, but I still have some doubts about bin N-1 and its size.
Last row {1}: 0.6559 0.4365 0.5963 0.3045 0.6676 0.5343 0.5316
Last row {2}: 0.6559 0.4365 0.5963 0.3045 0.6676 0.5343 0.5316
Last row {3}: 0.6559 0.4365 0.5963 0.3045 0.6676 0.5343 0.5316
Last row {4}: 0.6559 0.4365 0.5963 0.3045 0.6676 0.5343 0.5316
Last row {5}: 0.6559 0.4365 0.5963 0.3045 0.6676 0.5343 0.5316
Last row {6}: 0.6559 0.4365 0.5963 0.3045 0.6676 0.5343 0.5316
Last row {7}: 0.6559 0.4365 0.5963 0.3045 0.6676 0.5343 0.5316
Last row {8}: 0.6559 0.4365 0.5963 0.3045 0.6676 0.5343 0.5316
Last row {9}: 0.6559 0.4365 0.5963 0.3045 0.6676 0.5343 0.5316
Last row {10}: 0.6559 0.4365 0.5963 0.3045 0.6676 0.5343 0.5316
Last row {11}: 0.1865 0.9516 0.07304 0.0887 0.697 0.9751 0.5142
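On the doubt about bin sizes: in the script above, Bin_size(i):Bin_size(i+1) includes both endpoints, so neighbouring bins share a boundary row (row 23 is removed from both cell 1 and cell 2), which is why 23 rows are removed instead of 22. A minimal sketch of one alternative, assuming each bin should start at starts(b) and stop one row before the next bin starts, with leftover rows going to the last bin. The names w, starts, removed and timesHeldOut are hypothetical.

```matlab
Variables = rand(245, 7);
Bin_numb  = 11;
nRows     = size(Variables, 1);
w         = floor(nRows / Bin_numb);           % nominal bin width (22 here)

% First row of each bin, followed by a sentinel one past the last row.
starts = [1:w:(w*(Bin_numb-1)+1), nRows+1];

Bin_Variables2 = cell(1, Bin_numb);            % preallocate
removed        = zeros(1, Bin_numb);           % rows removed per bin
timesHeldOut   = zeros(nRows, 1);              % how often each row is removed
for b = 1:Bin_numb
    idx = true(nRows, 1);
    idx(starts(b) : starts(b+1)-1) = false;    % stop before the next bin begins
    Bin_Variables2{b} = Variables(idx, :);
    removed(b)   = nnz(~idx);
    timesHeldOut = timesHeldOut + ~idx;
end
disp(removed)
```

Here removed is 22 for bins 1 to 10 and 25 for bin 11, and timesHeldOut is 1 for every row, which confirms that the bins do not overlap and that no row is skipped.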

Accepted Answer

Manikanta Aditya on 4 Mar 2024
Moved: Dyuman Joshi on 4 Mar 2024
Here is a code snippet that should make the code more efficient by using logical indexing instead of a loop:
Variables = rand(245,57);
Bin_numb = 11;
Bin_size = [1:floor(length(Variables)/Bin_numb):length(Variables) length(Variables)];
Bin_Variables2 = cell(1, length(Bin_size)-1);
for i = 1:length(Bin_size)-1
    idx = true(size(Variables, 1), 1);
    idx(Bin_size(i):Bin_size(i+1)-1) = false;
    Bin_Variables2{i} = Variables(idx, :);
end
In this code, 'idx' is a logical array that is true for the rows of Variables that you want to keep. This approach avoids the need to concatenate arrays, which can be slow in MATLAB because it involves memory allocation. Instead, you’re just creating a logical index and using it to select the rows you want.
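To make the equivalence concrete, here is a small sketch on a toy matrix showing that a logical mask selects the same rows as the concatenation it replaces. The matrix A and the removed rows 3:4 are assumptions chosen for illustration.

```matlab
A = reshape(1:12, 6, 2);               % 6-by-2 toy matrix
idx = true(size(A, 1), 1);             % start by keeping every row
idx(3:4) = false;                      % mark rows 3 and 4 for removal

viaMask   = A(idx, :);                 % single indexing operation
viaConcat = [A(1:2, :); A(5:6, :)];    % the concatenation it replaces

disp(isequal(viaMask, viaConcat))      % displays 1 (true)
```

The mask also removes the need for a special case when the first bin is removed, because an empty leading block never has to be handled separately.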
  2 Comments
Dyuman Joshi on 4 Mar 2024
Edited: Dyuman Joshi on 4 Mar 2024
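@Manikanta Aditya, This looks good, though I would suggest using size(Variables,1) instead of length(Variables), since length returns the largest dimension, which is not the row count when a matrix has more columns than rows.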
" ... by using logical indexing instead of a loop:"
You are still using a loop.
@Vic, an important part of the code above is preallocation, which is good programming practice in MATLAB and improves code performance.
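For reference, the explicit for loop can be hidden with arrayfun, although arrayfun still visits each bin once and is not necessarily faster than a preallocated loop. The sketch below assumes that leftover rows should join the last bin; the names nRows, w and binId are hypothetical.

```matlab
Variables = rand(245, 57);
Bin_numb  = 11;
nRows     = size(Variables, 1);        % row count, even if columns outnumber rows
w         = floor(nRows / Bin_numb);   % nominal bin width

% Label every row with its bin number; leftover rows join the last bin.
binId = min(ceil((1:nRows)' / w), Bin_numb);

% arrayfun hides the loop but still builds one cell per bin.
Bin_Variables2 = arrayfun(@(b) Variables(binId ~= b, :), 1:Bin_numb, ...
                          'UniformOutput', false);

disp(cellfun(@(x) size(x, 1), Bin_Variables2))
```

Because the k results have different sizes, they have to live in a cell array, so some form of per-bin iteration remains; the gain here is mainly that binId makes the bin membership of every row explicit.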
Manikanta Aditya on 4 Mar 2024
Thanks @Dyuman Joshi for the reply. My bad, I missed the statement about the loop.
