Analyzing Sections of Table Based on One Variable
Hello,
I have a large table (say 8 columns by 100 rows, although it will more likely have over 10,000 rows).
The last column represents "time in a day", and the 100 rows represent several days' worth of data.
Every time the day hits midnight, the last column has a value of 0.
It's worth noting that the last column isn't quite a single 24-hr time frame, so the last column doesn't always increase throughout the day before it drops back to time 0.
However, there is not an equal number of rows for each day.
So, for example, Day 1 might have 18 rows, Day 2 will have 35 rows, Day 3 will have 40 rows, and Day 4 will have 7 rows.
Each Day will always begin with a value of 0 for the last column.
The total number of days and how many rows per day are both never constant (i.e. it will vary).
I don't necessarily want to split the larger table into 4 smaller tables (after reading other forum posts it seems this can lead to many bugs and errors), although this might accomplish what I'm looking for.
Rather, I want to be able to (A) compare the first row of Day 1 with the first row of Day 2, then Day 1 with Day 3, etc.
Then (B) determine if the first rows of Days 1 and 2 are similar enough to not worry about Day 2's data.
I already have a script that accomplishes step B (previously I was arbitrarily splitting the larger table).
But since I now want each section of the larger table to have its own number of rows, I'm having difficulty accomplishing this.
In pseudocode, I was starting to do something like:
for i=1:N
    if LastCol(i)==0
        % Save all data between point i and point right before LastCol(i)==0 again
    end
end
One thing I can know beforehand is how many days will be in the large table.
So, in the example above, I will know that there are 4 days, which corresponds to 4 zeros in the last column.
If my request seems confusing or convoluted, please let me know and I can explain further or take it down.
In Short: My main issue is how I split the table up in unique sizes.
Thanks.
Edit:
I thought of a couple of for loops that sort of help my issue.
I have two matrices: one with four data points corresponding to the running time of the first day, and one that is the whole table.
for i=1:Length_Table
    for j=1:Num_Days
        if Big_Table.A(i)==Day_Matrix(j)
            % Not Sure
        else
        end
    end
end
where Big_Table.A column corresponds to the overall running time (doesn't reset after each day).
If the running time in the big table ever equals a time at which we know the day resets, that is where I want to begin a "new section", and I want to end that section at the row just before it happens again.
I'm not sure how to do this task, however.
Answers (1)
Shaunak
17 Feb 2025
Edited: Shaunak
17 Feb 2025
Hi Jon,
You can find and store the exact indices where the day resets, using the “find” function in MATLAB.
The following function builds the array 'dayStartIndices', which holds the indices of the rows whose last column is 0, and then compares the first rows of each pair of days against a similarity threshold passed as an input parameter:
function compareFirstRows(data, similarityThreshold)
    % PART A of the question: row indices where a new day starts
    % (last column equal to 0); data is expected to be a numeric matrix
    dayStartIndices = find(data(:, end) == 0);
    % Append a sentinel one past the last row, so that day i spans rows
    % dayStartIndices(i) : dayStartIndices(i+1) - 1
    dayStartIndices = [dayStartIndices; size(data, 1) + 1];
    % PART B of the question: compare the first rows of every pair of days
    for i = 1:length(dayStartIndices) - 1
        % Get the first row of the current day
        currentDayFirstRow = data(dayStartIndices(i), :);
        for j = i+1:length(dayStartIndices) - 1
            % Get the first row of a later day
            laterDayFirstRow = data(dayStartIndices(j), :);
            % NOTE : Replace this with the required similarity logic
            distance = norm(currentDayFirstRow - laterDayFirstRow);
            % Determine if the first rows are similar
            if distance < similarityThreshold
                fprintf('Day %d and Day %d are similar based on their first rows.\n', i, j);
            end
        end
    end
end
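If the data is stored as a table rather than a numeric matrix, the same idea can be written without splitting the table at all. The following is only a minimal sketch under stated assumptions: the table T, its variable names A and TimeOfDay, and the reset times in dayStartTimes are hypothetical values made up for illustration, and the first row of the table is assumed to be the start of a day. It labels every row with its day number using "cumsum", and also shows the variant from the question's edit, where the day starts are detected by matching a running-time column against known reset times with "ismember".

```matlab
% Hypothetical example table: A is the running time (never resets),
% TimeOfDay is the last column and is 0 on the first row of each day.
A         = [10; 13; 19; 34; 36; 39; 58];
TimeOfDay = [ 0;  3;  9;  0;  5;  2;  0];
T = table(A, TimeOfDay);

% Logical mask of the rows where a day starts.
% Curly braces extract the numeric contents of the last column.
isStart = T{:, end} == 0;

% Alternative from the question's edit: match the running time against
% known reset times (hypothetical values; relies on exact equality).
dayStartTimes = [10; 34; 58];
isStartAlt = ismember(T.A, dayStartTimes);

% A running count of day starts gives every row its day number.
% Assumes row 1 is a day start, so that the labels begin at 1.
dayId = cumsum(isStart);

% Number of rows in each day, without creating separate tables.
rowsPerDay = accumarray(dayId, 1);

% First row of every day, stacked into one table (one row per day).
firstRows = T(isStart, :);

% All rows that belong to one particular day, here Day 2.
day2 = T(dayId == 2, :);
```

With the day labels in hand, any section can be pulled out on demand with T(dayId == k, :), so the number of days and the rows per day do not need to be known in advance.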
You can generate some sample data for the above function by calling this function with the required number of rows:
function data = test_generate(numRows)
    timeColumn = zeros(numRows, 1);
    maxTime = 23;
    currentTime = 0;
    dayStartIndices = 1;
    % Random values; the last column is overwritten with the time below
    data = rand(numRows, 8);
    for i = 2:numRows
        % Advance the time by a random step of 1 to 5
        stepSize = randi([1, 5]);
        currentTime = currentTime + stepSize;
        if currentTime > maxTime
            % Past the end of the day: reset to 0 and record a new day start
            currentTime = 0;
            dayStartIndices = [dayStartIndices, i];
        end
        timeColumn(i) = currentTime;
    end
    dayStartIndices = [dayStartIndices, numRows + 1];
    % Shuffle the times within each day so they are not strictly increasing,
    % leaving the first row of each day at 0
    for j = 1:length(dayStartIndices) - 1
        restIdx = dayStartIndices(j) + 1 : dayStartIndices(j + 1) - 1;
        timeColumn(restIdx) = timeColumn(restIdx(randperm(numel(restIdx))));
    end
    data(:, end) = timeColumn;
end
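A short usage sketch, assuming both functions above are saved on the MATLAB path as compareFirstRows.m and test_generate.m. The seed and the threshold value are arbitrary choices made for illustration, not recommended settings:

```matlab
rng(0);                       % fix the random seed so the run is repeatable
data = test_generate(100);    % 100 rows, 8 columns, last column is the time
similarityThreshold = 1.0;    % hypothetical value, tune it for the real data
compareFirstRows(data, similarityThreshold);
```

Which day pairs get reported depends on the random data and on the threshold, so it is worth trying a few threshold values before applying this to real measurements.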
Refer to the MathWorks documentation of the “find” function for more information.
Hope this helps!