Analyzing Sections of Table Based on One Variable

Question

Hello,

I have a large table (say 8 columns by 100 rows, though it will more likely have over 10,000 rows).

The last column represents "time in a day", and the 100 rows represent several days' worth of data.

Every time the day hits midnight, the last column has a value of 0.

It's worth noting that the last column isn't quite a single 24-hr time frame, so the last column doesn't always increase throughout the day before it drops back to time 0.

However, there is not an equal number of rows for each day.

So, for example, Day 1 might have 18 rows, Day 2 will have 35 rows, Day 3 will have 40 rows, and Day 4 will have 7 rows.

Each Day will always begin with a value of 0 for the last column.

Neither the total number of days nor the number of rows per day is constant; both will vary.

I don't necessarily want to split the larger table into 4 smaller tables (after reading other forum posts it seems this can lead to many bugs and errors), although this might accomplish what I'm looking for.

Rather, I want to be able to (A) compare the first row of Day 1 with the first row of Day 2, then Day 1 with Day 3, etc.

Then (B) determine if the first rows of Days 1 and 2 are similar enough to not worry about Day 2's data.

I already have a script that accomplishes step B (previously I was arbitrarily splitting the larger table).

But since I now want each section of the larger table to have its own number of rows, I'm having difficulty accomplishing this.

In pseudocode, I was starting to do something like:

for i=1:N
    if LastCol(i)==0
        % Save all data between point i and point right before LastCol(i)==0 again
    end
end

One thing I can know in advance is how many days will be in the large table.

So for example, in the above example, I will know that there are 4 days, which corresponds to 4 zeros in the last column.

If my request seems confusing or convoluted, please let me know and I can explain further or take it down.

In short: my main issue is how to split the table into sections of different sizes.

Thanks.

Edit:

I thought of a couple of for loops that sort of help my issue.

I have two matrices: one with four data points corresponding to the running time of the first day, and one that is the whole table.

for i=1:Length_Table
    for j=1:Num_Days
        if Big_Table.A(i)==Day_Matrix(j)
            % Not Sure
        else
        end
    end
end

where the Big_Table.A column corresponds to the overall running time (it doesn't reset after each day).

If any time in the big table equals a time at which we know the day resets, that is where I want to begin a "new section", and I want to end that section at the point just before it happens again.

I'm not sure how to do this task, however.
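
One way the idea above could be sketched, assuming the reset times appear exactly in the running-time column. The names runningTime and resetTimes are hypothetical stand-ins for Big_Table.A and Day_Matrix, and the numbers are made up for illustration. The built-in ismember replaces the double loop:

```matlab
% Hypothetical data: running time that never resets, plus known reset times
runningTime = [0 3 7 10 12 15 20 24 26 31]';   % stands in for Big_Table.A
resetTimes  = [0 10 24];                        % stands in for Day_Matrix

% Logical mask of rows whose running time matches a known reset time
isDayStart = ismember(runningTime, resetTimes);

% Row index where each section begins, and where it ends
startIdx = find(isDayStart);
endIdx   = [startIdx(2:end) - 1; numel(runningTime)];

% Walk over the sections without creating separate tables
for d = 1:numel(startIdx)
    rows = startIdx(d):endIdx(d);
    fprintf('Day %d: rows %d to %d (%d rows)\n', d, rows(1), rows(end), numel(rows));
end
```

If the times are floating-point values that may not match exactly, ismembertol might be a safer choice than ismember.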

Answer 1

Shaunak, 17 Feb 2025

Edited: Shaunak, 17 Feb 2025

Hi Jon,

You can find and store the exact indices where the day resets using the "find" function in MATLAB.

The following function builds the array 'dayStartIndices' of row indices where the last column is 0, then compares the first rows of each pair of days against a similarity threshold passed as an input parameter. It indexes 'data' as a numeric matrix:

function compareFirstRows(data, similarityThreshold)
% PART A of the question
dayStartIndices = find(data(:, end) == 0);
% Extending the array for easy traversal
dayStartIndices = [dayStartIndices; size(data, 1) + 1];
% PART B of the question
for i = 1:length(dayStartIndices) - 1
    % Get the first row of the current day
    currentDayFirstRow = data(dayStartIndices(i), :);
    
    for j = i+1:length(dayStartIndices) - 1
        % Get the first row of a later day (Day j)
        nextDayFirstRow = data(dayStartIndices(j), :);
        
        % NOTE : Replace this with the required similarity logic
        distance = norm(currentDayFirstRow - nextDayFirstRow);
        
        % Determine if the first rows are similar
        if distance < similarityThreshold
            fprintf('Day %d and Day %d are similar based on their first rows.\n', i, j);
        end
    end
end
end
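
As a complement to the function above, the same zero test can label every row with its day number, so any day's rows can be selected without splitting the data. This is only a sketch: timeCol is a hypothetical time-of-day column, and it assumes the first row is a day start and that a 0 never occurs in the middle of a day.

```matlab
% Hypothetical time-of-day column: three days with 3, 4 and 2 rows
timeCol = [0 5 9 0 2 8 4 0 6]';

% Each 0 marks a new day, so a running count of zeros labels every row
dayId = cumsum(timeCol == 0);

% Rows per day: distance between consecutive day starts
dayStart   = find(timeCol == 0);
rowsPerDay = diff([dayStart; numel(timeCol) + 1]);

% Pull out one day's rows by logical indexing, no splitting needed
day2Times = timeCol(dayId == 2);
```

The same mask, dayId == k, can index the rows of the full matrix or table to get everything recorded on Day k.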

You can generate some sample data for the above function by calling this function with the required number of rows:

function data = test_generate(numRows)
timeColumn = zeros(numRows, 1);
maxTime = 23;
currentTime = 0;
numDays = 1;
dayStartIndices = 1;
data = rand(numRows, 8);  % one row per time sample, 8 columns
for i = 2:numRows
    stepSize = randi([1, 5]);
    currentTime = currentTime + stepSize;
    
    if currentTime > maxTime
        currentTime = 0;
        numDays = numDays + 1;
        dayStartIndices = [dayStartIndices, i];
    end
    
    timeColumn(i) = currentTime;
end
dayStartIndices = [dayStartIndices, numRows + 1];
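for j = 1:length(dayStartIndices) - 1
    % Shuffle the times within each day, but keep the leading 0 in place
    % so that every day still begins with a 0 in the last column
    startIdx = dayStartIndices(j) + 1;
    endIdx = dayStartIndices(j + 1) - 1;
    timeColumn(startIdx:endIdx) = timeColumn(startIdx - 1 + randperm(endIdx - startIdx + 1));
end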
data(:, end) = timeColumn;
end
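
The functions above work on a numeric matrix. If the data is stored in a MATLAB table, as described in the question, curly-brace indexing can extract the numeric values. The following is a minimal sketch with a made-up table; the names T, A, B and TimeOfDay are hypothetical, and the Euclidean norm is only a placeholder for the real similarity logic.

```matlab
% Hypothetical table: two measurement columns plus a time-of-day column
T = table([1.0; 1.2; 1.1; 5.0; 5.1], ...
          [2.0; 2.1; 2.2; 9.0; 9.3], ...
          [0; 6; 12; 0; 7], ...
          'VariableNames', {'A', 'B', 'TimeOfDay'});

% Curly braces extract the numeric contents of the last column
dayStart = find(T{:, end} == 0);

% First row of every day, still as a table
firstRows = T(dayStart, :);

% Numeric values of the measurement columns used for the comparison
F = firstRows{:, 1:end-1};

% Euclidean distance between the first rows of Day 1 and Day 2
d12 = norm(F(1, :) - F(2, :));
```

Here the time column is left out of the comparison. Since it is 0 in every first row, including it would not change the distance.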

Refer to the following MathWorks Documentation of the “find” function for more information:

https://www.mathworks.com/help/releases/R2021b/matlab/ref/find.html

Hope this helps!
