Outerjoin isn't matching the same values in two different tables

조회 수: 8 (최근 30일)
Ariana
Ariana 2024년 7월 3일
댓글: Voss 2024년 7월 3일
Hi everyone! I'm having a problem with some datetime tables, and using outerjoin.
So I have this one matrix, that has all my data. The first column is the datetime, and the next columns are data matching each time (I haven't included the data in the attached files, just the first column). The problem I'm having is that sometimes the datetime skips time - for instance, one row will be July 10 2023 18:00:00, and then the next row will be July 12 2023 0:00:00. This is just due to the nature of the data collection, but since I want to run some statistical analyses on my data, I want the timestep for the date and time to be even, even if that means between those two time steps I will have NaNs in all the data columns (since there was no data collected during that time).
So to do this I made a time vector that had an even time step from the beginning and the end of my data collection. I then tried to use outerjoin to join this dataset with the datetime column of my original dataset. However, I'm finding that they're not matching up even when the date and time are the same in each dataset!
Here's what I mean:
This is the output table from using the outerjoin function. On the left is my time vector with even time steps, and on the right is my data table that is messier with skips in date and time. As you can see, there are times where they match up (for example, row 574), but there are also times where they don't (rows 577 and 578, where the values are the SAME but they're not recognized as such).
How can I fix this? I made sure there are no repeats in either table I'm working with. I've also tried using outerjoin when the datetime is in the matlab number format, but I still have the same problem. I'm open to any solution, even if it doesn't use outerjoin as a function. I just want an even timestep for my data.
I've attached my datasets: timevecTable is the time vector with even time steps, and v2Table is my dataset that skips datetimes. The resulting table using outerjoin is also included, and that's named timeJoined.
Thank you!

채택된 답변

Voss
Voss 2024년 7월 3일
편집: Voss 2024년 7월 3일
As for why outerjoin() doesn't appear to combine data when the datetimes are the same, the fact is that some datetimes that appear to be the same are actually different. There are small fractions of a second in one that are not in the other, which are not apparent due to the Format showing only the seconds (no fractions).
For example the first two datetimes in timevecTable appear to be the same as the first two in v2Table:
timevecTable = load('timevecTable.mat').timevecTable;
v2Table = load('v2Table.mat').v2Table;
timevecTable.datetime([1 2])
ans = 2x1 datetime array
10-Jul-2023 22:12:00 10-Jul-2023 22:14:00
v2Table.datetime([1 2])
ans = 2x1 datetime array
10-Jul-2023 22:12:00 10-Jul-2023 22:14:00
but they are not equal:
isequal(timevecTable.datetime([1 2]),v2Table.datetime([1 2]))
ans = logical
0
In fact v2Table.datetime(2) has a small fraction of a second, which timevecTable.datetime(2) does not. You can see that by adjusting the Format of each:
fmt1 = timevecTable.datetime.Format;
fmt2 = v2Table.datetime.Format;
timevecTable.datetime.Format = [fmt1 '.SSSSSSSSS'];
v2Table.datetime.Format = [fmt2 '.SSSSSSSSS'];
And check the values again:
timevecTable.datetime([1 2])
ans = 2x1 datetime array
10-Jul-2023 22:12:00.000000000 10-Jul-2023 22:14:00.000000000
v2Table.datetime([1 2])
ans = 2x1 datetime array
10-Jul-2023 22:12:00.000000000 10-Jul-2023 22:14:00.000006835
You can see there that v2Table.datetime(2) has an additional 0.000006835 (or so) seconds relative to timevecTable.datetime(2).
To fix this, you can shift each datetime to the start of the nearest second:
timevecTable.datetime = dateshift(timevecTable.datetime,'start','second','nearest');
v2Table.datetime = dateshift(v2Table.datetime,'start','second','nearest');
[and you can change the Formats back (they are just for display/illustration anyway and don't change the underlying data)]
timevecTable.datetime.Format = fmt1;
v2Table.datetime.Format = fmt2;
and then the outerjoin() result should be more like what you expect:
timeJoined = outerjoin(timevecTable,v2Table)
timeJoined = 22544x2 table
datetime_timevecTable datetime_v2Table _____________________ ____________________ NaT 08-Jul-2023 08:22:00 NaT 08-Jul-2023 08:24:00 NaT 08-Jul-2023 08:26:00 NaT 08-Jul-2023 08:28:00 NaT 08-Jul-2023 08:30:00 NaT 08-Jul-2023 08:32:00 NaT 08-Jul-2023 08:34:00 NaT 08-Jul-2023 08:36:00 NaT 08-Jul-2023 08:38:00 NaT 08-Jul-2023 08:40:00 NaT 08-Jul-2023 08:42:00 NaT 08-Jul-2023 08:44:00 NaT 08-Jul-2023 08:46:00 NaT 08-Jul-2023 08:48:00 NaT 08-Jul-2023 08:50:00 NaT 08-Jul-2023 08:52:00
Notice it's ~22000 rows instead of the ~40000 rows that you had in your outerjoined table (in timeJoined.mat) originally.
  댓글 수: 2
Ariana
Ariana 2024년 7월 3일
Oh my gosh, thank you so much! I really appreciate this!

댓글을 달려면 로그인하십시오.

추가 답변 (2개)

Cris LaPierre
Cris LaPierre 2024년 7월 3일
Not sure how you created the time steps, but datetimes apply a display format to the actual data. Perhaps the actual values do differ by some small amount.
Since you have a timetable, I'd try using retime. Your data was not in time order, so I use sortrows to put it in ascending time order.
load v2Table.mat v2Table
v2Table.Data = rand(height(v2Table),1);
v2Table = table2timetable(sortrows(v2Table))
v2Table = 21953x1 timetable
datetime Data ____________________ ________ 08-Jul-2023 08:22:00 0.60353 08-Jul-2023 08:24:00 0.039054 08-Jul-2023 08:26:00 0.23764 08-Jul-2023 08:28:00 0.69194 08-Jul-2023 08:30:00 0.38519 08-Jul-2023 08:32:00 0.66399 08-Jul-2023 08:33:59 0.6033 08-Jul-2023 08:35:59 0.92594 08-Jul-2023 08:37:59 0.30641 08-Jul-2023 08:39:59 0.34128 08-Jul-2023 08:42:00 0.82971 08-Jul-2023 08:44:00 0.27009 08-Jul-2023 08:46:00 0.29427 08-Jul-2023 08:48:00 0.87164 08-Jul-2023 08:50:00 0.00811 08-Jul-2023 08:51:59 0.18701
v2Table_retimed = retime(v2Table,'regular','fillwithmissing','TimeStep',minutes(2))
v2Table_retimed = 23992x1 timetable
datetime Data ____________________ ________ 08-Jul-2023 08:22:00 0.60353 08-Jul-2023 08:24:00 0.039054 08-Jul-2023 08:26:00 NaN 08-Jul-2023 08:28:00 NaN 08-Jul-2023 08:30:00 NaN 08-Jul-2023 08:32:00 NaN 08-Jul-2023 08:34:00 NaN 08-Jul-2023 08:36:00 NaN 08-Jul-2023 08:38:00 NaN 08-Jul-2023 08:40:00 NaN 08-Jul-2023 08:42:00 0.82971 08-Jul-2023 08:44:00 NaN 08-Jul-2023 08:46:00 NaN 08-Jul-2023 08:48:00 NaN 08-Jul-2023 08:50:00 NaN 08-Jul-2023 08:52:00 NaN

Taylor
Taylor 2024년 7월 3일
MATLAB seems to be recognizing those datetimes (indexed at 577 and 578) as unique values
load('timevecTable.mat')
load('v2Table.mat')
t = outerjoin(timevecTable, v2Table, "MergeKeys",true)
t = 40741x1 table
datetime ____________________ 08-Jul-2023 08:22:00 08-Jul-2023 08:24:00 08-Jul-2023 08:26:00 08-Jul-2023 08:28:00 08-Jul-2023 08:30:00 08-Jul-2023 08:32:00 08-Jul-2023 08:33:59 08-Jul-2023 08:35:59 08-Jul-2023 08:37:59 08-Jul-2023 08:39:59 08-Jul-2023 08:42:00 08-Jul-2023 08:44:00 08-Jul-2023 08:46:00 08-Jul-2023 08:48:00 08-Jul-2023 08:50:00 08-Jul-2023 08:51:59
t = table2array(t);
isequal(t(577), t(578))
ans = logical
0
You can just convert everything to a datetime array, then a string array, then back to a datetime array and it will recognize them as equal.
t = string(t);
t = datetime(t)
t = 40741x1 datetime array
08-Jul-2023 08:22:00 08-Jul-2023 08:24:00 08-Jul-2023 08:26:00 08-Jul-2023 08:28:00 08-Jul-2023 08:30:00 08-Jul-2023 08:32:00 08-Jul-2023 08:33:59 08-Jul-2023 08:35:59 08-Jul-2023 08:37:59 08-Jul-2023 08:39:59 08-Jul-2023 08:42:00 08-Jul-2023 08:44:00 08-Jul-2023 08:46:00 08-Jul-2023 08:48:00 08-Jul-2023 08:50:00 08-Jul-2023 08:51:59 08-Jul-2023 08:53:59 08-Jul-2023 08:55:59 08-Jul-2023 08:57:59 08-Jul-2023 09:00:00 08-Jul-2023 09:02:00 08-Jul-2023 09:04:00 08-Jul-2023 09:06:00 09-Jul-2023 07:23:59 09-Jul-2023 07:25:59 09-Jul-2023 07:27:59 09-Jul-2023 07:30:00 09-Jul-2023 07:32:00 09-Jul-2023 07:34:00 09-Jul-2023 07:36:00
isequal(t(577), t(578))
ans = logical
1
Then use unique to filter out duplicate values.
t = unique(t);
size(t)
ans = 1x2
31163 1
<mw-icon class=""></mw-icon>
<mw-icon class=""></mw-icon>

카테고리

Help CenterFile Exchange에서 Data Distribution Plots에 대해 자세히 알아보기

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by