Inner join() Producing Duplicate Entries

Paul Quelet, September 1, 2014 (edited by Oleg Komarov, September 3, 2014)
I have several large time series of meteorological variables from the same measurement tower. I wanted to compare data values at the exact same measurement points in time. I set up serial date numbers and values in dataset() arrays, similar to the following post:
I followed Message 5 and wrote something like C = join(Dataset1, Dataset2, 'Type', 'inner'). At first the results looked good, with input dates like the following:
[DS1.Time DS2.Time] =
01-Jan-2012 00:07:22 01-Jan-2012 00:07:22
01-Jan-2012 00:17:22 01-Jan-2012 00:17:22
01-Jan-2012 00:37:22 01-Jan-2012 00:37:22
01-Jan-2012 00:47:22 01-Jan-2012 00:47:22
01-Jan-2012 00:57:22 01-Jan-2012 00:57:22
01-Jan-2012 01:47:22 01-Jan-2012 01:07:22
01-Jan-2012 01:57:22 01-Jan-2012 01:27:22
01-Jan-2012 02:07:22 01-Jan-2012 01:47:22
01-Jan-2012 02:17:22 01-Jan-2012 01:57:22
01-Jan-2012 02:27:22 01-Jan-2012 02:07:22 ...
so that the resulting dates (with data) using C = join(DS1,DS2,'Type','inner') would be:
C.Time =
01-Jan-2012 00:07:22
01-Jan-2012 00:17:22
01-Jan-2012 00:37:22
01-Jan-2012 00:47:22
01-Jan-2012 00:57:22
01-Jan-2012 01:47:22
01-Jan-2012 01:57:22
01-Jan-2012 02:07:22
01-Jan-2012 02:17:22
01-Jan-2012 02:27:22 ...
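For comparison, here is a minimal Python sketch (not MATLAB's join) of what an inner join on a time key does when each time appears at most once in each input. Only keys present in both inputs survive, so the result is the intersection of the two key sets and can be no longer than either input. The `inner_join` helper and the readings are hypothetical.

```python
# Minimal sketch of an inner join on a time key (Python, not MATLAB's join).
# Assumes each time key appears at most once in each input (dict keys).


def inner_join(left: dict, right: dict) -> list:
    """Return (time, left_value, right_value) rows for keys present in both."""
    common = sorted(left.keys() & right.keys())  # intersection of the key sets
    return [(t, left[t], right[t]) for t in common]


# Hypothetical readings keyed by sortable "HH:MM:SS" strings.
ds1 = {"00:07:22": 1.0, "00:17:22": 2.0, "01:47:22": 3.0, "01:57:22": 4.0}
ds2 = {"00:07:22": 10.0, "01:07:22": 20.0, "01:47:22": 30.0, "01:57:22": 40.0}

rows = inner_join(ds1, ds2)
for row in rows:
    print(row)
```

With unique keys, the bound the post relies on (result length <= each input length) holds automatically.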
The problems started when I took the output C and merged it with more time series. Since an inner join is like an intersection of the times in two datasets, it stands to reason that length(C) <= length(DS1) and length(C) <= length(DS2). This no longer held when I used Cnew = join(C,DS4,'Type','inner'). The times at the ends looked fine, but I finally discovered repeated data rows in the middle of the resulting dataset, like:
Cnew.Time =
...
02-Nov-2012 09:50:11
02-Nov-2012 09:50:11
02-Nov-2012 09:50:11
02-Nov-2012 09:50:11
02-Nov-2012 09:50:11
02-Nov-2012 09:50:11
02-Nov-2012 09:50:11
02-Nov-2012 09:50:11
02-Nov-2012 09:50:11
02-Nov-2012 09:50:11
02-Nov-2012 09:50:11 ...
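One plausible explanation, assuming C or DS4 contains some timestamp more than once (this has not been checked against the dataset join source): an inner join pairs every matching row on one side with every matching row on the other, so a time that appears k times in one input and m times in the other yields k * m rows. The length bound above only holds when the keys are unique. A Python sketch with a hypothetical `inner_join_multi` helper and hypothetical data:

```python
from collections import defaultdict


def inner_join_multi(left: list, right: list) -> list:
    """Inner join of (time, value) row lists that may repeat a time key.

    Every left row is paired with every right row sharing its key, so a key
    seen k times on the left and m times on the right yields k * m rows.
    """
    by_time = defaultdict(list)
    for t, v in right:
        by_time[t].append(v)
    out = []
    for t, lv in left:
        for rv in by_time.get(t, []):  # .get does not insert missing keys
            out.append((t, lv, rv))
    return out


# Hypothetical data: the 09:50:11 stamp is repeated in both inputs.
c = [("09:40:11", 1), ("09:50:11", 2), ("09:50:11", 3), ("09:50:11", 4)]
ds4 = [("09:40:11", 10), ("09:50:11", 20), ("09:50:11", 30)]

joined = inner_join_multi(c, ds4)
print(len(c), len(ds4), len(joined))  # the join is longer than either input
```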
After much investigation, the only way I found to fix this problem after an inner join was to use the unique() function in the following way:
Cnew = join( C, DS4, 'key', 'Time', 'Type', 'inner', 'MergeKeys', true ) ;
CnewUnique = unique(Cnew , 'Time') ;
This would finally produce the output I was looking for:
CnewUnique.Time = ...
02-Nov-2012 09:00:11
02-Nov-2012 09:10:11
02-Nov-2012 09:20:11
02-Nov-2012 09:30:11
02-Nov-2012 09:40:11
02-Nov-2012 09:50:11
02-Nov-2012 10:00:11
02-Nov-2012 10:10:11
02-Nov-2012 10:20:11 ...
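Deduplicating after the join, as unique(Cnew,'Time') does here, can be sketched in Python as keeping one row per time key. This sketch keeps the first row seen for each time and sorts by time; it makes no claim about which duplicate MATLAB's unique keeps. Because the dropped rows may carry different values, it may be worth listing the repeated times in the inputs before joining. The `unique_by_time` and `duplicated_times` helpers and the data are hypothetical.

```python
from collections import Counter


def duplicated_times(rows: list) -> list:
    """Return the time keys that occur more than once, sorted."""
    counts = Counter(row[0] for row in rows)
    return sorted(t for t, n in counts.items() if n > 1)


def unique_by_time(rows: list) -> list:
    """Keep one row per time key (the first seen), sorted by time."""
    first = {}
    for row in rows:
        first.setdefault(row[0], row)  # later rows with the same key are ignored
    return [first[t] for t in sorted(first)]


joined = [("09:40:11", 1, 10), ("09:50:11", 2, 20), ("09:50:11", 2, 30),
          ("09:50:11", 3, 20), ("10:00:11", 5, 50)]

print(duplicated_times(joined))
for row in unique_by_time(joined):
    print(row)
```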
This took many hours to figure out, so I wanted to ask the following questions:
  1. Why did join(...,'inner',...) not work the way I expected, as it had before?
  2. Is there a better way to match up the times from several time series? (I also had no success getting an "intersection" of the times with the synchronize function.)
  3. Has anyone else had a similar problem? Could this be a bug in MATLAB?
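Regarding question 2, one approach (a sketch, not a description of MATLAB's synchronize) is to intersect the exact time keys of all series at once and then look up each series' value at those times, instead of chaining pairwise joins. It assumes the times have already been converted to exact keys such as integer seconds and that each series holds at most one value per time, which a Python dict enforces. The `align_series` helper and the data are hypothetical.

```python
def align_series(*series: dict) -> list:
    """Return (time, v1, v2, ...) rows for times present in every series.

    Each series maps an exact time key (e.g. integer seconds) to a value,
    so a time can appear at most once per series.
    """
    if not series:
        return []
    common = set.intersection(*(set(s) for s in series))
    return [(t, *(s[t] for s in series)) for t in sorted(common)]


# Hypothetical series keyed by seconds since a common origin.
a = {0: 1.0, 600: 1.1, 1200: 1.2, 1800: 1.3}
b = {0: 2.0, 1200: 2.2, 1800: 2.3}
d = {600: 3.1, 1200: 3.2, 1800: 3.3}

for row in align_series(a, b, d):
    print(row)
```

Because every input has unique keys, the result can never be longer than the shortest series.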
Any insights are appreciated. Thank you for contributing to this post.
Comments: 4
per isakson, September 2, 2014 (edited September 2, 2014)
Disclaimer: I have not worked with the dataset class of the Statistics Toolbox, but I have worked with time series, meteorological and others.
I looked at the code of join. It uses the function unique for comparison, and unique does not handle doubles well. So why isn't there a test in the code? At least a warning would have been appropriate.
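To illustrate why exact comparison of doubles is fragile for time keys, the Python sketch below builds the same nominal times (in days) two ways, by repeated addition and by direct multiplication, and compares them exactly, as set and unique operations do. A 0.1-day step is used only because its rounding behavior is well known; whether real serial date numbers differ depends on how they were computed. Rounding to whole seconds before comparing makes the keys agree. The helper names are hypothetical.

```python
def accumulated_days(step: float, n: int) -> list:
    """n nominal times (in days) built by repeatedly adding step to 0.0."""
    times, t = [], 0.0
    for _ in range(n):
        times.append(t)
        t += step  # rounding error can accumulate with each addition
    return times


def direct_days(step: float, n: int) -> list:
    """The same nominal times computed directly as k * step."""
    return [k * step for k in range(n)]


acc = accumulated_days(0.1, 11)
direct = direct_days(0.1, 11)

# Exact comparison, as a set/unique-based key match would do.
exact = set(acc) & set(direct)
print(len(exact), "of", len(acc), "times match exactly")
print(acc[10], direct[10])  # nominally both 1.0 day

# Rounding to whole seconds before comparing makes every key agree.
acc_s = {round(t * 86400) for t in acc}
direct_s = {round(t * 86400) for t in direct}
print(acc_s == direct_s)
```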
The documentation of join says: C = join(A,B,keys) performs the merge using the variables specified by keys as the key variables in both A and B. keys is a positive integer, a vector of positive integers, a variable name, a cell array of variable names, or a logical vector.
My conclusion is that serial date numbers (double) cannot be used as keys in join.
I have ended up using "serial second numbers" stored as uint32 to avoid problems like this.
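The integer-seconds approach can be sketched in Python by converting each "dd-Mon-yyyy HH:MM:SS" timestamp (the format shown in the post) to whole seconds since a fixed origin, using integer arithmetic so no rounding is involved. The origin `EPOCH` and the `to_seconds` helper are hypothetical, and Python's int stands in for uint32. Parsing `%b` assumes English month abbreviations (the default C locale).

```python
from datetime import datetime

# Hypothetical fixed origin; any origin works if every series uses the same one.
EPOCH = datetime(2012, 1, 1)


def to_seconds(stamp: str) -> int:
    """Convert a 'dd-Mon-yyyy HH:MM:SS' string to whole seconds since EPOCH."""
    delta = datetime.strptime(stamp, "%d-%b-%Y %H:%M:%S") - EPOCH
    # timedelta.days and .seconds are ints, so the key is exact.
    return delta.days * 86400 + delta.seconds


ds1_times = ["01-Jan-2012 00:07:22", "01-Jan-2012 00:17:22", "01-Jan-2012 01:47:22"]
ds2_times = ["01-Jan-2012 00:07:22", "01-Jan-2012 01:07:22", "01-Jan-2012 01:47:22"]

common = sorted({to_seconds(s) for s in ds1_times} & {to_seconds(s) for s in ds2_times})
print(common)
```

Integer keys compare exactly, so matching times across series no longer depends on how each serial date number was computed.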
Oleg Komarov, September 3, 2014 (edited September 3, 2014)
What if you try using table() instead of dataset? The join() function for tables has no restriction on the type of variable you can use as keys.
