Merging tables using outerjoin doesn't work

Hi all,
I have two tables which I would like to merge/join. These tables have a series of shared columns and then one or more columns may exist in only one table.
I know I can use outerjoin to achieve what I want, like this:
a = rand(10,1);
b = rand(10,1);
c = rand(10,1);
d = rand(10,1);
e = rand(10,1);
f = repmat(mat2cell(a,10,1),10,1);
t1 = table(ones(10,1),ones(10,1),ones(10,1),a,b,c,d,e,f)
t2 = table(ones(10,1),ones(10,1),ones(10,1).*2,a,b,c)
t3 = outerjoin(t1,t2,'MergeKeys',true)
The problem is that this doesn't work on the tables I am using and I'm not sure why. I have attached two tables here as a sample. They have the exact same columns, but different rows, so the task is actually simpler than the above example. However, when I use outerjoin it makes new variables named after the tables instead of combining them vertically, and the resulting table has 2 rows but twice as many columns as the inputs:
t3 = outerjoin(t1,t2,'Keys',[1:3],'MergeKeys',true)
Is there some feature of these tables that I'm missing which prevents them from being joined in the way I want?
Thanks for any help,
R.

댓글 수: 5

I don't think you want to join, which by definition combines two table horizontally. Provide a short clear example of what you have and what you want to end up with.
Hi Peter,
Here is a more replicable example. This code:
a = rand(3,1);
b = rand(3,1);
c = rand(3,1);
d = rand(3,1);
e = rand(3,1);
f = repmat(mat2cell(a,3,1),3,1);
t1 = table(ones(3,1),a,b,c)
t2 = table(ones(3,1).*2,a,b)
t3 = outerjoin(t1,t2,'MergeKeys',true)
Produces this output:
t1 =
3×4 table
Var1 a b c
____ _________________ _________________ _________________
1 0.212268049890771 0.144940183355874 0.786990424482484
1 0.51189873810689 0.628664443166539 0.515958765112436
1 0.650024454445417 0.441312076439047 0.343697121302907
t2 =
3×3 table
Var1 a b
____ _________________ _________________
2 0.212268049890771 0.144940183355874
2 0.51189873810689 0.628664443166539
2 0.650024454445417 0.441312076439047
t3 =
6×4 table
Var1 a b c
____ _________________ _________________ _________________
1 0.212268049890771 0.144940183355874 0.786990424482484
1 0.51189873810689 0.628664443166539 0.515958765112436
1 0.650024454445417 0.441312076439047 0.343697121302907
2 0.212268049890771 0.144940183355874 NaN
2 0.51189873810689 0.628664443166539 NaN
2 0.650024454445417 0.441312076439047 NaN
So table t3 is the vertical concatenation of t1 and t2, but where one of the tables was missing a column (column c in t2), this column has been filled with NaNs.
I'm trying to do the same thing with two tables containing more complex/varied data, but outerjoin works differently. For instance:
a = rand(3,1);
b = repmat({a},3,1);
c = repmat({'hello'},3,1);
d = repmat({rand(10,10)},3,1);
t1 = table(ones(3,1),a,b,c,d)
t2 = table(ones(3,1).*2,a,b)
t3 = outerjoin(t1,t2,'Keys',1,'MergeKeys',true)
produces this output:
t1 =
3×5 table
Var1 a b c d
____ _________________ ____________ _______ ______________
1 0.749630891182043 [3×1 double] 'hello' [10×10 double]
1 0.175095755248745 [3×1 double] 'hello' [10×10 double]
1 0.172219045349798 [3×1 double] 'hello' [10×10 double]
t2 =
3×3 table
Var1 a b
____ _________________ ____________
2 0.749630891182043 [3×1 double]
2 0.175095755248745 [3×1 double]
2 0.172219045349798 [3×1 double]
t3 =
6×7 table
Var1 a_t1 b_t1 c d a_t2 b_t2
____ _________________ ____________ _______ ______________ _________________ ____________
1 0.749630891182043 [3×1 double] 'hello' [10×10 double] NaN []
1 0.175095755248745 [3×1 double] 'hello' [10×10 double] NaN []
1 0.172219045349798 [3×1 double] 'hello' [10×10 double] NaN []
2 NaN [] '' [] 0.749630891182043 [3×1 double]
2 NaN [] '' [] 0.175095755248745 [3×1 double]
2 NaN [] '' [] 0.172219045349798 [3×1 double]
So now t3 is the vertical concatenation of t1 and t2, but some of the columns are duplicated and renamed instead of being filled with NaNs etc. For instance, in the first example we had one column a in the final table, but in the second example we have two columns, a_t1 and a_t2.
I'm guessing this is because the columns contain different data types, and Matlab can't know what to fill the empty spaces with? but that seems really weird because Matlab has correctly filled the empty spaces (i.e. bottom half of a_t1).
How can I join the tables in the second example to have non-duplicated columns as in the first example?
Thanks for any help,
R.
outerjoin by definition combines table horizontally. You need to vertically concatenate. I guess your issue is that one table has more variables than the other. Add the missing variables, then vertcat.
That's what I am doing now, I just made a small utility to fill the missing columns as I go. Then I can just concatenate them at the end.
But how come the first example works and does what I want, but the second doesn't? Is it really just the type of data contained in the tables? I would like to know for the future.
I also don't understand why there is not a function that does do what I want specifically - it seems that Matlab can determine the correct data to fill the excess space with, so it should be able to vertically concatenate based on columns, or at least it seems trivial to get to that from outerjoin.
Thanks,
R.
Peter Perkins
Peter Perkins 2019년 4월 18일
편집: Peter Perkins 2019년 4월 18일
Again, outerjoin is not intended to do vertical concatenation. It matches up rows by key values and horizontally concatenates. It is not the tool you are looking for. I encourage you to read the documentation for all three join functions.
It's possible to hack outerjoin to do vertical concatenation in some sense, and you have done that either intentionally or unintentionally, I can't tell. The reason why your first example "works" is because you don't specify keys, so outerjoin uses all the matching var names, and your a and b vars happen to have the same values, and you merge keys. It happens to look like the right answer; that is completely by happenstance.
Your second example specifies the one key, and so outerjoin will horzcat the remianing vars: a,b,c,d from the first with a,b from the second. Duplicate names, so it adds suffixes. That's actually closer to a typical use of outerjoin, except that none of your key values match, and if no keys match, there's not any point in joining.
To do what you want, either add vars to make the two tables vertcat-able, or assign one off the end of the other.
>> t1 = array2table(rand(2,3))
t1 =
2×3 table
Var1 Var2 Var3
_______ _______ _______
0.81472 0.12699 0.63236
0.90579 0.91338 0.09754
>> t2 = array2table(rand(2,4))
t2 =
2×4 table
Var1 Var2 Var3 Var4
_______ _______ _______ _______
0.2785 0.95751 0.15761 0.95717
0.54688 0.96489 0.97059 0.48538
>> t2(3:4,t1.Properties.VariableNames) = t1
Warning: The assignment added rows to the table, but did not assign values to all of the table's existing variables. Those variables are
extended with rows containing default values.
> In tabular/subsasgnParens (line 427)
In tabular/subsasgn (line 64)
t2 =
4×4 table
Var1 Var2 Var3 Var4
_______ _______ _______ _______
0.2785 0.95751 0.15761 0.95717
0.54688 0.96489 0.97059 0.48538
0.81472 0.12699 0.63236 0
0.90579 0.91338 0.09754 0
You will notice that that has padded Var4 with zeros; as with all numeric arrays anywhere in MATLAB, they are extended with zeros. So you'd have to assign NaN to those elements.

댓글을 달려면 로그인하십시오.

답변 (0개)

카테고리

도움말 센터File Exchange에서 Tables에 대해 자세히 알아보기

질문:

2019년 4월 12일

편집:

2019년 4월 18일

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by