I have a problem with the unique-function and I can't find the reason why.
I import a table vom excel and afterwards I want to use [table,~] = unique(table,'rows');
It worked until recently. The only change I remember I did, was that I adjusted the timeformat.
Now I have doubled rows in my table, even after using [table,~] = unique(table,'rows');.
table = [];
for i=1:length(files_table_data)
path_table = fullfile(files_table_data(i).name);
opts = detectImportOptions(path_table);%excel
opts.VariableNames(1) = {'Name'};
opts.VariableNames(2) = {'Date'};
opts.VariableNames(3) = {'Number'};
opts = setvartype(opts,{'Date'},'datetime');%datetime
opts = setvaropts(opts,{'Date'},'InputFormat','yyyy-MM-dd HH:mm:ss.SSS XXX','TimeZone','Europe/Zurich');
opts = setvaropts(opts,{'Date'},'DatetimeFormat','yyyy-MM-dd HH:mm');
opts = setvartype(opts,{'Name'},'string');
opts = setvartype(opts,{'Number'},'double');
opts.SelectedVariableNames = {'Name','Date','Number'};
table = [table; readtable(path_table,opts)];
end
[table,~] = unique(table,'rows');

댓글 수: 4

the cyclist
the cyclist 2021년 4월 3일
Doing the following will help us help you:
  • Upload the table in a MAT file, so we can work with it. (FYI, it's generally a bad idea to give a variable the same name as a MATLAB command.)
  • Tell us what you mean by "it's not working as expected". Be specific.
Thank you for your tips. I'm not sure, how to upload the table as a .mat file, because it's sensitive data. The import of the table works, but it has still repeating rows/rows with the same values/data.
An extraction from the table (output) would be:
Name Date Number
"378478409220" 2021-03-22 23:24 1
"378478409220" 2021-03-22 23:24 2
"378478409220" 2021-03-22 23:24 3
"378478409220" 2021-03-22 23:24 4
"378478409295" 2020-12-16 11:28 NaN
"378478409295" 2020-12-16 11:28 4
"378478409295" 2020-12-16 11:28 4
"378478409295" 2020-12-17 04:17 1
"378478409295" 2020-12-17 04:17 1
The last four rows are repeating. I would like to keep only one row per combination of values.
the cyclist
the cyclist 2021년 4월 3일
The last two rows -- as you have written them -- are obviously identical.
If I create a table using exactly what you have written, they will be identical, and the unique command will get rid of one of them. (Unless there is a bug in the unique command, which I doubt.)
This is why uploading the actual data, not just typing what it looks like, is so important to replicating the problem.
Can you create a very small segment of the data that is not sensitive to upload? Perhaps just a few rows, and only the relevant columns? Then test that segment to make sure it still exhibits the bug? Then upload everything we need to replicate that bug.
I think with the Code and the data from the excel sheet it should be visible... You might need to adjust the data_path, so the code can find the Test.xlsx.
I needed to change the original Time-Format, because I compare this data afterwards with different data from other sources, that have different time-formats.
Thank you!
function test = import_test()
files_test = dir('../data/not_anonymized/Test*');
%% read table
table = [];
for i=1:length(files_test)
path_table = fullfile('..','data','not_anonymized', files_test(i).name);
%fprintf('Starting file %d/%d: %s\n',i,length(files_test),files_test(i).name)
opts = detectImportOptions(path_table);%excel
opts.VariableNames(1) = {'Name'};
opts.VariableNames(2) = {'Date'};
opts.VariableNames(3) = {'Number'};
opts = setvartype(opts,{'Date'},'datetime');%datetime
opts = setvaropts(opts,{'Date'},'InputFormat','yyyy-MM-dd HH:mm:ss.SSS XXX','TimeZone','Europe/Zurich');
opts = setvaropts(opts,{'Date'},'DatetimeFormat','yyyy-MM-dd HH:mm');
opts = setvartype(opts,{'Name'},'string');
opts = setvartype(opts,{'Number'},'double');
opts.SelectedVariableNames = {'Name','Date','Number'};
table = [table; readtable(path_table,opts)];
end
[table,~] = unique(table,'rows');
end

댓글을 달려면 로그인하십시오.

 채택된 답변

Cris LaPierre
Cris LaPierre 2021년 4월 3일
First, do not name your table variable table. That overwrites MATLAB's function for constructing tables.
When I look at your excel file, I see that times record milliseconds to three decimal places. With times, you can modify how it is displayed without losing this level of detail. You have set your display format to 'yyyy-MM-dd HH:mm' but the actual data is still saved as 'yyyy-MM-dd HH:mm:ss.SSS XXX'.
Therefore, the unique command is working just fine. The times are not identical when looking at the actual data.
If you only want to compare your times to the minute, then you must actually modify the data to only be minutes. Perhaps something like this.
% Create a time with seconds and milliseconds, but set display format to
% HH:mm
d=datetime(2021,01,02,3,4,5,123,'Format','yyyy-MM-dd HH:mm')
d = datetime
2021-01-02 03:04
% Change display format so seconds and milliseconds are shown
d.Format = 'dd-MMM-yyyy HH:mm:ss.SSS'
d = datetime
02-Jan-2021 03:04:05.123
% set second and milliseconds to 0.
d.Second = 0
d = datetime
02-Jan-2021 03:04:00.000
For a datetime variable in a table, use dot notation: tbl.Date.Second = 0

댓글 수: 5

Thank you! The actual table is not called table, that was for the code I posted here. But it's definetly good to know! I renamed table to test.
I tried now to add test.Date.Second = 0; below my import and before the unique-call. It is still not changing the output, my rows are still doubled. I am not sure, whether I am still doing a mistake or what the problem is...
...
test = [test; readtable(path_test,opts)];
end
test.Date.Second=0;
[test,~] = unique(test,'rows');
Ariane Fidelia Wettig
Ariane Fidelia Wettig 2021년 4월 5일
편집: Ariane Fidelia Wettig 2021년 4월 5일
"378478409295" 2021-02-24 04:32:00.000 +01:00 2
"378478409295" 2021-02-24 04:32:00.000 +01:00 2
This is part of the Output, even if I don't change the DatetimeFormat.
In my actual table I have 3 rows for numbers, sometimes there are NaN-Values (In the excel files this entries are empty). Can the problem be connected to this kind of entries?
Cris LaPierre
Cris LaPierre 2021년 4월 5일
편집: Cris LaPierre 2021년 4월 5일
Without your data, or at least an example to look at, I have to believe you are not inspecting your data carefully enough to spot the difference.
Here's a simplified version using the test data set you shared.
path_table = "Ariane_Test.xlsx";
opts = detectImportOptions(path_table);%excel
opts = setvartype(opts,'Date','datetime');%datetime
opts = setvaropts(opts,'Date','InputFormat','yyyy-MM-dd HH:mm:ss.SSS XXX','TimeZone','Europe/Zurich');
opts = setvaropts(opts,'Date','DatetimeFormat','yyyy-MM-dd HH:mm.ss.SSS');
opts = setvartype(opts,'Name','string');
test = readtable(path_table,opts);
test.Date.Second = 0
test = 63×3 table
Name Date Number ______________ _______________________ ______ "338778418867" 2020-10-05 18:34.00.000 3 "338778418867" 2020-10-05 18:34.00.000 NaN "338778419063" 2020-10-05 18:34.00.000 1 "338778419063" 2020-10-05 18:34.00.000 NaN "338778419063" 2020-10-05 18:34.00.000 3 "338778419063" 2020-10-05 18:34.00.000 3 "338778419204" 2020-10-05 18:34.00.000 1 "338778419204" 2020-10-05 18:34.00.000 1 "338778419204" 2020-10-05 18:34.00.000 3 "338778419204" 2020-10-05 18:34.00.000 4 "338045763939" 2020-10-05 17:50.00.000 1 "378045659745" 2020-10-05 05:00.00.000 2 "378045659745" 2020-10-05 05:00.00.000 3 "378045659745" 2020-10-05 05:00.00.000 4 "378045659760" 2020-10-05 05:00.00.000 1 "378045659760" 2020-10-05 05:00.00.000 2
testU = unique(test,'rows');
tail(testU)
ans = 8×3 table
Name Date Number ______________ _______________________ ______ "378045659760" 2020-10-05 05:00.00.000 3 "378045659760" 2020-10-05 05:00.00.000 4 "378049751076" 2020-10-04 23:44.00.000 1 "378049751076" 2020-10-04 23:44.00.000 2 "378049751076" 2020-10-04 23:44.00.000 3 "378049751076" 2020-10-04 23:44.00.000 4 "378049751076" 2020-10-04 23:44.00.000 5 "378049751076" 2020-10-04 23:44.00.000 6
The last 6 have the same Name and Date, but Number is different.
Thank you very much!
I checked my data again, googled and found out, that unique is indeed seeing NaN as NaN~=NaN. Therefore I added:
test.Number(isnan(test.Number),:)=inf(size(test.Number(isnan(test.Number))));
[test,~] = unique(test,'rows');
test.Number(isinf(test.Number),:)=NaN(size(test.Number(isinf(test.Number))));
to my code and now it works! :)
Thank you very much for your help!

댓글을 달려면 로그인하십시오.

추가 답변 (0개)

카테고리

제품

릴리스

R2020b

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by