matlab 'unique' is skipping rows with data

Question

Myfile.zip

I'm trying to parse a large file so that it only contains the rows where the data has changed. The first column is continuous time, so I exclude that column. The next 3 columns are the ones I need to parse for unique values. When I compared the final file (which I called 12Dec2023Events.csv) with a file I had parsed by hand, I noticed that not all of the data was included.

346:17:37:18.6839209 9 23 3E

346:17:37:18.6939210 2 E 3E (data not identified by unique)

unique did not pick up the change until

346:17:37:52.2139008 2 E 3E

346:17:37:52.2239008 2 0 3E

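t=readtable('Myfile.csv');              % import using READTABLE's default type detection
[~,ind]=unique(t(:,2:4),'stable');      % first occurrence of each combination of columns 2-4
t2=t(ind,:);                            % keep those rows, including the time column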

Please help: the files are just too massive to parse by hand, and having a file with missing data isn't a solution either.

Answer 1

Stephen23, 2 January 2024

Edited: Stephen23, 2 January 2024

The basic problem is that your file is large, and by default READTABLE checks only a limited number of rows** before deciding what data type each column has***. Your data file has very different data at the start of those columns than further down: some of them contain mostly numeric data at the start... but in fact you want columns 2, 3, & 4 imported as text (because they all contain alphanumeric characters****).

So you need to tell READTABLE that, e.g.:

unzip Myfile.zip
fnm = 'Myfile.csv';
obj = detectImportOptions(fnm);        % detect delimiter, header and column types
obj = setvartype(obj,2:4,'string');    % import columns 2-4 as text instead
tbl = readtable(fnm,obj)
Warning: Column headers from the file were modified to make them valid MATLAB identifiers before creating variable names for the table. The original column headers are saved in the VariableDescriptions property.
Set 'VariableNamingRule' to 'preserve' to use the original column headers as table variable names.
tbl = 520657×4 table
            UTC             MRC_TXID    MRC_CommandID    MRC_User_Defined
    ____________________    ________    _____________    ________________

    8321:16:37.994597698      "1A"           "0"               "8"       
    8321:16:38.004597698      "1A"           "0"               "8"       
    8321:16:38.014597698      "1A"           "0"               "8"       
    8321:16:38.024597698      "1A"           "0"               "8"       
    8321:16:38.034597698      "1A"           "0"               "8"       
    8321:16:38.044597698      "1A"           "0"               "8"       
    8321:16:38.054597599      "1A"           "0"               "8"       
    8321:16:38.064597698      "1A"           "0"               "8"       
    8321:16:38.074597698      "1A"           "0"               "8"       
    8321:16:38.084597599      "1A"           "0"               "8"       
    8321:16:38.094597599      "1A"           "0"               "8"       
    8321:16:38.104597599      "1A"           "0"               "8"       
    8321:16:38.114597599      "1A"           "0"               "8"       
    8321:16:38.124597599      "1A"           "0"               "8"       
    8321:16:38.134597599      "1A"           "0"               "8"       
    8321:16:38.144597599      "1A"           "0"               "8"       
[~,ind] = unique(tbl(:,2:4),'stable');
t2 = tbl(ind,:)
t2 = 235×4 table
            UTC             MRC_TXID    MRC_CommandID    MRC_User_Defined
    ____________________    ________    _____________    ________________

    8321:16:37.994597698      "1A"          "0"                "8"       
    8321:29:57.394167701      "1"           "E"                "3E"      
    8321:29:58.474167098      "1"           "23"               "3E"      
    8321:30:52.554138698      "2"           "E"                "3E"      
    8321:30:53.634138099      "2"           "23"               "3E"      
    8321:31:30.414119201      "5"           "23"               "3E"      
    8321:34:20.234026298      "6"           "23"               "3E"      
    8321:34:41.864013999      "A"           "23"               "3E"      
    8321:35:00.254003200      "9"           "23"               "3E"      
    8321:37:52.223900798      "2"           "0"                "3E"      
    8321:38:07.373891498      "2"           "24"               "3E"      
    8321:38:11.693888801      "2"           "12"               "3E"      
    8321:38:17.103885501      "2"           "31"               "3E"      
    8321:38:20.353883499      "2"           "1C"               "3E"      
    8321:38:30.083877399      "2"           "E"                "30"      
    8321:38:35.493874000      "2"           "E"                "34"      

And there is your "missing" data:

idx = all(t2{:,2:4}==["2","E","3E"],2);
t2(idx,:)
ans = 1×4 table
            UTC             MRC_TXID    MRC_CommandID    MRC_User_Defined
    ____________________    ________    _____________    ________________

    8321:30:52.554138698      "2"            "E"               "3E"      
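The matching line above relies on implicit expansion: comparing an N-by-3 string array with a 1-by-3 string row compares every row against the pattern, and all(...,2) keeps only the rows where all three columns match. A minimal sketch on three rows copied by hand from the t2 display (a stand-in, not the real table):

```matlab
% Three rows of columns 2-4 as strings, copied from the t2 display (hand-built stand-in)
vals = ["1A" "0"  "8"
        "2"  "E"  "3E"
        "2"  "23" "3E"];

% The 1-by-3 pattern is expanded against each row, giving a 3-by-3 logical array;
% all(...,2) is true only for rows where every column matches
idx = all(vals == ["2","E","3E"], 2);
disp(idx)   % column vector: 0 1 0
```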

Just to confirm, let's check its location in the imported table:

idy = ind(idx)
idy = 85331
tbl(idy,:)
ans = 1×4 table
            UTC             MRC_TXID    MRC_CommandID    MRC_User_Defined
    ____________________    ________    _____________    ________________

    8321:30:52.554138698      "2"            "E"               "3E"      

And checking that line in the original file (don't forget the header is also one line):

So far everything looks as expected.

"matlab 'unique' is skipping rows with data"

So far I don't see any problem with UNIQUE.
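Note that UNIQUE with 'stable' returns only the first occurrence of each combination of columns 2-4. The combination "2","E","3E" first appears at 8321:30:52 in t2 above, so when it comes back later (as in the question's row at 346:17:37:18, i.e. hour 8321) it is not returned again. If the goal is instead to keep every row whose values differ from the row directly above it, comparing consecutive rows is one way to do that. The sketch below uses a hypothetical six-row subset assembled from rows quoted in t2 and in the question; it illustrates the difference and is not a tested pipeline for the full file.

```matlab
% Hypothetical subset of columns 2-4 in file order: rows 1-2 as in t2
% (8321:30:52, 8321:30:53), rows 3-6 as quoted in the question
% (17:37:18.68, 17:37:18.69, 17:37:52.21, 17:37:52.22)
t = table(["2";"2";"9";"2";"2";"2"], ...
          ["E";"23";"23";"E";"E";"0"], ...
          ["3E";"3E";"3E";"3E";"3E";"3E"], ...
          'VariableNames', {'MRC_TXID','MRC_CommandID','MRC_User_Defined'});

% unique + 'stable': first occurrence of each combination only,
% so the return of "2","E","3E" in row 4 is not kept
[~, firstIdx] = unique(t, 'stable');      % [1; 2; 3; 6]

% Change detection: keep row 1 and every row that differs from the row above
vals = t{:,:};                            % 6-by-3 string array
vals(ismissing(vals)) = "";               % treat empty fields as "" so they compare equal
changed = [true; any(vals(2:end,:) ~= vals(1:end-1,:), 2)];
changeIdx = find(changed);                % [1; 2; 3; 4; 6]
```

On the full file, the same logical mask would be applied to a table imported with columns 2-4 as strings, e.g. t3 = tbl(changed,:) (t3 is a hypothetical name).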

** Apparently fewer than 79825:

*** Because otherwise people complain that file importing takes too long. This is a good example of John Lydgate's aphorism about pleasing all people all of the time.

**** Look at your table t: numeric columns cannot contain alphabetic characters. That should be the big clue that you need to modify how the file is imported.
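To see the forced text import on something small enough to check by eye, here is a sketch that writes a hypothetical three-row CSV with the same column names, inspects the VariableTypes property of the import options after setvartype, and then reads the file. The file contents are invented for illustration only.

```matlab
% Write a tiny hypothetical CSV with the same column names as Myfile.csv
fnm = [tempname '.csv'];
fid = fopen(fnm, 'w');
fprintf(fid, 'UTC,MRC_TXID,MRC_CommandID,MRC_User_Defined\n');
fprintf(fid, '1,1A,0,8\n2,1A,0,8\n3,2,E,3E\n');
fclose(fid);

opts = detectImportOptions(fnm);
opts = setvartype(opts, 2:4, 'string');   % override the detected types for columns 2-4
disp(opts.VariableTypes)                  % types READTABLE will now use for each column

tbl = readtable(fnm, opts);
delete(fnm);                              % remove the temporary file
disp(tbl.MRC_CommandID)                   % "0", "0", "E": the "E" is kept as text
```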

Stephen23, 2 January 2024

In comparison, look at the columns of your table: what classes do columns 3 & 4 have? (Hint: numeric.)

When debugging always look at your data!

unzip Myfile.zip
t = readtable('Myfile.csv')
Warning: Column headers from the file were modified to make them valid MATLAB identifiers before creating variable names for the table. The original column headers are saved in the VariableDescriptions property.
Set 'VariableNamingRule' to 'preserve' to use the original column headers as table variable names.
t = 520657×4 table
            UTC             MRC_TXID    MRC_CommandID    MRC_User_Defined
    ____________________    ________    _____________    ________________

    8321:16:37.994597698     {'1A'}           0                 8        
    8321:16:38.004597698     {'1A'}           0                 8        
    8321:16:38.014597698     {'1A'}           0                 8        
    8321:16:38.024597698     {'1A'}           0                 8        
    8321:16:38.034597698     {'1A'}           0                 8        
    8321:16:38.044597698     {'1A'}           0                 8        
    8321:16:38.054597599     {'1A'}           0                 8        
    8321:16:38.064597698     {'1A'}           0                 8        
    8321:16:38.074597698     {'1A'}           0                 8        
    8321:16:38.084597599     {'1A'}           0                 8        
    8321:16:38.094597599     {'1A'}           0                 8        
    8321:16:38.104597599     {'1A'}           0                 8        
    8321:16:38.114597599     {'1A'}           0                 8        
    8321:16:38.124597599     {'1A'}           0                 8        
    8321:16:38.134597599     {'1A'}           0                 8        
    8321:16:38.144597599     {'1A'}           0                 8        
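Rather than reading the class off the display, varfun can report the class of every variable in one call. A minimal sketch on a hand-built two-row stand-in for t (not the imported table itself):

```matlab
% Stand-in for the first rows of t as displayed above (hand-built):
% MRC_TXID came in as a cell array of char, the other two as double
t = table({'1A';'1A'}, [0;0], [8;8], ...
    'VariableNames', {'MRC_TXID','MRC_CommandID','MRC_User_Defined'});

% Apply class() to each variable and collect the results in a cell array
cls = varfun(@class, t, 'OutputFormat', 'cell')
% cls = 1x3 cell: {'cell'}  {'double'}  {'double'}
```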

Answer 2

Star Strider, 2 January 2024

I am not certain what the problem is; however, if you want to test for more than one value in a row, specify that with the 'rows' argument. UNIQUE will then consider all the elements in a row (columns 2 to 4 in this instance) when determining uniqueness.

Try this:

Uz = unzip('Myfile.zip')
Uz = 1×2 cell array
    {'12Dec2023Events.csv'}    {'Myfile.csv'}
t=readtable(Uz{2}, 'VariableNamingRule','preserve')
t = 520657×4 table
            UTC             MRC-TXID    MRC-CommandID    MRC-User-Defined
    ____________________    ________    _____________    ________________

    8321:16:37.994597698     {'1A'}           0                 8        
    8321:16:38.004597698     {'1A'}           0                 8        
    8321:16:38.014597698     {'1A'}           0                 8        
    8321:16:38.024597698     {'1A'}           0                 8        
    8321:16:38.034597698     {'1A'}           0                 8        
    8321:16:38.044597698     {'1A'}           0                 8        
    8321:16:38.054597599     {'1A'}           0                 8        
    8321:16:38.064597698     {'1A'}           0                 8        
    8321:16:38.074597698     {'1A'}           0                 8        
    8321:16:38.084597599     {'1A'}           0                 8        
    8321:16:38.094597599     {'1A'}           0                 8        
    8321:16:38.104597599     {'1A'}           0                 8        
    8321:16:38.114597599     {'1A'}           0                 8        
    8321:16:38.124597599     {'1A'}           0                 8        
    8321:16:38.134597599     {'1A'}           0                 8        
    8321:16:38.144597599     {'1A'}           0                 8        
[~,ind]=unique(t(:,2:4),'stable','rows');
t2=t(ind,:)
t2 = 156874×4 table
            UTC             MRC-TXID    MRC-CommandID    MRC-User-Defined
    ____________________    ________    _____________    ________________

    8321:16:37.994597698     {'1A'}            0                 8       
    8321:29:57.394167701     {'1' }          NaN               NaN       
    8321:29:57.404167701     {'1' }          NaN               NaN       
    8321:29:57.414167701     {'1' }          NaN               NaN       
    8321:29:57.424167701     {'1' }          NaN               NaN       
    8321:29:57.434167701     {'1' }          NaN               NaN       
    8321:29:57.444167701     {'1' }          NaN               NaN       
    8321:29:57.454167598     {'1' }          NaN               NaN       
    8321:29:57.464167598     {'1' }          NaN               NaN       
    8321:29:57.474167598     {'1' }          NaN               NaN       
    8321:29:57.484167598     {'1' }          NaN               NaN       
    8321:29:57.494167598     {'1' }          NaN               NaN       
    8321:29:57.504167598     {'1' }          NaN               NaN       
    8321:29:57.514167598     {'1' }          NaN               NaN       
    8321:29:57.524167598     {'1' }          NaN               NaN       
    8321:29:57.534167598     {'1' }          NaN               NaN       

If you intend something else or want a different result, please provide more details.
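The NaN values in the output above appear where the default import could not convert text such as "E" or "3E" into a number, and UNIQUE treats every NaN as distinct from every other NaN. Rows that differ only by NaN are therefore never merged, which is why t2 here has 156874 rows, compared with 235 when columns 2-4 are imported as text. A minimal illustration on a numeric vector:

```matlab
% unique never merges NaN values: each NaN counts as a distinct value
x = [NaN; NaN; 8; 8];
u = unique(x, 'stable')   % [NaN; NaN; 8]: both NaNs kept, the two 8s merged
```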

Dyuman Joshi, 2 January 2024

For a table, the output without the 'rows' option is the same as with it (see the description: https://in.mathworks.com/help/releases/R2019b/matlab/ref/unique.html?s_tid=doc_ta ), so that would not make a difference.

Star Strider, 2 January 2024

That’s not how I read the section on Unique Rows in Matrix, although in this instance it is considering everything in the last three columns.
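Dyuman Joshi's point can be checked directly: for a table input, UNIQUE already treats each row as one entity, so adding 'rows' should not change the result (Star Strider's call above also ran with 'stable','rows'). A minimal sketch on a hypothetical three-row table:

```matlab
% Hypothetical three-row table; rows 1 and 2 are identical
t = table(["2";"2";"9"], ["E";"E";"23"], 'VariableNames', {'ID','Cmd'});

[~, ia1] = unique(t, 'stable');          % without 'rows'
[~, ia2] = unique(t, 'rows', 'stable');  % with 'rows'
isequal(ia1, ia2)                        % true: both are [1; 3]
```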
