Speed Up String Conversion

Stephen Gray on 1 May 2024
Hi all. I am trying to speed up the string conversion of a table variable, as below:
GoingUC=string(table2cell(Inps(:,5)));
Inps is a table with approximately 730,000 records and 13 fields. I have 6 categorical fields to convert, and the conversion is taking over 2.5 hours, so I wondered whether there is a quicker way to do this. I need a string array for the following code, which maps the categorical strings to numbers using a containers.Map (that part is quick):
[Unique_GoingU,~,GoingU_Numeric_Cats] = unique(GoingUC);
CTNM_GoingU=containers.Map(Unique_GoingU,num2cell(1:length(Unique_GoingU)));
NTD_GoingU=cell2mat(values(CTNM_GoingU,num2cell(GoingUC)));
It all works perfectly for my use, but it would be great if I could speed it up.
Steve Gray
Comments: 2
Voss on 1 May 2024
The third output from unique is the same as the end result (or the transpose of the end result, if GoingUC is a row vector), so using a Map is unnecessary.
GoingUC = string(randi(10,10000,1))
GoingUC = 10000x1 string array
"9" "6" "2" "3" "9" "1" "10" "5" "4" "9" "4" "10" "10" "3" "10" "8" "7" "2" "9" "7" "2" "2" "3" "7" "8" "9" "7" "1" "1" "6"
[Unique_GoingU,~,GoingU_Numeric_Cats] = unique(GoingUC)
Unique_GoingU = 10x1 string array
"1" "10" "2" "3" "4" "5" "6" "7" "8" "9"
GoingU_Numeric_Cats = 10000x1
10 7 3 4 10 1 2 6 5 10
CTNM_GoingU=containers.Map(Unique_GoingU,num2cell(1:length(Unique_GoingU)));
NTD_GoingU=cell2mat(values(CTNM_GoingU,num2cell(GoingUC)))
NTD_GoingU = 10000x1
10 7 3 4 10 1 2 6 5 10
isequal(GoingU_Numeric_Cats,NTD_GoingU)
ans = logical
1
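As a minimal sketch of the simplification described above (using random stand-in data, not the real table), the three original lines reduce to a single unique call, and indexing the unique list with the codes recovers the original strings:

```matlab
% Stand-in data for illustration only; the real GoingUC comes from the table
GoingUC = string(randi(10, 10000, 1));

% The third output of unique already holds the numeric code for each element
[Unique_GoingU, ~, NTD_GoingU] = unique(GoingUC);

% Round trip: each code indexes back into the list of unique strings
roundTrip = Unique_GoingU(NTD_GoingU);
```

Note that unique sorts strings lexicographically, so "10" is ordered before "2", as in the output shown above.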
Stephen Gray on 1 May 2024
Thanks!


Accepted Answer

Voss on 1 May 2024
Avoid using table2cell for this; instead, access the table data directly (using curly braces {} or, even better, dot indexing).
% 100000x1 table of categoricals
Inps = table(categorical(randi(10,100000,1)))
Inps = 100000x1 table
    Var1
    ____
    7
    10
    8
    5
    7
    3
    10
    8
    4
    2
    6
    6
    2
    10
    10
    9
% using table2cell
tic
str1 = string(table2cell(Inps(:,1)));
toc
Elapsed time is 1.676799 seconds.
% using curly brace indexing
tic
str2 = string(Inps{:,1});
toc
Elapsed time is 0.013733 seconds.
% using dot indexing
tic
str3 = string(Inps.(1));
toc
Elapsed time is 0.005515 seconds.
Accessing the table data directly is more than 100 times faster and produces the same result:
isequal(str1,str2,str3)
ans = logical
1
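For a table with several categorical variables, the same idea can be applied in a loop. The following is only a sketch under assumptions: the small stand-in table, the variable names (GoingU, Course, Weight), and the codes/labels structs are hypothetical and not taken from the original data.

```matlab
% Small stand-in table; variable names are hypothetical
Inps = table(categorical(randi(10, 1000, 1)), ...
             categorical(randi(3, 1000, 1)), ...
             rand(1000, 1), ...
             'VariableNames', {'GoingU', 'Course', 'Weight'});

% Logical row vector marking which table variables are categorical
isCat = varfun(@iscategorical, Inps, 'OutputFormat', 'uniform');
catNames = Inps.Properties.VariableNames(isCat);

codes  = struct();   % numeric codes, one field per categorical variable
labels = struct();   % unique strings, one field per categorical variable
for k = 1:numel(catNames)
    name = catNames{k};
    strs = string(Inps.(name));                      % dot indexing, no table2cell
    [labels.(name), ~, codes.(name)] = unique(strs); % codes index into labels
end
```

After the loop, labels.GoingU(codes.GoingU) reproduces string(Inps.GoingU), so no separate map is needed.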
Comments: 3
Voss on 1 May 2024
Edited: Voss on 1 May 2024
You're welcome!
table2cell could be useful for collecting multiple variables of a table into a cell array, particularly if the variables contain different classes of data, although I would most likely just keep the data in table form.
T = table(rand(10,1),cellstr(char(65+randi([0,9],10,5))),string(rand(10,1)))
T = 10x3 table
     Var1        Var2         Var3
    _______    _________    __________
    0.23051    {'ADAJB'}    "0.15424"
    0.46691    {'FACCA'}    "0.49046"
    0.60176    {'BFJGB'}    "0.12775"
    0.97235    {'IBGBJ'}    "0.93042"
    0.26794    {'GCCAI'}    "0.42212"
    0.13361    {'GABEB'}    "0.094709"
    0.12238    {'EEFBH'}    "0.14285"
    0.24268    {'CDDDG'}    "0.42503"
    0.69713    {'IGHGF'}    "0.075316"
    0.59503    {'JFEBG'}    "0.36855"
% table to cell keeps the data classes as they are in the table
C = table2cell(T(:,[1 2 3]))
C = 10x3 cell array
    {[0.2305]}    {'ADAJB'}    {["0.15424" ]}
    {[0.4669]}    {'FACCA'}    {["0.49046" ]}
    {[0.6018]}    {'BFJGB'}    {["0.12775" ]}
    {[0.9724]}    {'IBGBJ'}    {["0.93042" ]}
    {[0.2679]}    {'GCCAI'}    {["0.42212" ]}
    {[0.1336]}    {'GABEB'}    {["0.094709"]}
    {[0.1224]}    {'EEFBH'}    {["0.14285" ]}
    {[0.2427]}    {'CDDDG'}    {["0.42503" ]}
    {[0.6971]}    {'IGHGF'}    {["0.075316"]}
    {[0.5950]}    {'JFEBG'}    {["0.36855" ]}
% but the concatenation required when accessing directly converts
% numeric and cell char to string, in order to combine the
% numeric and cell char table variables with the string variable
T{:,[1 2 3]}
ans = 10x3 string array
    "0.23051"    "ADAJB"    "0.15424"
    "0.46691"    "FACCA"    "0.49046"
    "0.60176"    "BFJGB"    "0.12775"
    "0.97235"    "IBGBJ"    "0.93042"
    "0.26794"    "GCCAI"    "0.42212"
    "0.13361"    "GABEB"    "0.094709"
    "0.12238"    "EEFBH"    "0.14285"
    "0.24268"    "CDDDG"    "0.42503"
    "0.69713"    "IGHGF"    "0.075316"
    "0.59503"    "JFEBG"    "0.36855"
C = [T.(1) T.(2) T.(3)]
C = 10x3 string array
    "0.23051"    "ADAJB"    "0.15424"
    "0.46691"    "FACCA"    "0.49046"
    "0.60176"    "BFJGB"    "0.12775"
    "0.97235"    "IBGBJ"    "0.93042"
    "0.26794"    "GCCAI"    "0.42212"
    "0.13361"    "GABEB"    "0.094709"
    "0.12238"    "EEFBH"    "0.14285"
    "0.24268"    "CDDDG"    "0.42503"
    "0.69713"    "IGHGF"    "0.075316"
    "0.59503"    "JFEBG"    "0.36855"
Stephen Gray on 1 May 2024
Cool, understood.


Release

R2024a
