Most efficient way to search in text arrays

조회 수: 24 (최근 30일)
Uwe Ehret
Uwe Ehret 2017년 4월 18일
답변: Uwe Ehret 2017년 4월 22일
Dear All,
I have a large array of textual information (model element IDs). In this array, I frequently have to find the index of particular IDs. What is the computationally most efficient way to do this? Store the IDs in a character array, cell array of characters or string array? What is the most efficient way for indexing when working with text?
Thanks for your help! Uwe

채택된 답변

Alain Kuchta
Alain Kuchta 2017년 4월 21일
편집: Alain Kuchta 2017년 4월 21일
Here are two possible approaches to accomplish this:
1) Use strcmp and find with string array
This option is O(n) for each lookup; in the worst case, every string in ids will be checked in order to find query. You can also use a cell array of character vectors for ids, in my test a string array was slightly faster.
>> ids = ["M1","M2", "M3"];
>> query = "M2";
>> index = find(strcmp(ids, query) == 1)
index =
2
2) Use containers.Map with char arrays as keys and indices as values
This option is O(n) to setup, but O(1) for each lookup. Meaning that regardless of how many ids are in your map, looking up each one will take the same amount of time.
>> ids = {'M1', 'M2', 'M3'};
>> indices = 1:length(ids);
>> idMap = containers.Map(ids, indices);
>> query = 'M2';
>> index = idMap(query)
index =
2
Here is a performance comparison. At each size increment the average time to compute 500 random queries was measured for each approach. Each approach used the same set of queries at each size increment. In my case, for less than ~1000 elements, find with strcmp is faster. But as the number of elements grows, containers.Map is the clear winner.
  댓글 수: 1
Walter Roberson
Walter Roberson 2017년 4월 21일
The O(1) reference sounds as if containers.Map is using hashing -- which is a possibility but not one I see documented ?
True O(1) would tend to imply that it is using Perfect Hash, as regular hashes that can have collisions would have an O(n) or O(ln(n)) or O(n*ln(n)) term for the worst case as the table fills up.

댓글을 달려면 로그인하십시오.

추가 답변 (1개)

Uwe Ehret
Uwe Ehret 2017년 4월 22일
Dear Alain,
Thanks for your very helpful reply! The containers.Map option was new to me, but offers many very elegant and time-saving improvements to the program I am working on.
Uwe

카테고리

Help CenterFile Exchange에서 Characters and Strings에 대해 자세히 알아보기

태그

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by