
Matching combinations of strings

Marcus Glover, 17 June 2024
Edited: DGM, 22 June 2024
I have a table TT with a string variable TT.name. I want to return true if TT.name matches any entry in another table variable OK.name. However, I have some complications I am having a hard time parsing.
Many of the strings in TT.name are combinations of strings that appear in OK.name. I want to include these as a true match. Sometimes they have a + symbol, sometimes just a space. Further complicating matters, the table OK contains some entries with spaces, and if they do I want to treat them as an entire entry, and not break them up at the spaces.
I believe I will usually have a combination of 2 strings only, though 3 and 4 may be possible.
TT = table(["Green"; "Red"; "Blue"; "Black Blue"; "Black"; "Blue Green"; "Red + Blue"; "Red Orange"; "Red + White"; "Black Blue Red"], 'VariableNames', {'name'})
TT = 10x1 table
          name
    ________________
    "Green"
    "Red"
    "Blue"
    "Black Blue"
    "Black"
    "Blue Green"
    "Red + Blue"
    "Red Orange"
    "Red + White"
    "Black Blue Red"
OK = table(["Red"; "Green"; "Blue"; "Black Blue"], 'VariableNames', {'name'})
OK = 4x1 table
        name
    ____________
    "Red"
    "Green"
    "Blue"
    "Black Blue"
This is the output I want, but without manually setting rows 6, 7, and 10:
TT.match=ismember(TT.name,OK.name);
TT.match([6 7 10])=1
TT = 10x2 table
          name          match
    ________________    _____
    "Green"             true
    "Red"               true
    "Blue"              true
    "Black Blue"        true
    "Black"             false
    "Blue Green"        true
    "Red + Blue"        true
    "Red Orange"        false
    "Red + White"       false
    "Black Blue Red"    true
In the example, "Blue Green" and "Red + Blue" are true matches, because "Blue", "Green", and "Red" all appear as entries in OK.name.
Similarly, "Black Blue Red" is OK because it is a combination of "Black Blue" and "Red".
"Black" is not a match, because the only entry in OK.name that contains it is "Black Blue", and I do not want to separate the words of entries in that table.
"Red Orange" and "Red + White" are not matches because only "Red" is in the OK table.
Comments: 2
Stephen23, 18 June 2024
Edited: Stephen23, 18 June 2024
The task is ill-defined, and most likely impossible in a general sense: this is due to the same delimiters being used to separate words in OK as well as to separate combinations from TT. Consider:
TT = "black blue" + "red" -> "black blue red"
OK = ["black", "blue red"]
Also note that a naive approach considering all permutations of OK will quickly become intractable.
Questions:
  • what size is OK ?
  • what size is TT ?
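The ambiguity in the example above can be checked directly. This is only a small sketch: the variable names (`intended`, `alternative`, and so on) are made up for illustration, and the data are the two lines from the comment.

```matlab
% OK list from the comment's example
OK = ["black", "blue red"];

% The parts that were actually combined, and the resulting text
intended = ["black blue", "red"];
combined = join(intended, " ");          % "black blue red"

% A different way to cut the same text into parts
alternative = ["black", "blue red"];

sameText      = join(alternative, " ") == combined;   % true: both readings give the same string
intendedOK    = all(ismember(intended, OK));          % false: neither intended part is in OK
alternativeOK = all(ismember(alternative, OK));       % true: the other reading is fully in OK
```

Any matcher that only sees the combined string cannot tell these two readings apart, so it would report a match even though neither of the original parts is in OK.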
Marcus Glover, 18 June 2024
Edited: Marcus Glover, 18 June 2024
I think the size of OK (~250) is indeed going to make this intractable (TT has hundreds of thousands of entries). The solution is to fix the issue with delimiters in the data.
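If the data were cleaned so that combinations are always joined with "+" and spaces only ever occur inside a single entry, the check becomes a plain split and lookup. The sketch below assumes exactly that; the `names` array is hypothetical cleaned data, not the original TT.

```matlab
OK = ["Red"; "Green"; "Blue"; "Black Blue"];

% Hypothetical cleaned names: "+" is the only separator between entries
names = ["Red + Blue"; "Black Blue + Red"; "Red + White"; "Black"];

match = false(size(names));
for i = 1:numel(names)
    % Split only at "+", then trim the surrounding spaces;
    % spaces inside an entry such as "Black Blue" are left alone
    parts = strip(split(names(i), "+"));
    match(i) = all(ismember(parts, OK));
end
```

With unambiguous delimiters there is nothing to search over, so the size of OK only affects the cost of each `ismember` lookup.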


Answers (1)

Umar, 18 June 2024
Hi Marcus,

To achieve this, you can combine string-splitting functions with logical comparisons in MATLAB. Here's a step-by-step approach:

1. Iterate through each row of `TT.name`.
2. For each row, split the string into individual words at spaces or the "+" symbol.
3. Check whether each individual word exists as an entry in `OK.name`.
4. If all words in the split string are found in `OK.name`, consider it a match.
5. Update the `TT.match` column accordingly.

Here's some MATLAB code that implements this logic:

```matlab
TT.match = false(height(TT), 1);
for i = 1:height(TT)
    % Split the name at spaces and "+" (consecutive delimiters are collapsed by default)
    words = strsplit(TT.name{i}, {' ', '+'});
    % A row matches only if every resulting word is an entry in OK.name
    TT.match(i) = all(ismember(words, OK.name));
end
```

This fills `TT.match` automatically for combinations of single-word entries, such as "Blue Green" and "Red + Blue", without editing rows by hand. Note that because every space is treated as a delimiter, multi-word entries in `OK.name` such as "Black Blue" are never matched as a unit, so "Black Blue" and "Black Blue Red" come out false rather than true as in your desired output.
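One possible way to keep multi-word entries such as "Black Blue" intact is to split each name into words as in the steps above, and then test whether the word sequence can be cut into consecutive groups that are each an entry in `OK.name`. The following is a minimal sketch under two assumptions that are not in the original post: "+" is treated the same as a space, and any valid grouping counts as a match (so the ambiguity raised in the comments still applies). The names `reach`, `words`, and `match` are illustrative only.

```matlab
TT = table(["Green"; "Red"; "Blue"; "Black Blue"; "Black"; "Blue Green"; ...
            "Red + Blue"; "Red Orange"; "Red + White"; "Black Blue Red"], ...
           'VariableNames', {'name'});
OK = table(["Red"; "Green"; "Blue"; "Black Blue"], 'VariableNames', {'name'});

match = false(height(TT), 1);
for i = 1:height(TT)
    % Treat "+" like a space (assumption), split into words, drop empty pieces
    words = split(replace(TT.name(i), "+", " "));
    words = words(strlength(words) > 0);
    n = numel(words);

    % reach(k+1) is true when the first k words can be covered by OK entries
    reach = [true; false(n, 1)];
    for k = 1:n
        for j = 1:k
            % Try words j..k as one entry, provided words 1..j-1 are already covered
            if reach(j) && ismember(join(words(j:k), " "), OK.name)
                reach(k+1) = true;
                break
            end
        end
    end
    match(i) = n > 0 && reach(end);
end
TT.match = match;
```

The work per row grows with the square of the number of words in that row rather than with the number of orderings of OK, so it avoids enumerating permutations. For hundreds of thousands of rows it would likely still pay to run this once per unique name rather than once per row.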
Comments: 9
Umar, 22 June 2024
Apology accepted
DGM, 22 June 2024
Edited: DGM, 22 June 2024
It's okay. You're still free to think of me as a jerk. I mean, it's fair. Just please try to work on the formatting and stuff.
FWIW, also if you don't have MATLAB, I'm pretty sure you can use MATLAB Online for free for something like 20h a month. It doesn't have as many toolboxes installed as the forum editor, but it does allow the use of certain things (interactive tools) that the forum editor can't use.



Release: R2023b
