Efficient way to standardize large amounts of text

André Kucharzewski on 19 Oct 2021
Commented: André Kucharzewski on 24 Oct 2021
Hello,
I have a table with around 1 million rows. One column contains strings in different formats,
mixing letters and numbers, like:
abc_123
cdf_123
123_cdf
123 (abc)
There are around 120 different text formats, which repeat. Most of them can be converted to a standard format like aa_11. Any entry that does not fit should get a standard undef format.
Any suggestions on how I can handle such a large dataset without a for loop over 1 million rows that checks each cell?
Thanks in advance :)

Accepted Answer

Duncan Po on 19 Oct 2021
You may be able to use patterns. For example, if the standard format is letters followed by an underscore followed by digits, you can detect it like this:
>> x = ["abc_123", "cdf_123", "123_cdf", "123 (abc)"]; % create an example string array
>> matches(x, lettersPattern + "_" + digitsPattern) % check if the strings match the standard pattern
ans =
1×4 logical array
1 1 0 0
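
Building on the check above, the logical mask can also drive the standardization itself, one whole-column operation per format instead of one per row. This is only a minimal sketch: the target format letters_digits, the second rule for digits_letters, and the variable names isStd, isSwapped and out are assumptions for illustration, and it needs a release that supports pattern objects.

```matlab
% Example data (same strings as above); in practice this would be the table column.
x = ["abc_123", "cdf_123", "123_cdf", "123 (abc)"];

% Mask of entries already in the assumed standard form letters_digits.
isStd = matches(x, lettersPattern + "_" + digitsPattern);

% Mask of entries in the form digits_letters, which only need their parts swapped.
isSwapped = matches(x, digitsPattern + "_" + lettersPattern);

% Start with "undef" everywhere, then fill in whatever a rule recognizes.
out = repmat("undef", size(x));
out(isStd) = x(isStd);
out(isSwapped) = extractAfter(x(isSwapped), "_") + "_" + extractBefore(x(isSwapped), "_");
```

Each additional format would get its own mask and conversion line, so the work scales with the roughly 120 formats rather than with the number of rows.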
Comments: 1
André Kucharzewski on 24 Oct 2021
That should do the job, but it's a function introduced in R2019b and I only have R2019a.
Kinda sad :(
But thank you for your input :)
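
For releases without matches, one possible workaround is regular expressions: regexp and regexprep accept whole string or cell arrays, so the loop runs over the formats rather than the rows. The following is a sketch, not a tested solution for the real data. The three rules and the letters_digits target are assumptions based on the examples in the question, and [A-Za-z] only covers ASCII letters.

```matlab
% Example data from the question; in practice this would be the table column.
x = ["abc_123", "cdf_123", "123_cdf", "123 (abc)"];

% One row per format: {regular expression with tokens, replacement}.
% These three rules are illustrative; the real list would have ~120 entries.
rules = { ...
    '^([A-Za-z]+)_(\d+)$',     '$1_$2'; ...   % abc_123   -> abc_123
    '^(\d+)_([A-Za-z]+)$',     '$2_$1'; ...   % 123_cdf   -> cdf_123
    '^(\d+) \(([A-Za-z]+)\)$', '$2_$1'};      % 123 (abc) -> abc_123

out = repmat("undef", size(x));               % default for unmatched entries
for k = 1:size(rules, 1)                      % loop over formats, not rows
    % regexp on a cell array returns one cell per element; empty means no match.
    hit = ~cellfun(@isempty, regexp(cellstr(x), rules{k, 1}, 'once'));
    if any(hit)
        out(hit) = regexprep(x(hit), rules{k, 1}, rules{k, 2});
    end
end
```

If many of the 1 million values are exact repeats, running the same loop on unique(x) and mapping the results back through the third output of unique should reduce the work further.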
