Best solution to finding repeating characters on a line.

Question

Matthew Worker on 13 Jul 2021


https://kr.mathworks.com/matlabcentral/answers/877228-best-solution-to-finding-repeating-characters-on-a-line

Commented: Rena Berman on 26 Sep 2023

I am looking for any place where two characters (e and d) appear together in an unbroken run of 10 or more characters. I want to either print every line where this occurs to the command line, or stop and print the location each time it is detected. Basically, I am trying to find where e and d show up ten or more times grouped together in a large data file. For example:

asdfsdfsdfsasdfsdfsdfsasdfsdfsdfs

asseefadfefeeedddeeedddasdfsdf

asdfsdfsdfsasdfsdfsdfsasdfsdfsdfs

asseefadfefeeedddeeedddasdfsdf

The script would then print out line 2 and line 4 in the command line.

Thank you for your help


Rena Berman on 26 Sep 2023

(Answers Dev) Restored edit


Answer 1

Stephen23 on 13 Jul 2021


https://kr.mathworks.com/matlabcentral/answers/877228-best-solution-to-finding-repeating-characters-on-a-line#answer_745433

Edited: Stephen23 on 13 Jul 2021


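% Example input: lines 2 and 4 contain a long run built from only two characters.
inp = {'asdfsdfsdfsasdfsdfsdfsasdfsdfsdfs';'asseefadfefaaadddaaadddasdfsdf';'asdfsdfsdfsasdfsdfsdfsasdfsdfsdfs';'asseefadfefaaadddaaadddasdfsdf'};
% (.) captures a character; the dynamic expressions (??...) build the rest of the
% pattern from the captured tokens, so each match is a run of at most two distinct characters.
rgx = '(.)(??$1*)(.?)(??[$1$2]*)';
spl = regexp(inp,rgx,'match');
% Flag every line that has at least one such run of 10 or more characters.
idx = cellfun(@(c)any(cellfun(@numel,c)>9),spl);
find(idx)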
ans = 2×1
     2
     4


Walter Roberson on 13 Jul 2021


The bold text does not represent repetitions this time, unless you mean repetition between lines. In the previous example there were two halves, with the second being the same as the first.

If the task is to find places where there is a string of at least 10 d or e characters then

'[de]{10,}'

can find that, and the 'once' and isempty and indexing from my Answer gives you the rest. It just depends on your having used readlines() on the file.
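A minimal sketch of how those pieces might fit together. It assumes the lines are already in a string array, as readlines() would return them; here they are hard-coded from the question's example, and the variable names are illustrative only.

```matlab
% Stand-in for S = readlines(filename); hard-coded so the sketch runs on its own.
S = ["asdfsdfsdfsasdfsdfsdfsasdfsdfsdfs"
     "asseefadfefeeedddeeedddasdfsdf"
     "asdfsdfsdfsasdfsdfsdfsasdfsdfsdfs"
     "asseefadfefeeedddeeedddasdfsdf"];

% With 'once', regexp returns one cell per line: the start index of the
% first run of 10 or more d/e characters, or [] if the line has none.
firstHit = regexp(S, '[de]{10,}', 'once');

% Logical mask of the lines that contain such a run, then the lines themselves.
hasRun = ~cellfun(@isempty, firstHit);
matchedLines = S(hasRun);
```

The quantifier {10,} states the "10 or more" requirement explicitly; because the pattern is not anchored, '[de]{10}' selects the same lines.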

Stephen23 on 13 Jul 2021

Edited: Rena Berman on 22 Sep 2023

Matthew Worker: are the specific characters known in advance? Or do you want to detect them automatically? (i.e. detect any two characters that are repeated more than 10 times contiguously)

Are there any particular patterns that you need to include/exclude? (e.g. does 10 'e' characters in a row count, or does the sequence have to include at least one 'd' character?).
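To illustrate why the second question matters, here is a small sketch of one way the stricter reading could be handled, assuming the two characters are known to be 'd' and 'e'. The test string and variable names are made up for the example.

```matlab
% Two runs of 12 characters: one of only 'e', one mixing 'd' and 'e'.
txt = ['xx' repmat('e', 1, 12) 'xx' repmat('de', 1, 6) 'xx'];

% Every maximal run of 10 or more d/e characters.
runs = regexp(txt, '[de]{10,}', 'match');

% Stricter reading: keep only runs that contain both characters.
hasBoth = cellfun(@(r) any(r == 'd') && any(r == 'e'), runs);
mixed = runs(hasBoth);
```

Under the looser reading both runs count; under the stricter one only the mixed run survives the filter.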


Answer 2

Walter Roberson on 13 Jul 2021


https://kr.mathworks.com/matlabcentral/answers/877228-best-solution-to-finding-repeating-characters-on-a-line#answer_745428


You say "10 or over", so is it correct that the program needs to find all possible patterns? For example,

'adadadadaaaadadadadaaa'
ans = 'adadadadaaaadadadadaaa'

(length 22) should be located if it exists?

S = {'asseefadfefaaadddaaadddasdfsdf', 'asseeadadadadaaaadadadadaaadfsdf'}
S = 1×2 cell array
    {'asseefadfefaaadddaaadddasdfsdf'}    {'asseeadadadadaaaadadadadaaadfsdf'}
matches = regexp(S, '([ad]{5,})\1', 'match');
celldisp(matches)
 
matches{1}{1} =
 
aaadddaaaddd
 
 
matches{2}{1} =
 
adadadadaaaadadadadaaa
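Note that the backreference \1 requires the second half of the match to be an exact copy of the first half, which is a stricter condition than "10 or more a/d characters in a row". A small sketch of the difference, using a made-up test string:

```matlab
% Ten consecutive a/d characters, but no block of 5 or more that repeats.
txt = 'xaaaaadddddx';

% Plain run of 10 or more a/d characters: matches 'aaaaaddddd'.
plainRun = regexp(txt, '[ad]{10,}', 'match', 'once');

% Block of 5 or more a/d characters followed by a copy of itself: no match here,
% so regexp returns an empty result.
doubledRun = regexp(txt, '([ad]{5,})\1', 'match', 'once');
```

Which pattern is appropriate depends on whether the repeated structure matters or only the length of the run.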
 


Matthew Worker on 13 Jul 2021

@Walter Roberson Two specific characters (d/e); sorry, I should have gone into more depth. The problem is already solved, as I can convert the data to an array. But if you know a way of going through a block of text data (in a txt file) and reporting the lines where the pattern is detected, I would be very appreciative.

Thank you for your previous answer, as it gives a good explanation.

Walter Roberson on 14 Jul 2021


Example of reading from file:

%create a file for demonstration purposes only
tname = [tempname() '.txt'];
fid = fopen(tname, 'w');
T = regexprep('asseefadfefaaadddaaadddasdfsdf\nasseeadadadadaaaadadadadaaadfsdf\nasdfsdfsdfsasdfsdfsdfsasdfsdfsdfs\nasseefadfefaaadddaaadddasdfsdf\nasdfsdfsdfsasdfsdfsdfsasdfsdfsdfs\nasseefadfefaaadddaaadddasdfsdf\n', 'a', 'e');
fprintf(fid, T);
fclose(fid);
%okay, main function
filename = tname;
S = readlines(filename);
matches = S(~cellfun(@isempty, regexp(S, '[de]{10}', 'once')));
matches
matches = 4×1 string array
    "esseefedfefeeedddeeedddesdfsdf"
    "esseeededededeeeededededeeedfsdf"
    "esseefedfefeeedddeeedddesdfsdf"
    "esseefedfefeeedddeeedddesdfsdf"
%alternative without readlines
S = regexp(fileread(filename), '\r?\n', 'split');
matches = S(~cellfun(@isempty, regexp(S, '[de]{10}', 'once')));
matches
matches = 1×4 cell array
    {'esseefedfefeeedddeeedddesdfsdf'}    {'esseeededededeeeededededeeedfsdf'}    {'esseefedfefeeedddeeedddesdfsdf'}    {'esseefedfefeeedddeeedddesdfsdf'}
%alternative without splitting
S = fileread(filename);
matches = regexp(S, '^.*[de]{10}.*$', 'match', 'dotexceptnewline', 'lineanchors');
matches
matches = 1×4 cell array
    {'esseefedfefeeedddeeedddesdfsdf'}    {'esseeededededeeeededededeeedfsdf'}    {'esseefedfefeeedddeeedddesdfsdf'}    {'esseefedfefeeedddeeedddesdfsdf'}
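The variants above return the matching lines themselves. If the line numbers and the position of the run are wanted instead, one possible variation is sketched below. S is hard-coded here as a stand-in for readlines(filename), and startIdx is an illustrative name.

```matlab
% Stand-in for S = readlines(filename); hard-coded so the sketch runs on its own.
S = ["asdfsdfsdfsasdfsdfsdfsasdfsdfsdfs"
     "asseefadfefeeedddeeedddasdfsdf"
     "asdfsdfsdfsasdfsdfsdfsasdfsdfsdfs"
     "asseefadfefeeedddeeedddasdfsdf"];

% The first output of regexp is the start index of the match; with 'once'
% each cell holds a single column number, or [] if the line has no run.
startIdx = regexp(S, '[de]{10,}', 'once');

% Report the line number and starting column of the first run on each matching line.
for k = 1:numel(S)
    if ~isempty(startIdx{k})
        fprintf('line %d, column %d: %s\n', k, startIdx{k}, char(S(k)));
    end
end
```

Dropping 'once' would make each cell hold the start columns of all runs on that line, at the cost of a slightly more involved print loop.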
