# How do I count the total number of 'AA' repeats in the text? I managed to count how many times it appears in each cell, but I don't know how to add the counts up.

Nitzan Kahn, 3 May 2018
Commented: Nitzan Kahn, 5 May 2018
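```matlab
clear all;
Seq_Out_File='./Seq_Out.txt';
fileID=fopen(Seq_Out_File);
C=textscan(fileID, '%s');    % C{1}: one cell per whitespace-separated token
fclose(fileID);
NAA=(strfind(C{1},'AA'));    % start indices of 'AA' in each cell
x=cellfun('length', NAA);    % number of 'AA' matches per cell
```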
##### Comments
Nitzan Kahn, 5 May 2018
AAA counts as two matches.
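Since `strfind` already reports overlapping matches (so AAA yields two), the per-cell counts in `x` only need to be summed. A minimal sketch, using a small hypothetical cell array `tokens` in place of `C{1}`:

```matlab
% Hypothetical stand-in for C{1}, the cell array returned by textscan
tokens = {'AAA'; 'CAAG'; 'TTT'};
NAA = strfind(tokens, 'AA');   % strfind reports overlapping matches
x = cellfun('length', NAA);    % per-token counts: [2; 1; 0]
total = sum(x)                 % add the per-cell counts up
```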


### Accepted Answer

Walter Roberson, 3 May 2018
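```matlab
% S is the sequence as a single character vector
no_overlap_count = length(regexp(S, 'AA'));
with_overlap_count = length(regexp(S, 'A(?=A)'));   % lookahead also counts overlapping pairs
```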
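The accepted answer assumes `S` is a single character vector, whereas `textscan` in the question returns a cell array in `C{1}`. `regexp` also accepts a cell array of character vectors and then returns one cell of start indices per element, so one way to get both totals is sketched below; `tokens` is a hypothetical stand-in for `C{1}`.

```matlab
% Hypothetical stand-in for C{1}, the cell array returned by textscan
tokens = {'AAA'; 'CAAG'; 'TTT'};
% regexp on a cell array returns a cell array of start indices, one per token
noOverlap   = sum(cellfun(@numel, regexp(tokens, 'AA')))       % 1 + 1 + 0
withOverlap = sum(cellfun(@numel, regexp(tokens, 'A(?=A)')))   % 2 + 1 + 0
```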
##### Comments
Nitzan Kahn, 5 May 2018
This is exactly what I needed, thanks for your help!


### More Answers (1)

John BG, 4 May 2018
Edited: John BG, 4 May 2018
Hi Nitzan Kahn,
1.- According to …, AAA counts as 2x AA, and AAAA would count as 3x AA.
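The difference between overlapping and non-overlapping counting shows up on a single string. In this small sketch (the string `s` is an illustrative assumption), `strfind` and a lookahead pattern follow the rule above, while a plain `regexp` does not:

```matlab
s = 'AAAA';
nStrfind   = numel(strfind(s, 'AA'))       % overlapping: 3
nRegexp    = numel(regexp(s, 'AA'))        % non-overlapping: 2
nLookahead = numel(regexp(s, 'A(?=A)'))    % overlapping via lookahead: 3
```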
2.- Also, you asked for AA matches, but you may really want to count all possible outcomes, that is, all possible pairs of the bases 'ACTG'.
A basic equivalent of the Stack Overflow code in MATLAB would be:
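```matlab
A1='CTACTGCGACTTATGCCCATAATTGGCCACAATAAGTTTCTCGGATTCGCAGGTACCCTCGAGAGTATGGTCGTGGACTCAACCTTAGAGGCAACGGAGT'
L1='ACTG'
nL=combinator(4,2) % Schwarz's legendary function available here:
L2=L1(nL)          % 16x2 char array with every pair AA, AC, ..., GG
cell1={}
for k=1:1:size(L2,1)
    cell1=[cell1 L2(k,:)]   % cell1 is not used further below
end
nRep=[];
for k=1:1:size(L2,1)
    [t1,t2]=regexp(A1,L2(k,:))   % note: regexp returns non-overlapping matches
    nRep=[nRep numel(t1)]
end
% create base-workspace variables nAA, nAC, ... holding each count
for k=1:1:size(L2,1)
    str1=['n' L2(k,:) '=' num2str(nRep(k))]
    evalin('base',str1)
end
% build the list 'nAA,nAC,...,nGG' and collect those variables into a table
L3=[repmat('n',size(L2,1),1) L2 repmat(',',size(L2,1),1)]'
L3=L3(:)'
L3(end)=[]
str3=['T1=table(' L3 ')']
evalin('base',str3)
```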
```
T1 =

  1×16 table

    nAA    nAC    nAT    nAG    nCA    nCC    nCT    nCG    nTA    nTC    nTT    nTG    nGA    nGC    nGT    nGG
    ___    ___    ___    ___    ___    ___    ___    ___    ___    ___    ___    ___    ___    ___    ___    ___

     5      7      6      7      6      4      7      6      7      6      5      5      7      5      6      7
```
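The code above counts with `regexp`, which reports non-overlapping matches, so it does not follow point 1; this is likely why the homopairs (nAA, nCC, nTT, nGG) come out noticeably lower than the mixed pairs in the full-file table. The sketch below is an alternative, not the answer's own method: it swaps the File Exchange `combinator` for the built-in `ndgrid`, counts with `strfind` (overlapping), and builds the table with `array2table` instead of creating variables through `evalin`. Its counts can therefore differ from T1.

```matlab
% Assumed: A1 holds the sequence as a character row vector, as in the answer
A1 = 'CTACTGCGACTTATGCCCATAATTGGCCACAATAAGTTTCTCGGATTCGCAGGTACCCTCGAGAGTATGGTCGTGGACTCAACCTTAGAGGCAACGGAGT';
bases = 'ACTG';
[second, first] = ndgrid(1:4, 1:4);                 % second index varies fastest
pairs = [bases(first(:)).' bases(second(:)).'];     % 16x2 char: AA, AC, AT, AG, CA, ...
nRep = zeros(1, size(pairs, 1));
for k = 1:size(pairs, 1)
    nRep(k) = numel(strfind(A1, pairs(k, :)));      % strfind counts overlapping matches
end
names = cellstr([repmat('n', size(pairs, 1), 1) pairs]).';   % {'nAA', 'nAC', ...}
T1 = array2table(nRep, 'VariableNames', names)
```

Because every adjacent pair in a pure ACTG sequence is counted exactly once, the overlapping counts add up to `numel(A1) - 1`, which is a handy sanity check.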
3.- For a complete sequence in a text file:
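```matlab
clear all;clc;close all
% Hypothetical: read the sequence into A from the question's file, since
% clear all removes any earlier A and the counting loop below needs it
A=fileread('./Seq_Out.txt');
L1='ACTG'
nL=combinator(4,2) % Schwarz's legendary function available here:
L2=L1(nL)          % 16x2 char array with every pair AA, AC, ..., GG
cell1={}
for k=1:1:size(L2,1)
    cell1=[cell1 L2(k,:)]   % cell1 is not used further below
end
nRep=[];
for k=1:1:size(L2,1)
    [t1,t2]=regexp(A,L2(k,:));   % note: regexp returns non-overlapping matches
    nRep=[nRep numel(t1)];
end
% create base-workspace variables nAA, nAC, ... holding each count
for k=1:1:size(L2,1)
    str1=['n' L2(k,:) '=' num2str(nRep(k))]
    evalin('base',str1)
end
% build the list 'nAA,nAC,...,nGG' and collect those variables into a table
L32=[repmat('n',size(L2,1),1) L2 repmat(',',size(L2,1),1)]'
L32=L32(:)'
L32(end)=[]
str3=['T2=table(' L32 ')']
evalin('base',str3)
```

The count table for all pairs is: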
```
T2 =

  1×16 table

     nAA      nAC      nAT      nAG      nCA      nCC      nCT      nCG      nTA      nTC      nTT      nTG      nGA      nGC      nGT      nGG
    _____    _____    _____    _____    _____    _____    _____    _____    _____    _____    _____    _____    _____    _____    _____    _____

    48684    62452    62140    62799    62493    48543    62702    62323    62328    62690    48502    62482    62491    62410    62683    48427
```
Besides the sequence used, I also attached the saved variables in a .mat file.
You can add more sequences, one per row, as shown in the link above.
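For a file holding several sequences, one per row, the rows can also be counted separately and the per-row counts added up, so no pair is formed across a line break. A minimal sketch, assuming a plain text file with one sequence per row; the file name (a temporary file) and the sample rows are hypothetical and only make the sketch runnable on its own:

```matlab
% Hypothetical sample file so the sketch runs on its own; point fname at
% your own sequence file instead (one sequence per row).
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, 'AAAT\nCGAA\nAC\n');
fclose(fid);

% Split the file into rows so no pair is counted across a line break
rows = regexp(fileread(fname), '\r?\n', 'split');
rows = rows(~cellfun('isempty', rows));   % drop the empty piece after the last newline

pair = 'AA';
perRow = cellfun(@(r) numel(strfind(r, pair)), rows)   % overlapping count per row
total = sum(perRow)
delete(fname);
```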
Thanks in advance for your time and attention,
John BG

