MATLAB Answers

How do I count the total number 'AA' repeats in the text? I managed to count how many times it appears in each cell, but don't know how to add it up.

조회 수: 7(최근 30일)
clear all;
%read SeqDNA file
Seq_Out_File='./Seq_Out.txt';
fileID=fopen(Seq_Out_File);
C=textscan(fileID, '%s');
NAA=(strfind(C{1},'AA'));
x=cellfun('length', NAA);

  댓글 수: 2

Walter Roberson
Walter Roberson 3 May 2018
What do you want to do if there is 'AAA' ? Is that one match or two? Is 'AAAA' 2 matches (non-overlapping) or 3 (overlapping)

로그인 to comment.

채택된 답변

Walter Roberson
Walter Roberson 3 May 2018
S = fileread(Seq_Out_File);
no_overlap_count = length(regexp(S, 'AA'));
with_overlap_count = length(regexp(S, 'A(?=A)'));

  댓글 수: 3

John BG
John BG 4 May 2018
Hi Walter
I have tried once the attached script, random generator.
For the 1st 100 characters I checked
A1='CTACTGCGACTTATGCCCATAATTGGCCACAATAAGTTTCTCGGATTCGCAGGTACCCTCGAGAGTATGGTCGTGGACTCAACCTTAGAGGCAACGGAGT'
there are only 5 pairs AA, and no longer bursts AAA AAAA ..
Shouldn't then your with_overlap_count return null?
pr_AA=length(regexp(A1,'AA'))
sq_AA=length(regexp(A1,'A(?=A)'))
pr_AA =
5
sq_AA =
5
Walter Roberson
Walter Roberson 4 May 2018
No, the with_overlap_count is the count when overlapping is permitted but not mandatory.

로그인 to comment.

추가 답변(1개)

John BG
John BG 4 May 2018
편집: John BG 4 May 2018
Hi Nitzan Khan
1.-
According to
AAA counts as 2x AA and AAAA would count as 3x AA.
.
2.-
Also, you asked for AA match but you may want to really count all possible outcomes, all possible pairs of the basic sequence 'ACTG'
A basic equivalent to the Stack Overflow code in MATLAB would be:
A1='CTACTGCGACTTATGCCCATAATTGGCCACAATAAGTTTCTCGGATTCGCAGGTACCCTCGAGAGTATGGTCGTGGACTCAACCTTAGAGGCAACGGAGT'
L1='ACTG'
nL=combinator(4,2) % SChwarz's legendary function available here:
L2=L1(nL)
cell1={}
for k=1:1:size(L2,1)
cell1=[cell1 L2(k,:)]
end
nRep=[];
for k=1:1:size(L2,1)
[t1,t2]=regexp(A1,L2(k,:))
nRep=[nRep numel(t1)]
end
for k=1:1:size(L2,1)
str1=['n' L2(k,:) '=' num2str(nRep(k))]
evalin('base',str1)
end
L3=[repmat('n',size(L2,1),1) L2 repmat(',',size(L2,1),1)]'
L3=L3(:)'
L3(end)=[]
str3=['T1=table(' L3 ')']
evalin('base',str3)
T1 =
1×16 table
nAA nAC nAT nAG nCA nCC nCT nCG nTA nTC nTT nTG nGA nGC nGT nGG
___ ___ ___ ___ ___ ___ ___ ___ ___ ___ ___ ___ ___ ___ ___ ___
5 7 6 7 6 4 7 6 7 6 5 5 7 5 6 7
.
3.-
For a complete sequence in a text file:
clear all;clc;close all
A=fileread('Seq_Out.txt');
L1='ACTG'
nL=combinator(4,2) % SChwarz's legendary function available here:
L2=L1(nL)
cell1={}
for k=1:1:size(L2,1)
cell1=[cell1 L2(k,:)]
end
nRep=[];
for k=1:1:size(L2,1)
[t1,t2]=regexp(A,L2(k,:));
nRep=[nRep numel(t1)];
end
for k=1:1:size(L2,1)
str1=['n' L2(k,:) '=' num2str(nRep(k))]
evalin('base',str1)
end
L32=[repmat('n',size(L2,1),1) L2 repmat(',',size(L2,1),1)]'
L32=L32(:)'
L32(end)=[]
.
the count table for all pairs is:
.
str3=['T2=table(' L32 ')']
T2 =
1×16 table
nAA nAC nAT nAG nCA nCC nCT nCG nTA nTC nTT nTG nGA nGC nGT nGG
_____ _____ _____ _____ _____ _____ _____ _____ _____ _____ _____ _____ _____ _____ _____ _____
48684 62452 62140 62799 62493 48543 62702 62323 62328 62690 48502 62482 62491 62410 62683 48427
.
Besides the sequence used, also attached the saved variables in .mat file.
You can add more sequences, one each row, as show in the link above
.
If you find this answer useful would you please be so kind to consider marking my answer as Accepted Answer?
To any other reader, if you find this answer useful please consider clicking on the thumbs-up vote link
thanks in advance for time and attention
John BG

  댓글 수: 0

로그인 to comment.

이 질문에 답변하려면 로그인을(를) 수행하십시오.


Translated by