Getting strings to combine multiple times

Matthew Zehner on 28 Apr 2016
Commented: Walter Roberson on 28 Apr 2016
OK, I have a school project in which I have to group a DNA sequence of 550437 letters into codons. At the moment I have it set up as strings, basically 1 letter per cell in 550437 cells. I have to show how many times AAA, ATC, and CGG show up in that sequence without overlap. I also have to show the locations of the first 10 of each. I've tried reshaping from 550437x1 to 183479x3, but the order doesn't run every third letter from left to right: column 1 holds the first 183479 letters, column 2 the second 183479, and column 3 the final set. I would like either to group every 3 cells into one cell, or to get a numeric index telling me where each selected sequence shows up. Here's what I have so far to count how many times each sequence shows up. What I can't figure out is how to find where the first 10 instances of each show up.
AAAmatch=0;%%%Sets up for exact match
ATCmatch=0;%%%Sets up for exact match
CGGmatch=0;%%%Sets up for exact match
AAAcount=0;%%%Counter for AAA match
ATCcount=0;%%%Counter for ATC match
CGGcount=0;%%%Counter for CGG match
%%%Counts AAA matches at every start position (the loop advances by 1, so matches can overlap)
for i=1:length(DNA)-2
    if strcmp(DNA(i),'A')
        AAAmatch=AAAmatch+1;
    end
    if strcmp(DNA(i+1),'A')
        AAAmatch=AAAmatch+1;
    end
    if strcmp(DNA(i+2),'A')
        AAAmatch=AAAmatch+1;
    end
    if AAAmatch==3
        AAAcount=1+AAAcount;
    end
    AAAmatch=0;%%%Reset before testing the next start position
end
%%%Counts ATC matches at every start position
for h=1:length(DNA)-2
    if strcmp(DNA(h),'A')
        ATCmatch=ATCmatch+1;
    end
    if strcmp(DNA(h+1),'T')
        ATCmatch=ATCmatch+1;
    end
    if strcmp(DNA(h+2),'C')
        ATCmatch=ATCmatch+1;
    end
    if ATCmatch==3
        ATCcount=1+ATCcount;
    end
    ATCmatch=0;%%%Reset before testing the next start position
end
%%%Counts CGG matches at every start position
for t=1:length(DNA)-2
    if strcmp(DNA(t),'C')
        CGGmatch=CGGmatch+1;
    end
    if strcmp(DNA(t+1),'G')
        CGGmatch=CGGmatch+1;
    end
    if strcmp(DNA(t+2),'G')
        CGGmatch=CGGmatch+1;
    end
    if CGGmatch==3
        CGGcount=1+CGGcount;
    end
    CGGmatch=0;%%%Reset before testing the next start position
end
Thoughts?
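On the reshape problem described above: MATLAB's reshape fills the output column by column, which is why a direct reshape to 183479x3 puts the first 183479 letters in column 1. A minimal sketch of one way around this is to reshape to 3 rows first and then transpose, so that each row holds one codon. This assumes that "without overlap" means a fixed reading frame starting at letter 1 and that the sequence length is a multiple of 3; the short DNA variable below is a made-up stand-in for the real 550437x1 cell array.

```matlab
% Hypothetical small stand-in for the 550437x1 cell array (one letter per cell).
DNA = num2cell('AAAATCCGGAAATTTATC').';   % 18x1 cell

seq       = [DNA{:}];                 % join the cells into a 1x18 char row
codons    = reshape(seq, 3, []).';    % 3 rows first, then transpose: one codon per row
codonList = cellstr(codons);          % 6x1 cell array of 3-letter strings

% Codon numbers (row indices) where each target occurs in this reading frame
idxAAA = find(strcmp(codonList, 'AAA'));
idxATC = find(strcmp(codonList, 'ATC'));
idxCGG = find(strcmp(codonList, 'CGG'));

% Counts
nAAA = numel(idxAAA);
nATC = numel(idxATC);
nCGG = numel(idxCGG);

% First 10 occurrences (or fewer), converted from codon number to letter position
first10AAA = 3*(idxAAA(1:min(10,end)) - 1) + 1;
```

For this toy sequence the codons are AAA, ATC, CGG, AAA, TTT, ATC, so AAA is found at codons 1 and 4, which start at letters 1 and 10.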
  1 Comment
Azzi Abdelmalek on 28 Apr 2016
You can make your question clearer and briefer by posting an example with the expected result. You can also add some explanation.

Answers (1)

Walter Roberson on 28 Apr 2016
Consider using strfind(). You do need to add some logic to detect a potential overlap between the final character of one match and the first character of the next. Also, if you had something like 'AAAA', then strfind() of 'AAA' would return both 1 and 2 (that is, strfind does not care about overlaps). Still, strfind() will give you candidate positions that you can winnow.
What would you want the result to be if there was 'AAATCGG' in the sequence? Is that one AAA and one CGG, or is it one ATC?
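The winnowing step could look something like the sketch below. It takes the candidate positions from strfind() and keeps a candidate only if it starts after the previously kept match has ended, scanning left to right. This is only one possible rule, and it handles overlap within a single pattern; it does not settle the 'AAATCGG' question of overlap between different patterns. The variable names and the test string are made up for illustration.

```matlab
seq = 'AAAAAATCAAAA';        % hypothetical test string (a char row, not a cell array)
pat = 'AAA';

cand = strfind(seq, pat);    % all start positions, overlapping ones included

keep     = [];               % start positions that survive the winnowing
nextFree = 1;                % first letter not yet used by a kept match
for k = cand
    if k >= nextFree
        keep(end+1) = k;             %#ok<AGROW>
        nextFree = k + numel(pat);   % letters k..k+2 are now taken
    end
end

count   = numel(keep);               % number of non-overlapping matches
first10 = keep(1:min(10,end));       % first 10 locations, or fewer if there are not 10
```

Here strfind() reports candidates at 1, 2, 3, 4, 9 and 10, and the loop keeps 1, 4 and 9.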
  2 Comments
Matthew Zehner on 28 Apr 2016
Edited: Matthew Zehner on 28 Apr 2016
I've tried strfind. Since I'm working with cells that each hold a single letter, it doesn't work. I need to find AAA, ATC, and CGG individually. strfind only returns [1] where there is a match and [] otherwise, and I only get that when I search for a single letter, not the 3 letters together. I don't get numeric positions as I would with a normal string such as DNA='ATCAAACGGATCAACGTACAGTCATAC'. That would work rather easily. But since I have an array with over half a million cells, strfind just tells me whether each cell holds the letter I'm looking for. It doesn't tell me their positions.
Walter Roberson on 28 Apr 2016
Use horzcat(DNA{:}) and the result will be a string.
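As a small illustration of that conversion, the sketch below splits the example string from the previous comment into one letter per cell (to mimic the asker's layout, which is an assumption about the exact shape) and joins it back with horzcat. Once the data is an ordinary char row, strfind() returns numeric start positions. Note that these positions still include any overlapping matches.

```matlab
% Mimic the one-letter-per-cell layout using the example string from the thread.
DNA = num2cell('ATCAAACGGATCAACGTACAGTCATAC').';   % 27x1 cell

seq = horzcat(DNA{:});        % 1x27 char row vector

posAAA = strfind(seq, 'AAA'); % numeric start positions of each pattern
posATC = strfind(seq, 'ATC');
posCGG = strfind(seq, 'CGG');

first10ATC = posATC(1:min(10,end));   % first 10 locations, or fewer
```

For this string, AAA starts at 4, ATC at 1 and 10, and CGG at 7.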
