I want to convert a character series into numerical series using for loop

S Kar on 8 Jun 2022
Commented: S Kar on 8 Jun 2022
I have a character sequence stored in the variable DNA_SEQS = 'AGGTAT.....'. The sequence consists of four types of characters, 'A', 'C', 'T' and 'G', so I used a switch-case block to generate the numerical sequence. The code I have written is:
seqs = fastaread('AF0071891.fasta');
DNA_SEQS = seqs.Sequence;
len = length(DNA_SEQS);
for j = 1:5
    x = [];
    a = DNA_SEQS(j);
    switch a
        case 'A'
            v = 0;
        case 'C'
            v = 1;
        case 'G'
            v = 2;
        case 'T'
            v = 3;
    end
    x(j+1) = [x(j) v];
end
With this code I expected to get a numerical array like [0,2,2,3,0], but instead I got the error: Index exceeds matrix dimensions.
Please help.

Accepted Answer

dpb on 8 Jun 2022
Edited: dpb on 8 Jun 2022
for j = 1:5
x = [];
a = DNA_SEQS(j);
...
Because x = [] sits inside the loop, you wipe out everything stored in x so far every time you start through the loop again...don't do that!!! :)
x = [];
for j = 1:5
a = DNA_SEQS(j);
...
instead, although you should
  1. preallocate and assign into the array instead
  2. size x() based on the length of the string rather than hardcoding the loop count
N=strlength(DNA_SEQS);
x=zeros(1,N);
for j = 1:N
a = DNA_SEQS(j);
...
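Filling in the elided part, the whole loop might look like the sketch below. It assumes DNA_SEQS is a plain char vector (a short sample is hardcoded here in place of the fastaread output), and the otherwise branch is an addition of mine to flag unexpected characters. Note the assignment is x(j) = v: the original x(j+1) = [x(j) v] tries to squeeze two elements into one slot, which is likely the source of the "number of elements in A and B must be the same" error.

```matlab
DNA_SEQS = 'AGGTAT';        % sample char vector (assumption; real data comes from fastaread)
N = numel(DNA_SEQS);        % number of characters in the sequence
x = zeros(1, N);            % preallocate once, outside the loop
for j = 1:N
    switch DNA_SEQS(j)
        case 'A'
            v = 0;
        case 'C'
            v = 1;
        case 'G'
            v = 2;
        case 'T'
            v = 3;
        otherwise
            v = NaN;        % assumption: mark anything other than A/C/G/T as NaN
    end
    x(j) = v;               % store exactly one scalar in slot j
end
disp(x)
```

For the sample above, x ends up as [0 2 2 3 0 3].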
However, in MATLAB you don't need a loop; use a lookup table instead. One way (not necessarily the fastest, but pretty easy to code) would be
DNA_VALS=interp1(double('ACGT'),0:3,double(DNA_SEQS));
This would return for your sample above...
>> DNA_SEQS = 'AGGTAT';
DNA_VALS=interp1(double('ACGT'),0:3,double(DNA_SEQS))
DNA_VALS =
0 2 2 3 0 3
>>
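One caveat with interp1 here: a character whose code falls between the table entries (say 'N', which sits between 'G' and 'T') would be linearly interpolated to a fractional value between 2 and 3 instead of being rejected. If that matters, a direct-indexing lookup table is one alternative. This is only a sketch; the name lut and the 128-slot table size are my own choices and assume the sequence holds plain ASCII characters.

```matlab
DNA_SEQS = 'AGGTAT';                 % sample char vector
lut = nan(1, 128);                   % one slot per ASCII code, NaN = "not a base"
lut(double('ACGT')) = 0:3;           % fill in the four known bases
DNA_VALS = lut(double(DNA_SEQS));    % index the table with the character codes
```

Any character not listed in 'ACGT' comes back as NaN, so bad input is easy to spot with any(isnan(DNA_VALS)).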
  Comments: 1
S Kar on 8 Jun 2022
Thank you, but I still get this error using the first method:
In an assignment A(:) = B, the number of elements in A and B must be the same.
The second method is working fine.


More Answers (1)

DGM on 8 Jun 2022
You can use ismember():
thisstr = 'AGGATATC';
charmap = 'ACGT';
[~,idx] = ismember(thisstr,charmap);
idx = idx-1
idx = 1×8
0 2 2 0 3 0 3 1
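A side note on this approach: ismember() returns index 0 for characters that are not in charmap, so after subtracting 1 they show up as -1. A small sketch of flagging them explicitly (the sample string with an 'N' in it and the NaN convention are my own assumptions):

```matlab
thisstr = 'AGNT';                       % sample with one unexpected character
charmap = 'ACGT';
[tf, idx] = ismember(thisstr, charmap); % tf is false and idx is 0 where no match is found
vals = idx - 1;                         % unmatched characters become -1 here
vals(~tf) = NaN;                        % replace them with NaN to make them stand out
```

Here vals is [0 2 NaN 3].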
  Comments: 4
dpb on 8 Jun 2022
For exactly the reason I outlined above as a possibility -- it isn't a char() array --
>> DNA_SEQS='AGGTAT'; % assign as char() string (and array of char())
>> N=strlength(DNA_SEQS) % strlength() is same as size(x,2) here...
N =
6
>> for i=1:N,disp(DNA_SEQS(i));end % works fine for a char() array with () addressing
A
G
G
T
A
T
>> DNA_SEQS = cellstr('AGGTAT'); % redefine as a cellstr() instead...
>> N=strlength(DNA_SEQS) % strlength knows about what is in the cell
N =
6
>> for i=1:N,disp(DNA_SEQS(i));end % but it fails as you see...
{'AGGTAT'}
Index exceeds the number of array elements (1).
>>
WHY!!!???
>> size(DNA_SEQS) % because now the cellstr is a 1x1 CELL array, NOT 1x6 char() array...
ans =
1 1
>>
How to make it work???
"Use the curlies, Luke!!!"
>> for i=1:N,disp(DNA_SEQS{1}(i));end
A
G
G
T
A
T
>>
NB: note the use of {1} above to "dereference" the cell array back to the char() array it contains; the subsequent "smooth" parentheses (i) then pick the ith element from that vector, just as they did directly when it was "only" a char() array and not a char() array inside a cell.
Strings behave similarly to cellstr(): you have to use {} (the "curlies") to reach inside a string array element to the individual characters that make it up.
See cellstr and links there for addressing cell strings and cells in general.
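If you don't know up front which of the three types you were handed, one option is to convert to a char vector first and then index as usual. A minimal sketch, assuming a single sequence (a 1x1 cell or a string scalar); the variable names are just for illustration:

```matlab
seqCell = cellstr('AGGTAT');     % 1x1 cell holding a char vector
seqStr  = "AGGTAT";              % 1x1 string
c1 = char(seqCell);              % back to a 1x6 char vector
c2 = char(seqStr);               % likewise for the string
[~, idx] = ismember(c1, 'ACGT'); % now () indexing and lookups work per character
vals = idx - 1;
```

After the conversion both c1 and c2 are 1x6 char arrays, and vals is [0 2 2 3 0 3].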
S Kar on 8 Jun 2022
Thank you so much for the elaboration.
