How can I replace an element of a matrix with a vector?
I have sequences stored as character arrays. I need to search for particular characters and replace each one with a vector (its Boolean representation).
So in the end I need a 3D matrix.
It worked for one sequence, but I have 96000 more. I tried to do it with loops, but I get an error.
This is my code for one sequence, but I need to do the same for 96000 sequences.
I need your help with this issue. Thanks in advance.
p1_1=sequences;
% select the first sequence and convert it to a character array
Chp1_1=char(p1_1(1,:));
% for every character of the sequence, from first to last, look up its Boolean representation
SeqL = length(Chp1_1);
for i=1:SeqL
    X = Chp1_1(1,i);
    switch X
        case 'A'
            M(i,:) = A1;
        case 'C'
            M(i,:) = C1;
        case 'D'
            M(i,:) = D1;
        case 'E'
            M(i,:) = E1;
        case 'F'
            M(i,:) = F1;
        case 'G'
            M(i,:) = G1;
        case 'H'
            M(i,:) = H1;
        case 'I'
            M(i,:) = I1;
        case 'K'
            M(i,:) = K1;
        case 'L'
            M(i,:) = L1;
        case 'M'
            M(i,:) = M1;
        case 'N'
            M(i,:) = N1;
        case 'P'
            M(i,:) = P1;
        case 'Q'
            M(i,:) = Q1;
        case 'R'
            M(i,:) = R1;
        case 'S'
            M(i,:) = S1;
        case 'T'
            M(i,:) = T1;
        case 'V'
            M(i,:) = V1;
        case 'W'
            M(i,:) = W1;
        case 'Y'
            M(i,:) = Y1;
    end
end
Comments: 4
Guillaume
26 Nov 2019
Edited: Guillaume
26 Nov 2019
It's important to use notation that actually reflects your data; otherwise, the code we give you might not work. It's also important to use the proper notation, because now we're left wondering:
- Do you have numbered variables, as per your Protein_1, Protein_2, etc.?
- Do you have a cell array of char vectors, as per your "{1,96000}", which is cell array notation?
- Do you have a string array, as per your "in the [...] string array"?
Answers (3)
Guillaume
25 Nov 2019
First, probably the most important thing: numbered or sequentially named variables are always a very bad idea. They always make the code more complicated to write, not easier. For example, with your protein_1, protein_2, ... protein_96000 you cannot easily apply the same code to each variable, whereas if you had just one variable, for example a cell array called protein, you could use a loop to apply the same code to each element:
for p = 1:numel(protein)
    dosomethingwith(protein{p});
end
The same goes for your horrible switch...case and your A1, C1, etc. You end up writing the same thing many times with only one variation, which increases the risk of making a mistake on one line. Computers are very good at doing repetitive things, so why do the repetition yourself?
Anything that is numbered or sequentially named should be just one variable that you index instead.
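As a rough sketch of getting everything into one indexable variable: if the sequences already live in a string array, a char matrix, or a cell array of char vectors, cellstr should turn any of them into a cell array of char vectors. The three short sequences below are made-up sample data, and the double-quoted string syntax assumes R2017a or later.

```matlab
% Hypothetical sample data: a 3x1 string array of equal-length sequences
sequences = ["ACDKL"; "MEGAC"; "YWVTS"];

% cellstr also accepts a char matrix or a cell array of char vectors
protein = cellstr(sequences);

% One variable, indexed: e.g. the length of every sequence in one call
seqLength = cellfun(@numel, protein);
```

Numbered workspace variables (protein_1, protein_2, ...) cannot be gathered this way; it is better to avoid creating them at import time.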
So, with regard to your transformation, first create two variables: the first is the list of letters to transform and the second is what they need to be transformed into, e.g.:
letters = 'ACDEFGHIKLMNPQRSTVWY'.'; %column vector of letters (all 20 handled by the switch, including 'R')
acid = [1 0 0 0 0;
0 1 0 0 0;
0 0 1 0 0;
0 0 0 1 0;
..etc.
];
For pretty display we could even put them into a table:
map = table(letters, acid);
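If the Boolean representation happens to be a one-hot code (an assumption on my part; the actual A1, C1, ... vectors are not shown in the question), the whole acid matrix can be generated instead of typed out. A minimal sketch:

```matlab
% The 20 letters handled by the switch statement in the question
letters = 'ACDEFGHIKLMNPQRSTVWY'.';   % 20x1 char column

% ASSUMPTION: one-hot encoding, so row k of acid encodes letters(k)
acid = eye(numel(letters));

map = table(letters, acid);
```

With a different encoding, only the line that builds acid needs to change; the lookup code stays the same.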
Now that we have that, transforming a sequence of letters into a 2D matrix is trivial:
prot = 'ACDKLMEGAC'; %content and length don't matter
[found, whichrow] = ismember(prot, map.letters); %find which row of letters corresponds to each letter of prot
assert(all(found), 'some letters of the input are invalid');
transformed = map.acid(whichrow, :); %and use the corresponding row of acid instead
%all done!
And assuming protein is the above-mentioned cell array and all the sequences are the same length, then:
transformed = zeros(numel(protein{1}), size(map.acid, 2), numel(protein)); %preallocated 3D array
for p = 1:numel(protein)
    [found, whichrow] = ismember(protein{p}, map.letters); %find which row of letters corresponds to each letter of protein{p}
    assert(all(found), 'some letters of protein %d are invalid', p);
    transformed(:, :, p) = map.acid(whichrow, :); %and use the corresponding row of acid instead
end
See how short the code can be once you don't have numbered variables and use indexing instead?
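For 96000 sequences the loop is probably fast enough, but the same lookup can also be done in one shot. The sketch below assumes equal-length sequences and a one-hot acid matrix; the sample sequences are made up.

```matlab
letters = 'ACDEFGHIKLMNPQRSTVWY';
acid = eye(numel(letters));               % ASSUMPTION: one-hot codes
protein = {'ACDKL', 'MEGAC', 'YWVTS'};    % hypothetical, all the same length

seqs = vertcat(protein{:});               % nSeq-by-seqLen char matrix
[found, whichrow] = ismember(seqs.', letters);   % seqLen-by-nSeq row indices
assert(all(found(:)), 'some letters are invalid');

% Look up every letter at once, then fold the rows back into 3D
codes = acid(whichrow(:), :);             % (seqLen*nSeq)-by-codeLen
transformed = permute(reshape(codes, size(seqs, 2), size(seqs, 1), []), [1 3 2]);
```

Here transformed(:, :, p) holds the seqLen-by-codeLen matrix of sequence p, the same layout the loop produces.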
Comments: 0
Philippe Lebel
25 Nov 2019
I am not sure what you are trying to do as a whole, but if you want to quickly find the occurrences of a certain string, use strfind().
a = 'aasdasffwfdasda';
your_sequence_of_bools_for_letter_a = [true false true];
idx = strfind(a,'a')
idx =
1 2 5 12 15
M=cell(1,length(a));
for i=1:length(idx)
    M{idx(i)} = your_sequence_of_bools_for_letter_a;
end
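When the thing being searched for is a single character, a plain comparison should give the same positions as strfind. A small sketch on the same sample string:

```matlab
a = 'aasdasffwfdasda';

mask = (a == 'a');     % logical row vector, true wherever a(k) is 'a'
idx = find(mask);      % positions of 'a', as strfind(a, 'a') would return
```

The logical mask can also be used directly for indexing, which avoids the call to find.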
Philippe Lebel
25 Nov 2019
Now I understand.
Here is a solution that you can easily expand.
clear
protein(1).name = 'A';
protein(1).bool_value = [1 0 0];
protein(2).name = 'B';
protein(2).bool_value = [0 1 0];
protein(3).name = 'C';
protein(3).bool_value = [0 0 1];
protein_name_list = [protein.name];
sequences = ['ABC';'CCC';'CAB'];
M = cell(1, size(sequences, 1)); % one cell per sequence (row)
for i = 1:size(sequences, 1)
    resulting_bool = [];
    sequence = sequences(i,:);
    for j = 1:length(sequence)
        idx = strfind(protein_name_list, sequence(j)); % position of this letter in the name list
        resulting_bool = [resulting_bool; protein(idx).bool_value];
    end
    M{i} = resulting_bool;
end
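The loop leaves a cell array of 2D matrices, while the question asks for a 3D matrix. Assuming every sequence has the same length, the cells can be stacked along the third dimension with cat. In this sketch M is written out by hand to match what the loop would return for 'ABC', 'CCC' and 'CAB':

```matlab
% Stand-in for the cell array M built by the loop
M = {[1 0 0; 0 1 0; 0 0 1], ...   % 'ABC'
     [0 0 1; 0 0 1; 0 0 1], ...   % 'CCC'
     [0 0 1; 1 0 0; 0 1 0]};      % 'CAB'

% Stack the 2D matrices: seqLen-by-codeLen-by-nSeq
M3 = cat(3, M{:});
```

M3(:, :, k) is then the Boolean matrix of the k-th sequence.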
Comments: 0