What format does the MSA data need to be in order to calculate pair-wise distances with seqpdist?

I am reading a ClustalW text-format MSA with multialignread. I have tried splitting the MSA data into two cell arrays, and I have tried keeping the structure intact; neither approach has been successful. seqpdist does accept the sequence cell array that fastaread returns for a FASTA text file of the same sequences.
% This works...
[heads,seqs]=fastaread('fastaformat.fasta');
distancematrix=seqpdist(seqs,'method',pam(250),'squareform',1);
% This does not...
[heads,seqs]=multialignread('clustalwmsa.aln1');
distancematrix=seqpdist(seqs,'method',pam(250),'squareform',1);
This is the error message:
??? Error using ==> cell.strmatch at 21
Requires character array or cell array of strings as inputs.
Error in ==> seqpdist at 258
distMethod = strmatch(lower(pval),distMethods);
Error in ==> cscalc at 14
dmat=seqpdist(seqs,'method',pam(250),'squareform',1);

Accepted Answer

Walter Roberson on 11 Jun 2011
The value you pass for 'method' must be a string.
The reference to pam looks like something intended for the 'ScoringMatrix' parameter, and the value to pass for that parameter is the string 'pam250'.
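A minimal sketch of the call this answer points to, assuming the Bioinformatics Toolbox is installed. The three short sequences are hypothetical stand-ins for the output of multialignread, and the 'Alphabet' setting is an assumption about the data being protein. Note that pam(250) returns a numeric scoring matrix rather than text, which is consistent with the strmatch error raised when it was passed as 'method'.

```matlab
% Hypothetical pre-aligned protein sequences of equal length. In the
% question these would come from:
%   [heads, seqs] = multialignread('clustalwmsa.aln1');
seqs = {'MKVELAAGIT', 'MKVELAAGLT', 'MRVELSAGIT'};

% Pass the scoring matrix by name through 'ScoringMatrix'.
% 'Method' is left at its default; if set, it must be a method name
% given as text (for example 'p-distance'), not a numeric matrix.
D = seqpdist(seqs, 'ScoringMatrix', 'pam250', ...
             'Alphabet', 'AA', 'SquareForm', true);

disp(size(D))   % one row and one column per sequence
```

How the scoring matrix enters the computed distances depends on the chosen 'Method' and on the alignment options, so it is worth checking the seqpdist documentation for your release before relying on the numbers.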
  1 Comment
Adam Quintero on 11 Jun 2011
That is absolutely correct, thank you. I was getting the semantics of this function wrong. This sets seqpdist to find the pairwise distance matrix of the MSA using pam250 units.
Thank you.


More Answers (1)

Adam Quintero on 11 Jun 2011
To calculate a scoring matrix from an MSA based on pam250 scoring, the input needs to be a cell array of the sequence strings. multialignread reads the sequences from a ClustalW MSA file (*.aln1) into a structure with headers and sequences, or into separate cell arrays for each.
The reason seqpdist could not read the sequences is an incorrect use of its arguments. The 'method' argument is only used if the input sequences are not already aligned. Taking the sequence data from multialignread and trying to align it again with 'method' caused the error.
The correct argument to use in this case is 'scoringmethod', where the pre-aligned sequences are re-scored using the 'scoringmethod' value.
pamdistancematrix=seqpdist(sequence,...
'scoringmethod',pam250,'squareform',1)
  2 Comments
Adam Quintero on 11 Jun 2011
Wow, sorry. That was NOT the correct answer. Walter Roberson is correct in that I should use 'scoringmatrix' instead of 'method', so that the input is handled as MSA sequences and not raw FASTA.
My apologies, Robert.
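On the title question of input format, the following is a rough sketch rather than a definitive recipe. It assumes, per the seqpdist documentation, that the function accepts a cell array of sequences, a structure array with a Sequence field, or a character matrix with one sequence per row. The sequences and header names are hypothetical, and the struct is built by hand to mimic the single-output form of multialignread.

```matlab
% Hypothetical struct array shaped like S = multialignread(file):
% one element per sequence, with Header and Sequence fields.
msa = struct('Header',   {'seqA', 'seqB', 'seqC'}, ...
             'Sequence', {'MKVELAAGIT', 'MKVELAAGLT', 'MRVELSAGIT'});

% Form 1: cell array of sequence strings pulled out of the struct.
seqCell = {msa.Sequence};
Dcell = seqpdist(seqCell, 'Alphabet', 'AA', 'SquareForm', true);

% Form 2: the struct array passed directly.
Dstruct = seqpdist(msa, 'Alphabet', 'AA', 'SquareForm', true);

% Form 3: character matrix, one aligned sequence per row.
Dchar = seqpdist(char(seqCell), 'Alphabet', 'AA', 'SquareForm', true);
```

All three calls describe the same aligned sequences, so the input container should not matter once the name-value arguments are valid.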

