How to search a string with multiple rows for text?

Hello, after running seq = getgenpept('NP_036795'); I want to search seq.Features for the text 'Protein'. I have been unable to find the right function to search a string (character array) with multiple rows.
Running k = strfind(seq.Features, 'Protein') results in "Error using strfind. Input strings must have one row."
Any thoughts? Best, Joe

Comments: 3

Konstantinos Sofos, 27 Mar 2015 (edited 27 Mar 2015)
Can you give us some more information so that we can help you, e.g., is seq a structure, a cell array, ...? Can you attach an example?
per isakson, 27 Mar 2015 (edited 27 Mar 2015)
Excerpt from the documentation of getgenpept:
Features: [40x64 char]
strfind cannot handle multi-row character arrays.
What does this array of characters look like? BTW: it's allowed to use for-loops.
It looks like the picture below (image not reproduced here).
What kind of info are you trying to extract from 'Protein'?
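One way to get strfind to work on a multi-row char array is to split it into one character vector per row with cellstr first; strfind also accepts a cell array and then returns one result per row. A minimal sketch, assuming seq.Features has one text line per row; the small Features array below is a hypothetical stand-in, not the real getgenpept output:

```matlab
% Hypothetical stand-in for seq.Features (assumption: same layout, one text line per row)
Features = char({'source          1..116'
                 'Protein         1..116'
                 '                /product="vesicle-associated membrane protein 2"'});

rows = cellstr(Features);                 % one character vector per row, trailing blanks removed
hits = strfind(rows, 'Protein');          % cell array: match positions for each row
rowIdx = find(~cellfun(@isempty, hits));  % indices of rows that contain 'Protein'
disp(rows(rowIdx))                        % the matching rows themselves
```

Here rowIdx is 2: strfind is case-sensitive, so the lowercase "protein" in the third row does not match. For a case-insensitive search, regexpi(rows, 'protein') also accepts the cell array and can be tested with cellfun(@isempty, ...) in the same way.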


Answers (1)

per isakson, 28 Mar 2015 (edited 29 Mar 2015)
I guess this block of characters is easier to read on screen than to parse automatically. As for "find the correct function": I don't think there is such a function; a small program is needed. Anyhow, the script below creates a structure, sas, which is a start.
%% Create test data (the OCR program missed most of the underscores)
buf = { 'source 1..116 '
' /organism="Rattus norvegicus" '
' /dbxref="taxon: 10116^ '
' /chromosome=^10^ '
' /map="10824" '
'Protein 1..116 '
' /product="vesicle-associated membrane protein 2^ '
' /note="VAMP-2; synaptobrevin-2; Synaptobrevin 2 '
' (vesicle-associated membrane protein VAMP-2); '
' Vesicle-associated membrane protein (synaptobrevin 2)"'
' /calculated mol wt=12560 '
'Region 28..101 '
' /region name="Synaptobrevin" '
' /note="Synaptobrevin; pfam00957" '
' /dbxref="CDD:250253" '
'Site 95..114 '
' /site type="transmembrane region" '
' /inference="non-experimental evidence, no additional '
' details recorded" '
' /note="propagated from UniProt./Swiss-Prot (P63045.2).'
'CDS 1..116 '
' /gene="Vamp2^ '
' /gene synonym="RATVAMPB; RATVAMPIR; SYS; Syb2^ '
' /coded by="NM 012663.2:83..433" '
' /dbxref="GeneID:24803^ '
' /dbxref="RGD:3949" '};
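str_array = char( buf );   % multi-row char array, like seq.Features
%% Read and parse
sas = struct();
field_name = '';
for rr = 1 : size( str_array, 1 )
    % A sub-group starts with a word followed by digits, two ".", digits (e.g. 'Protein 1..116')
    tok = regexp( str_array(rr,:), '^(\w+)\s+(\d+\.{2}\d+)', 'tokens' );
    if not( isempty( tok ) )
        field_name = tok{1}{1};            % e.g. 'Protein'
        sas.(field_name) = tok{1}(2);      % 1x1 cell holding the location, e.g. '1..116'
    elseif not( isempty( field_name ) )
        % Any other row belongs to the current sub-group; append it without padding
        sas.(field_name) = cat( 1, sas.(field_name) ...
            , {strtrim( str_array(rr,:) )} );
    end
end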
The structure sas has one field for each sub-group:
>> sas
sas =
source: {5x1 cell}
Protein: {6x1 cell}
Region: {4x1 cell}
Site: {4x1 cell}
CDS: {6x1 cell}
>> sas.Protein
ans =
'1..116'
'/product="vesicle-associated membrane protein 2^'
'/note="VAMP-2; synaptobrevin-2; Synaptobrevin 2'
'(vesicle-associated membrane protein VAMP-2);'
'Vesicle-associated membrane protein (synaptobrevin 2)"'
'/calculated mol wt=12560'
>> char( sas.Protein )
ans =
1..116
/product="vesicle-associated membrane protein 2^
/note="VAMP-2; synaptobrevin-2; Synaptobrevin 2
(vesicle-associated membrane protein VAMP-2);
Vesicle-associated membrane protein (synaptobrevin 2)"
/calculated mol wt=12560
>>
The next step is to parse the sub-blocks.
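One possible way to take that step, sketched under the assumption that every sub-block looks like sas.Protein: the first row is the location, each row starting with "/" opens a qualifier, and any other row continues the previous qualifier. The struct qual and its field names are hypothetical; matlab.lang.makeValidName turns OCR-damaged keys such as 'calculated mol wt' into valid field names.

```matlab
% Stand-in for sas.Protein as produced by the script (OCR artifacts such as '^' kept as-is)
block = {'1..116'
         '/product="vesicle-associated membrane protein 2^'
         '/note="VAMP-2; synaptobrevin-2; Synaptobrevin 2'
         '(vesicle-associated membrane protein VAMP-2);'
         'Vesicle-associated membrane protein (synaptobrevin 2)"'
         '/calculated mol wt=12560'};

qual = struct('location', block{1});   % first row: the location, e.g. '1..116'

% Join wrapped rows: a row starting with '/' opens a new qualifier,
% any other row is appended to the previous qualifier
entries = {};
for k = 2 : numel(block)
    if strncmp(block{k}, '/', 1)
        entries{end+1} = block{k};                     %#ok<AGROW>
    elseif ~isempty(entries)
        entries{end} = [entries{end} ' ' block{k}];
    end
end

% Split each qualifier into key and value at the first '='
for k = 1 : numel(entries)
    tok = regexp(entries{k}, '^/([^=]+)=(.*)$', 'tokens', 'once');
    if isempty(tok)
        continue   % e.g. a flag qualifier without '='
    end
    key = matlab.lang.makeValidName(strtrim(tok{1}));       % 'calculated mol wt' -> 'calculatedMolWt'
    qual.(key) = regexprep(strtrim(tok{2}), '^"|"$', '');   % drop surrounding double quotes
end
disp(qual)
```

The values stay character vectors; numeric ones can be converted afterwards, e.g. str2double(qual.calculatedMolWt). The same loop can be run over every field of sas with fieldnames(sas).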

Asked: 27 Mar 2015 (edited 29 Mar 2015)
