Checking whether a string exists in a cell array and reading its ID

Asked by M, 14 May 2021
Latest comment: DGM, 15 May 2021
I have a cell array called "proteins_selected" in which each row is a protein code; it looks like this:
I would like to translate each protein code into its ID, so I have another cell array, "GoAnnotationsWithId", which works like a dictionary (the second column is the ID):
So I would like to check whether the proteins from proteins_selected exist in GoAnnotationsWithId and get the corresponding ID of each protein. I would be extremely grateful for any help.

Accepted Answer

Stephen23, 15 May 2021
The simple, efficient MATLAB approach would be to use ISMEMBER:
A = {'banana',12;'orange',23;'apple',35;'peach',74;'pear',51;'cherry',62}
A = 6×2 cell array
    {'banana'}    {[12]}
    {'orange'}    {[23]}
    {'apple' }    {[35]}
    {'peach' }    {[74]}
    {'pear'  }    {[51]}
    {'cherry'}    {[62]}
B = {'apple','bldf','orange'}
B = 1×3 cell array
{'apple'} {'bldf'} {'orange'}
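[X,Y] = ismember(B,A(:,1)); % X: true where a name in B occurs in column 1 of A; Y: its row in A (0 if absent)
V = nan(size(X)); % names without a match keep NaN
V(X) = [A{Y(X),2}] % copy the IDs of the names that were found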
V = 1×3
35 NaN 23
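Applied to the variable names from the question, the same ISMEMBER pattern might look like the sketch below. The sample contents of both arrays are hypothetical, and it assumes the IDs are stored as text rather than numbers, so the result is a cell array with an empty placeholder for codes that are not in the dictionary:

```matlab
% Hypothetical sample data; the real arrays come from the poster's workspace.
GoAnnotationsWithId = {'P12345','GO:0001'; 'Q67890','GO:0002'; 'A11111','GO:0003'};
proteins_selected = {'Q67890'; 'ZZZZZZ'; 'P12345'};

% found: true where the code exists in column 1; idx: its row (0 if absent)
[found, idx] = ismember(proteins_selected, GoAnnotationsWithId(:,1));

% Assumption: IDs are text, so unmatched codes get an empty char placeholder
ids = repmat({''}, size(proteins_selected));
ids(found) = GoAnnotationsWithId(idx(found), 2);
```

If the IDs are numeric instead, the NaN-initialised vector from the answer above is the closer fit.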

More Answers (1)

DGM, 14 May 2021
Something like this may work:
LUT = {'banana',12;
'orange',23;
'apple',35;
'peach',74;
'pear',51;
'cherry',62};
% random assortment of fruit names
mydata = LUT(randi(6,10,1),1)
% if you want the output to be a numeric vector
f = @(x) LUT{strcmp(x,LUT(:,1)),2};
output = cellfun(f,mydata)
gives
mydata =
10×1 cell array
{'cherry'}
{'orange'}
{'apple' }
{'peach' }
{'peach' }
{'peach' }
{'banana'}
{'pear' }
{'cherry'}
{'banana'}
output =
62
23
35
74
74
74
12
51
62
12
Alternatively:
% if you want the output to be a cell vector
f = @(x) LUT{strcmp(x,LUT(:,1)),2};
output = cellfun(f,mydata,'UniformOutput',false) % 'UniformOutput',false makes cellfun return a cell array
Comments: 2
M, 14 May 2021
As I understand it, in my case it should look like this:
f = @(x) GoAnnotationsWithId{strcmp(x,GoAnnotationsWithId(:,1)), 2};
output = cellfun(f,proteins_selected)
Unfortunately, it doesn't work for me; I get an error:
Insufficient number of outputs from right hand side of equal sign to satisfy assignment.
Error in umb_2_0 (line 61)
f = @(x) GoAnnotationsWithId{strcmp(x,GoAnnotationsWithId(:,1)), 2};
Am I doing something wrong?
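DGM, 14 May 2021
Edited: DGM, 14 May 2021
Oh, I see what's going on. The problem is that there are elements in your list that aren't in the dictionary. For those elements strcmp matches nothing, so the brace indexing returns no value at all and the anonymous function has nothing to output. This complicates things. I ended up having to write a function to handle undefined cases: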
LUT = {'banana',12;
'orange',23;
'apple',35;
'peach',74;
'pear',51;
'cherry',62};
% random assortment of fruit names
mydata = [LUT(randi(6,10,1),1); {'bldf'}]
% if you want the output to be a numeric vector
f = @(x) actualfunction(x,LUT);
output = cellfun(f,mydata)
function out = actualfunction(x,LUT)
    m = strcmp(x,LUT(:,1));
    if any(m)
        out = LUT{m,2};
    else
        out = NaN; % or whatever default value you want
    end
end
There may be more elegant ways of doing this.
If it turns out that these problem entries are to be discarded anyway, you could also do this instead.
% if you only want to keep valid entries
f = @(x) any(strcmp(x,LUT(:,1)));
mydata = mydata(cellfun(f,mydata)); % get rid of bad entries
f = @(x) LUT{strcmp(x,LUT(:,1)),2};
output = cellfun(f,mydata) % do lookup like first example
I doubt it's as fast or faster, but it's more compact.
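Another way to handle the undefined cases is a keyed lookup with containers.Map, using isKey to separate known names from unknown ones. This is only a sketch on the same fruit table: the short mydata list is made up for illustration, and it assumes the names in the first column are unique and the IDs are numeric scalars.

```matlab
LUT = {'banana',12; 'orange',23; 'apple',35; 'peach',74; 'pear',51; 'cherry',62};
mydata = {'cherry'; 'bldf'; 'apple'};      % made-up input with one unknown name

M = containers.Map(LUT(:,1), LUT(:,2));    % keys: names, values: IDs
known = isKey(M, mydata);                  % logical mask of names present in the map

output = nan(size(mydata));                % unknown names keep NaN
output(known) = cell2mat(values(M, mydata(known)));
```

Whether this is faster than the cellfun versions would need to be measured on the real data.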

Release

R2020b
