Remove rows in an array containing a non-matching element

astein on 11 Apr 2017
Commented: astein on 16 Apr 2017
I have a datafile data.txt:
gene12 489 483 838
gene82 488 763 920
gene31 974 837 198
gene45 489 101 378
gene59 89 827 138
I have another data file genelist.txt that lists just genes I'm interested in for my study:
gene45
gene59
gene61
I want to modify the first dataset by removing all rows whose gene isn't found in the second list, so that I end up with this array:
gene45 489 101 378
gene59 89 827 138
How do I go about doing this?

Accepted Answer

Guillaume on 11 Apr 2017
Probably the easiest:
geneswithdata = readtable('data.txt', 'ReadVariableNames', false); % load file as a table; the file has no header line
geneswithdata.Properties.VariableNames{1} = 'genes'; % rename first column for clarity (optional)
% I would also rename all the other columns
genesonly = readtable('genelist.txt', 'ReadVariableNames', false); % load as a table
genesonly.Properties.VariableNames = {'genes'}; % rename column; common columns must have the same name
filteredgenes = innerjoin(genesonly, geneswithdata);
Done.
Using ismember, that last line could be done as:
found = ismember(geneswithdata.genes, genesonly.genes); % compare the gene columns, not the whole tables
filteredgenes = geneswithdata(found, :);
Using intersect (rather than setdiff), it could be done as:
[~, tokeep] = intersect(geneswithdata.genes, genesonly.genes); % tokeep indexes rows of geneswithdata
filteredgenes = geneswithdata(tokeep, :);
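For anyone who wants to try this without the files, here is a minimal sketch that builds the two tables inline from the data in the question. The variable names a, b and c are placeholders I made up for the three numeric columns; the question does not name them.

```matlab
% Sample data from the question, built inline instead of read from files.
genes  = {'gene12'; 'gene82'; 'gene31'; 'gene45'; 'gene59'};
counts = [489 483 838; 488 763 920; 974 837 198; 489 101 378; 89 827 138];
geneswithdata = table(genes, counts(:,1), counts(:,2), counts(:,3), ...
    'VariableNames', {'genes', 'a', 'b', 'c'});   % a, b, c are placeholder names
genesonly = table({'gene45'; 'gene59'; 'gene61'}, 'VariableNames', {'genes'});

% Option 1: join on the shared 'genes' variable.
joined = innerjoin(genesonly, geneswithdata);

% Option 2: logical mask over the gene column, keeping the original row order.
found    = ismember(geneswithdata.genes, genesonly.genes);
filtered = geneswithdata(found, :);

disp(filtered)
```

Both options should keep only the gene45 and gene59 rows. gene61 is in the list but has no data, so it does not appear in either result.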
  3 Comments
Guillaume on 12 Apr 2017
By default, readtable treats the first line as a header line and uses it to name the variables. To tell it not to do that:
readtable(___, 'ReadVariableNames', false)
readtable is extremely flexible. Look at its documentation to see all the options available.
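As a rough illustration of those options, the sketch below writes a few lines of the question's data to a temporary file (a stand-in for data.txt) and reads it back with no header row. Passing 'Delimiter' explicitly is my assumption about the file layout (single spaces between fields); readtable may well detect it on its own.

```matlab
% Write part of the question's data to a temporary file (stand-in for data.txt).
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, 'gene12 489 483 838\ngene82 488 763 920\ngene31 974 837 198\n');
fclose(fid);

% The file has no header row, so do not consume the first line as variable names.
t = readtable(fname, 'ReadVariableNames', false, 'Delimiter', ' ');
t.Properties.VariableNames{1} = 'genes';   % default names are Var1, Var2, ...

disp(t)
delete(fname);   % clean up the temporary file
```

Without 'ReadVariableNames', false, the gene12 line could be taken as the header and silently dropped from the data.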
astein on 16 Apr 2017
Thank you very much for the help!

More Answers (1)

Image Analyst on 11 Apr 2017
Look into ismember() or setdiff()
  1 Comment
astein on 11 Apr 2017
Edited: astein on 11 Apr 2017
I don't know how to use either for this purpose. setdiff() is going to give me the genes they don't have in common? I want the genes they have in common. ismember() gives me a logical array, and I run into the same issue: how do I use that array to pull out only the rows that are "true"? I am having difficulty manipulating the datasets (which format to load the txt files into: structure, table, etc.).
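On the logical-array point, a minimal sketch without tables: keep the gene names in a cell array and the numbers in a matrix (the names genes, values and wanted are just illustrative), then use the output of ismember directly as a row index. The last two lines show how intersect and setdiff differ.

```matlab
genes  = {'gene12'; 'gene82'; 'gene31'; 'gene45'; 'gene59'};
values = [489 483 838; 488 763 920; 974 837 198; 489 101 378; 89 827 138];
wanted = {'gene45'; 'gene59'; 'gene61'};

% ismember returns one logical per row of genes: true where the gene is wanted.
keep = ismember(genes, wanted);

% A logical vector used as an index selects exactly the rows marked true.
keptGenes  = genes(keep);
keptValues = values(keep, :);

% intersect gives the genes the two lists share; setdiff gives the genes
% that are in the data but not in the wanted list.
common  = intersect(genes, wanted);
dropped = setdiff(genes, wanted);
```

So setdiff would answer "which rows get removed", while ismember or intersect answer "which rows stay".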
