Remove rows in an array containing a non-matching element
조회 수: 2 (최근 30일)
이전 댓글 표시
I have a datafile data.txt:
gene12 489 483 838
gene82 488 763 920
gene31 974 837 198
gene45 489 101 378
gene59 89 827 138
I have another data file genelist.txt that lists just genes I'm interested in for my study:
gene45
gene59
gene61
I want to modify the first dataset by removing all rows where the gene isn't found in the second list so basically end up with this array:
gene45 489 101 378
gene59 89 827 138
How do I go about doing this?
댓글 수: 0
채택된 답변
Guillaume
2017년 4월 11일
Probably the easiest:
geneswithdata = readtable('data.txt'); %load file as a table
geneswithdata.Properties.VariableNames{1} = 'genes'; %rename first column for clarity (optional).
%I would also rename all the other columns
genesonly = readtable('genelist.txt'); %load as a table
genesonly.Properties.VariableNames = {'genes'}; %rename columns. Common columns must have the same name
filteredgenes = innerjoin(genesonly, geneswithdata);
Done.
Using ismember that last line could be done as:
found = ismember(geneswithdata, genesonly);
filteredgenes = geneswithdata(found, :);
Using intersect (rather than setdiff) it could be done as:
[~, tokeep] = intersect(geneswithdata, genesonly);
filteredgenes = geneswithdata(tokeep, :);
댓글 수: 3
Guillaume
2017년 4월 12일
By default, readtable considers the first line as a header line that is to be used to name the variables. To tell it to not do that:
readtable(___, 'ReadVariableNames', false)
추가 답변 (1개)
참고 항목
카테고리
Help Center 및 File Exchange에서 Tables에 대해 자세히 알아보기
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!