Remove rows in an array containing a non-matching element

Question

I have a datafile data.txt:

gene12 489 483 838
gene82 488 763 920
gene31 974 837 198
gene45 489 101 378
gene59 89 827 138

I have another data file genelist.txt that lists just genes I'm interested in for my study:

gene45
gene59
gene61

I want to modify the first dataset by removing all rows whose gene isn't found in the second list, so that I end up with this array:

gene45 489 101 378
gene59 89 827 138

How do I go about doing this?

Answer 1

Guillaume on 11 Apr 2017

Probably the easiest:

geneswithdata = readtable('data.txt', 'ReadVariableNames', false);  %load file as a table; the file has no header line
geneswithdata.Properties.VariableNames{1} = 'genes';  %rename first column for clarity (optional).
%I would also rename all the other columns
genesonly = readtable('genelist.txt', 'ReadVariableNames', false);  %load as a table
genesonly.Properties.VariableNames = {'genes'};  %rename columns. Common columns must have the same name
filteredgenes = innerjoin(genesonly, geneswithdata);  %keep only the genes present in both tables

Done.

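Using ismember, that last line could be done as:

found = ismember(geneswithdata.genes, genesonly.genes);  %compare the gene columns, not the whole tables
filteredgenes = geneswithdata(found, :);  %logical indexing keeps the rows where found is true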

Using intersect (rather than setdiff), it could be done as:

[~, tokeep] = intersect(geneswithdata.genes, genesonly.genes);  %tokeep indexes rows of geneswithdata
filteredgenes = geneswithdata(tokeep, :);
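
For anyone who wants to try the three approaches without creating the files first, here is a minimal sketch that builds both tables inline from the values in the question. The column names a, b and c are placeholders I made up; the question does not name those columns. The ismember and intersect calls are applied to the genes columns of the two tables.

```matlab
% Build the two tables inline (values copied from the question).
% 'a', 'b', 'c' are hypothetical column names, not from the original files.
genes  = {'gene12'; 'gene82'; 'gene31'; 'gene45'; 'gene59'};
counts = [489 483 838; 488 763 920; 974 837 198; 489 101 378; 89 827 138];
geneswithdata = table(genes, counts(:,1), counts(:,2), counts(:,3), ...
    'VariableNames', {'genes', 'a', 'b', 'c'});
genesonly = table({'gene45'; 'gene59'; 'gene61'}, 'VariableNames', {'genes'});

% Option 1: innerjoin on the shared 'genes' key variable.
joined = innerjoin(genesonly, geneswithdata);

% Option 2: logical mask from ismember on the key columns.
found  = ismember(geneswithdata.genes, genesonly.genes);
masked = geneswithdata(found, :);

% Option 3: row indices into geneswithdata from intersect.
[~, tokeep] = intersect(geneswithdata.genes, genesonly.genes);
picked = geneswithdata(tokeep, :);

disp(masked)
```

One difference worth knowing: the ismember mask keeps the rows in their original file order, whereas innerjoin and intersect return results sorted by gene name. For this data the two orders happen to coincide.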

Comments: 3
Guillaume on 12 Apr 2017

By default, readtable treats the first line as a header line and uses it to name the variables. To tell it not to do that:

readtable(___, 'ReadVariableNames', false)

readtable is extremely flexible. Look at its documentation to see all the options available.
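
As a rough illustration of that option, the sketch below writes a small headerless, space-delimited file to a temporary location and reads it back. The temporary file, the explicit 'Delimiter' setting and the column names a, b, c are my own assumptions for the example; with your real files you may not need to set the delimiter at all.

```matlab
% Write a two-line headerless file to a temporary location (illustration only).
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, 'gene45 489 101 378\ngene59 89 827 138\n');
fclose(fid);

% Read it without treating the first line as variable names.
T = readtable(fname, 'ReadVariableNames', false, 'Delimiter', ' ');

% Default names are Var1..Var4; give them something meaningful.
% 'a', 'b', 'c' are placeholder names, not from the original data.
T.Properties.VariableNames = {'genes', 'a', 'b', 'c'};

delete(fname);  % clean up the temporary file
disp(T)
```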

astein on 16 Apr 2017

Thank you very much for the help!

Answer 2

Image Analyst on 11 Apr 2017

Look into ismember() or setdiff()

Comments: 1
astein on 11 Apr 2017

Edited: astein on 11 Apr 2017

I don't know how to use either for this purpose. Won't setdiff() give me the genes the two files don't have in common? I want the genes they do have in common. ismember() gives me a logical array, and I run into the same issue: how do I use that array to pull out only the rows that are "true"? I am also having difficulty manipulating the datasets (which format to load the txt files into: structure, table, etc.).
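
For readers with the same question, here is a small sketch of how a logical array from ismember can be used as a row index, with setdiff shown for contrast. It uses a plain cell array and numeric matrix built from the values in the question rather than loading the files; the variable names are made up for the example.

```matlab
% Data from the question, held as a cell array of names plus a numeric matrix.
allgenes = {'gene12'; 'gene82'; 'gene31'; 'gene45'; 'gene59'};
values   = [489 483 838; 488 763 920; 974 837 198; 489 101 378; 89 827 138];
wanted   = {'gene45'; 'gene59'; 'gene61'};

% ismember returns one logical per row of allgenes: true where the gene is wanted.
mask = ismember(allgenes, wanted);

% Using the logical array as a row index keeps only the "true" rows.
keptgenes  = allgenes(mask);
keptvalues = values(mask, :);

% setdiff answers the opposite question: genes in the data that are NOT in the list.
notlisted = setdiff(allgenes, wanted);

disp(keptgenes)
disp(keptvalues)
```

So setdiff is the right tool for finding what to discard, while ismember (or intersect) finds what to keep.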
