Hi all, I want to read a text file line by line and remove the lines that contain NA, as well as the duplicated columns

d = fopen('COADREAD_methylation.txt','r');
lines = {};                    % "all" would shadow the built-in function all
% C = textscan(d, '%f%s');
this_line = fgetl(d);
while ischar(this_line)        % fgetl returns -1 (numeric) at end of file
    lines = [lines; {this_line}];
    this_line = fgetl(d);
end
fclose(d);

Accepted Answer

dpb on 15 Feb 2017
Edited: dpb on 16 Feb 2017
Well, 'NA' is easy, not sure what defines the repeated columns; not enough time at present to try to parse that input file to figure out what is/isn't unique without a description being supplied...
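fid = fopen('COADREAD_methylation.txt','r');
data = {};
while ~feof(fid)
    l = fgetl(fid);                      % read the next line
    if ~ischar(l), break, end            % fgetl returns -1 at end of file
    if isempty(strfind(l,'NA'))          % keep only lines without 'NA'
        data = [data; {l}];
    end
end
fid = fclose(fid);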
If the presence of 'NA' is all that's needed to get all the offending records, then you're done; otherwise need more details on how to tell so folks here don't have to try to work it out on their own.
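One caveat with the substring test, sketched below: strfind(l,'NA') is also true when 'NA' only appears inside another field, for example a gene symbol such as DNAH5. The second row here is made up purely for illustration, and the whole-column comparison assumes the file is tab-delimited; treat it as one possible variant, not as what the loop above does.

```matlab
tab = sprintf('\t');
l1 = strjoin({'cg00006414','NA','ZNF425','7'}, tab);   % beta value really is NA
l2 = strjoin({'cg00000001','0.25','DNAH5','5'}, tab);  % hypothetical row: 'NA' only inside the symbol

% Substring test, as in the loop above: true for any occurrence of 'NA'
hasNAsubstr = @(l) ~isempty(strfind(l, 'NA'));

% Whole-column test: true only when some tab-separated column is exactly 'NA'
hasNAfield = @(l) any(strcmp(regexp(l, '\t', 'split'), 'NA'));

flagsSubstr = [hasNAsubstr(l1), hasNAsubstr(l2)];
flagsField  = [hasNAfield(l1),  hasNAfield(l2)];
```

With the substring test both rows are flagged; with the whole-column test only the first one is.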
  13 Comments
chocho on 20 Feb 2017
Edited: Walter Roberson on 20 Feb 2017
Hi friend, I want to turn this code into the following form.
Note: I want to read each line and check whether it contains NA. If it does, drop it and move on to the next line; if not, check the columns of that line, and wherever a column contains ';', split that column and make two rows.
fid = fopen('COADREAD_methylation.txt','r');
data = {};
while ~feof(fid)
    l = fgetl(fid);                              % get the next line
    if ~ischar(l), break, end                    % fgetl returns -1 at end of file
    cols = regexp(l,'\t','split');               % split the line into its columns
    if any(strcmp(cols,'NA')), continue, end     % skip rows with an NA column and read the next line
    parts = regexp(strrep(cols,'"',''),';','split');   % split every column on ';'
    nrow = max(cellfun(@numel,parts));           % a column with ';' turns the line into several rows
    for k = 1:nrow
        % columns without ';' are repeated; columns with ';' contribute their k-th value
        row = cellfun(@(p) p{min(k,numel(p))}, parts, 'UniformOutput', false);
        data = [data; {strjoin(row, sprintf('\t'))}];   % save this line: no NA, ';' columns split
    end
end
fid = fclose(fid);
chocho on 20 Feb 2017
Edited: Walter Roberson on 20 Feb 2017
inputs:
Hybridization REF TCGA-A6-2672-11A-01D-1551-05 TCGA-A6-2672-11A-01D-1551-05 TCGA-A6-2672-11A-01D-1551-05
Composite Element REF Beta_value Gene_Symbol Chromosome Genomic_Coordinate Beta_value Gene_Symbol
cg00000292 0.511852232819811 ATP2A1 16 28890100 0.787687855895422 ATP2A1
cg00002426 0.519102187746053 SLMAP 3 57743543 0.932889308560864 SLMAP
cg00006414 NA "ZNF425;ZNF398" 7 148822837 NA "ZNF425;ZNF398"
cg00008493 0.987979722052904 "COX8C;KIAA1409" 14 93813777 0.986128428295584 "COX8C;KIAA1409"
cg00011459 0.922491239231445 "TMEM186;PMM2" 16 8890425 0.961124285303233 "TMEM186;PMM2"
outputs:
Hybridization REF TCGA-A6-2672-11A-01D-1551-05 TCGA-A6-2672-11A-01D-1551-05 TCGA-A6-2672-11A-01D-1551-05
cg00000292 0.511852232819811 ATP2A1 0.787687855895422
cg00002426 0.519102187746053 SLMAP 0.932889308560864
cg00008493 0.987979722052904 COX8C 0.986128428295584
cg00008493 0.987979722052904 KIAA1409 0.986128428295584
cg00011459 0.922491239231445 TMEM186 0.961124285303233
cg00011459 0.922491239231445 PMM2 0.961124285303233
I appreciate your help!
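For what it's worth, the input/output pair above can be reproduced with a short sketch along these lines. It rests on several assumptions: the file is tab-delimited, the columns to keep are 1, 2, 3 and 6 (read off the sample output: probe ID, first beta value, gene symbol, second beta value), and the two header lines are handled separately. The names keep, geneCol and out are only illustrative, and three of the sample rows are rebuilt in memory so the snippet runs without the file.

```matlab
tab = sprintf('\t');
% Three data rows from the sample input, rebuilt with tab separators
lines = { ...
    strjoin({'cg00000292','0.511852232819811','ATP2A1','16','28890100','0.787687855895422','ATP2A1'}, tab); ...
    strjoin({'cg00006414','NA','"ZNF425;ZNF398"','7','148822837','NA','"ZNF425;ZNF398"'}, tab); ...
    strjoin({'cg00008493','0.987979722052904','"COX8C;KIAA1409"','14','93813777','0.986128428295584','"COX8C;KIAA1409"'}, tab)};

keep    = [1 2 3 6];   % assumed from the sample output
geneCol = 3;           % position of the gene symbol within the kept columns
out = {};
for n = 1:numel(lines)
    cols = regexp(lines{n}, '\t', 'split');       % split the line into columns
    if any(strcmp(cols, 'NA')), continue, end     % drop rows with an NA column
    cols  = strrep(cols(keep), '"', '');          % drop repeated columns and the quotes
    genes = regexp(cols{geneCol}, ';', 'split');  % one entry per gene symbol
    for k = 1:numel(genes)
        row = cols;
        row{geneCol} = genes{k};                  % one output row per gene symbol
        out{end+1,1} = strjoin(row, tab);
    end
end
```

For the real file, the same loop body could sit inside the while ~feof(fid) loop, with l = fgetl(fid) taking the place of lines{n}.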
