Large text file (27580×1102 cell array)

chocho on 20 Feb 2017
Commented: Rik on 21 Feb 2017
fid  = fopen('Cancer.txt', 'r');
fout = fopen('Cancer_split.txt', 'w');      % write results to a new file (example name), not the input
tab  = sprintf('\t');
while ~feof(fid)
    l = fgetl(fid);                          % get the next line
    if ~ischar(l), break, end                % fgetl returns -1 at end of file
    cols = regexp(l, tab, 'split');          % split the line into its tab-separated columns
    if any(strcmp(cols, 'NA'))               % remove rows that contain an NA field
        continue
    end
    cols  = strrep(cols, '"', '');           % drop the quotes around "A;B" gene lists
    parts = regexp(cols, ';', 'split');      % split every column on ';'
    nOut  = max(cellfun(@numel, parts));     % number of rows this line expands into
    for k = 1:nOut
        row = cols;
        for c = 1:numel(cols)
            if numel(parts{c}) > 1           % column contained ';': take its k-th piece
                row{c} = parts{c}{min(k, numel(parts{c}))};
            end
        end
        fprintf(fout, '%s\n', strjoin(row, tab));   % write one output row
    end
end
fclose(fid);
fclose(fout);
Input:
Hybridization REF TCGA-A6-2672-11A-01D-1551-05 TCGA-A6-2672-11A-01D-1551-05 TCGA-A6-2672-11A-01D-1551-05
Composite Element REF Beta_value Gene_Symbol Chromosome Genomic_Coordinate Beta_value Gene_Symbol
cg00000292 0.511852232819811 ATP2A1 16 28890100 0.787687855895422 ATP2A1
cg00003994 0.0341977140819682 MEOX2 15725862 0.334815614333325 MEOX2
cg00008493 0.987979722052904 "COX8C;KIAA1409" 14 93813777 0.986128428295584 "COX8C;KIAA1409"
cg00011459 0.922491239231445 "TMEM186;PMM2" 16 8890425 0.961124285303233 "TMEM186;PMM2"
Desired output:
Hybridization REF TCGA-A6-2672-11A-01D-1551-05 TCGA-A6-2672-11A-01D-1551-05 TCGA-A6-2672-11A-01D-1551-05 ……
cg00000292 0.511852232819811 ATP2A1 0.787687855895422
cg00003994 0.0341977140819682 MEOX2 0.334815614333325
cg00008493 0.987979722052904 COX8C 0.986128428295584
cg00008493 0.987979722052904 KIAA1409 0.986128428295584
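A minimal sketch of the transformation from the input to the desired output, assuming the layout given by the header: column 1 is the probe ID and each sample then occupies 4 columns (Beta_value, Gene_Symbol, Chromosome, Genomic_Coordinate), so the Beta values sit in columns 2, 6, 10, … and the first Gene_Symbol is in column 3. The two data lines are copied from the input above; the variable names (lines, out) are only illustrative, and the two header lines are assumed to be handled separately.

```matlab
% Two data lines from the question (the real file is tab-separated).
tab = sprintf('\t');
lines = { ...
    sprintf('cg00000292\t0.511852232819811\tATP2A1\t16\t28890100\t0.787687855895422\tATP2A1'), ...
    sprintf('cg00008493\t0.987979722052904\t"COX8C;KIAA1409"\t14\t93813777\t0.986128428295584\t"COX8C;KIAA1409"')};

out = {};                                       % one output line per gene symbol
for n = 1:numel(lines)
    cols = regexp(lines{n}, tab, 'split');      % split the line on tabs
    if any(strcmp(cols, 'NA'))                  % skip rows with an NA field
        continue
    end
    betas = cols([2, 6:4:end]);                 % Beta_value of every sample (assumed 4 columns per sample)
    genes = regexp(strrep(cols{3}, '"', ''), ';', 'split');  % first Gene_Symbol, quotes removed
    for g = 1:numel(genes)                      % one output row per gene
        row = [cols(1), betas(1), genes(g), betas(2:end)];
        out{end+1, 1} = strjoin(row, tab);      %#ok<AGROW>
    end
end
fprintf('%s\n', out{:});
```

Taking the gene list only from column 3 works because every Gene_Symbol column of a row repeats the same value in the sample data.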
chocho on 21 Feb 2017
Edited: Walter Roberson on 21 Feb 2017
The text file contains the same header lines and data rows as the input above, followed by many more rows, and I want to get the output shown above.
chocho on 21 Feb 2017
I'd appreciate your help!


Accepted Answer

Rik on 21 Feb 2017
So essentially you have a tab-separated file in which you only want to keep specific columns.
You can read a file like this with readtable. If you really have to go through it line by line you can use a loop, but with readtable you should be able to select the columns you want to keep directly, and then write the new file with writetable.
Note 1: Set the 'Delimiter' parameter to '\t' to split on tabs.
Note 2: readtable requires MATLAB R2013b or later. In older releases you'll have to work with the textscan function instead.
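A sketch of the readtable route, assuming (as in the question's header) two header lines and four columns per sample, so the columns to keep are 1 (probe ID), 2 (first Beta_value), 3 (Gene_Symbol) and 6:4:end (the remaining Beta_value columns). To stay self-contained it first writes one sample row to a temporary file; with the real data, inFile would be 'Cancer.txt'. Dropping NA rows through 'TreatAsEmpty' plus ismissing is one possible approach, not the only one, and splitting "A;B" gene lists into separate rows is not handled here.

```matlab
% Demo input in the question's layout (tab-separated); use 'Cancer.txt' for the real file.
inFile  = [tempname '.txt'];
outFile = [tempname '.txt'];   % hypothetical output location
fid = fopen(inFile, 'w');
fprintf(fid, 'Hybridization REF\tTCGA-A6-2672-11A-01D-1551-05\tTCGA-A6-2672-11A-01D-1551-05\n');
fprintf(fid, 'Composite Element REF\tBeta_value\tGene_Symbol\tChromosome\tGenomic_Coordinate\tBeta_value\tGene_Symbol\n');
fprintf(fid, 'cg00000292\t0.511852232819811\tATP2A1\t16\t28890100\t0.787687855895422\tATP2A1\n');
fclose(fid);

% Read: skip the 2 header lines, split on tabs, treat 'NA' as a missing value.
T = readtable(inFile, 'FileType', 'text', 'Delimiter', '\t', ...
    'HeaderLines', 2, 'ReadVariableNames', false, 'TreatAsEmpty', 'NA');

% Keep probe ID, first Beta_value, Gene_Symbol and every later Beta_value.
keep = [1 2 3 6:4:width(T)];
T2 = T(:, keep);

% Drop rows with any missing value (e.g. an 'NA' in the original file).
T2 = T2(~any(ismissing(T2), 2), :);

% Write the result as a tab-separated file without a variable-name header line.
writetable(T2, outFile, 'FileType', 'text', 'Delimiter', '\t', ...
    'WriteVariableNames', false);
```

Selecting columns by index keeps the code independent of the duplicated Beta_value/Gene_Symbol names in the second header line.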
chocho on 21 Feb 2017
Yes, I want it like this. Then, from the columns 6:4:end, I want to calculate the average, since all of them are floating-point values.
Could you please help me do that? It seems too hard for me.
I really appreciate your help.
Rik on 21 Feb 2017
If you have managed to convert your data to a numeric matrix, you can use mean(data,2) to average along the 2nd dimension, i.e. across the columns, which gives one average per row.
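For the averaging step, a small sketch assuming the Beta_value fields are still text, as they are after splitting lines on tabs: str2double turns a cell array of strings into a numeric matrix (NaN for anything non-numeric, such as 'NA'), and mean(..., 2) then gives one average per row. The values are the Beta_values of the first two probes in the question's sample data.

```matlab
% Beta_value fields for two probes (rows) and two samples (columns),
% as text, taken from the question's sample data.
raw = {'0.511852232819811',  '0.787687855895422';
       '0.0341977140819682', '0.334815614333325'};

betas   = str2double(raw);   % text -> double matrix; non-numeric text (e.g. 'NA') becomes NaN
avgBeta = mean(betas, 2);    % average across columns: one value per probe (row)

% If some entries are NaN, mean(betas, 2, 'omitnan') (R2015a or later) ignores them.
disp(avgBeta)
```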
