Removing certain lines in a text file by setting a restriction

I am trying to remove certain lines of text from a file by requiring that the 15th column of each line be no greater than the numerical value '21'.
For example:
13|PGC000013|0.00370|33.13420|~|15.41|0.675|0.217|0.587|~|0.87|0.102|~|-18.94|72.722|10.908|0.40|0.41|
has a value of '72.722', which is more than the '21' cutoff threshold so it would be eliminated.
My file is attached.

Accepted Answer

Star Strider on 17 Jun 2015
Edited: Star Strider on 17 Jun 2015
This works, and it’s relatively fast:
fidi = fopen('jgillis16 copiedlines.txt','rt');
Glxc = textscan(fidi, '%s', 'HeaderLines',1, 'Delimiter','|');
frewind(fidi)
Glxcs = textscan(fidi, '%s', 'EndOfLine','\r\n');
fclose(fidi);
dlen = 18*fix(length(Glxc{:})/18); % Set Row Length
Glxcr = reshape(Glxc{:}(1:dlen), 18, [])'; % Reshape & Transpose
Idx = cellfun(@(x) str2num(x) <= 21, Glxcr(:,15), 'Uni',0); % Find Rows With Col15 <= 21 To Retain & Write To New File
LIdx = logical(cell2mat(Idx)); % Logical Array From Cell
NewGlxc = Glxcs{:}(LIdx,:); % Rows Of New Array
You would then write the ‘NewGlxc’ array to your file. (I would save it as a .mat file if it is only for MATLAB use.)
EDIT — To write it to a text file:
fido = fopen('NewGalaxy.txt','wt');
fprintf(fido, '%s\n', NewGlxc{:});
fclose(fido);
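For comparison, here is a minimal line-by-line sketch of the same filter. It is not the code from the answer above: it splits each record on '|' and tests field 15 directly, so there is only one pass over the text and nothing to keep aligned. The first two records are taken from this thread; the third record and the names `recs`, `cutoff`, `keep` and `kept` are made up for illustration.

```matlab
% Two records quoted in this thread (column 15 is 72.722 and 71.903),
% plus one hypothetical record whose column 15 is 10.500.
recs = { ...
    '13|PGC000013|0.00370|33.13420|~|15.41|0.675|0.217|0.587|~|0.87|0.102|~|-18.94|72.722|10.908|0.40|0.41|'; ...
    '380|PGC000380|0.09110|46.54364|~|14.95|0.563|0.181|~|~|~|~|~|-19.39|71.903|10.785|0.33|0.34|'; ...
    '999|PGC000999|0.10000|40.00000|~|15.00|0.500|0.200|~|~|~|~|~|-19.00|10.500|10.000|0.30|0.30|'};

cutoff = 21;                        % keep records whose column 15 is below this
keep = false(numel(recs), 1);
for k = 1:numel(recs)
    % Do not collapse delimiters, so empty fields keep their position
    fields = strsplit(recs{k}, '|', 'CollapseDelimiters', false);
    if numel(fields) >= 15
        % str2double returns NaN for non-numeric text such as '~',
        % and NaN < cutoff is false, so such records are dropped
        keep(k) = str2double(fields{15}) < cutoff;
    end
end
kept = recs(keep);                  % only the hypothetical third record remains
```

To use this on a file, the records could be read with fgetl in a loop and `kept` written out with the same fprintf call shown above.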

Comments: 8

Whoops, accidentally accepted! What I meant to do was to tell you that although it creates NewGalaxy.txt, it doesn't include the galaxies < 21.
For example: the first line is 380|PGC000380|0.09110|46.54364|~|14.95|0.563|0.181|~|~|~|~|~|-19.39|71.903|10.785|0.33|0.34|
Which is 71.903.
File is attached.
I tried changing the 15 around in Glxcr(:,15) from
Idx = cellfun(@(x) str2num(x) <= 21, Glxcr(:,15), 'Uni',0); % Find Rows With Col15 <= 21 To Retain & Write To New File
But, it's still not targeting the specified column with the <21 distances. I don't know why but it seems like it is targeting column 16 instead of 15 with the provided code.
I found the problem.
I was using the textscan line from your previous Question for the file that had one header line. (I was using it to develop my code prior to your posting your file here. When you posted it, I changed the file name but not the header line information.) That threw off the subscript references by one between ‘Glxc’ and ‘Glxcs’. Change the ‘Glxc’ assignment to:
Glxc = textscan(fidi, '%s', 'Delimiter','|');
and it works as it should. I was concerned, because the check variable (not posted) that I used to view ‘Glxcr’ was correct. Both ‘NewGlxc’ and ‘NewGalaxy’ are correct now when I looked at them.
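The off-by-one described here can be reproduced on a toy example. This is only an illustrative sketch with made-up two-column records (`allLines`, `vals` and the other names are hypothetical, not from the posted file): when the pass that extracts the values skips the first line but the pass that stores whole lines does not, a mask built from the values selects the line before the one that was tested.

```matlab
% Hypothetical records: a label and one value per line
allLines = {'A|30'; 'B|10'; 'C|50'; 'D|5'};
vals     = [30; 10; 50; 5];

% The value pass skipped line 1 (as 'HeaderLines',1 does),
% so its mask describes lines 2..4 only.
maskShifted = vals(2:end) < 21;       % [true; false; true]

% Applying that mask to the unskipped line list picks lines 1 and 3,
% whose values are 30 and 50, i.e. exactly the wrong records.
wrong = allLines(maskShifted);

% With both passes reading the same lines, the mask lines up.
right = allLines(vals < 21);          % lines 2 and 4: 'B|10' and 'D|5'
```

This matches the symptom reported above, where the output file contained distances greater than 21.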
I don't know what the problem is, but it is still not working in terms of only accepting the lines that have the 15th column x<21 in values. All my distances instead seem to be x>21.
Could you attach the file you are producing?
Sure. To be certain we’re running the same code, this produced the attached file:
fidi = fopen('jgillis16 copiedlines.txt','rt');
Glxc = textscan(fidi, '%s', 'Delimiter','|');
frewind(fidi)
Glxcs = textscan(fidi, '%s', 'EndOfLine','\r\n');
fclose(fidi);
dlen = 18*fix(length(Glxc{:})/18); % Set Row Length
Glxcr = reshape(Glxc{:}(1:dlen), 18, [])'; % Reshape & Transpose
Idx = cellfun(@(x) str2num(x) < 21, Glxcr(:,15), 'Uni',0); % Find Rows With Col15 < 21
LIdx = logical(cell2mat(Idx)); % Logical Array From Cell
NewGlxc = Glxcs{:}(LIdx,:); % Rows Of New Array
fido = fopen('NewGalaxy.txt','wt');
fprintf(fido, '%s\n', NewGlxc{:});
fclose(fido);
I opened and checked the file in ‘Notepad’. When I counted through and randomly checked ‘Column 15’ in various lines (rows), all the values were <21. (I have an additional path detection and creation line that I incorporate into the fopen call that I do not include here, but that doesn’t affect the file contents.)
Apologies for the delay. It’s GMT-6 here, so my morning is just now starting.
I had to reload my .txt file as for some reason it wasn't showing an updated version. But, yes it works. Thanks for your continuous help!
As always, my pleasure!


More Answers (1)

Azzi Abdelmalek on 17 Jun 2015
fid=fopen('fic.txt')
l=fgetl(fid);
k=1;
while ischar(l)
r{k}=l;
k=k+1;
l=fgetl(fid);
end
fclose(fid);
kk=0;
for k=1:numel(r)
a=str2double(regexp(r{k},'-?\d+(\.\d+)?','match')); % every number found in the line
if a(5)<21
kk=kk+1;
out{kk}=r{k};
end
end

Comments: 9

Receive an error:
fid =
3
Cell contents assignment to a non-cell array object.
Error in copiedlinestwentympc (line 5)
r{k}=l;
I tested your attached file, and there are no errors. Try clearing your variables:
clear
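A likely cause of that error, assuming a variable named r was left in the workspace from earlier work and was not a cell array, can be sketched as follows. Brace assignment into a non-cell variable fails, while starting from an empty cell works. The leftover value 5 is hypothetical, and the exact wording of the error message varies between MATLAB releases.

```matlab
r = 5;                  % hypothetical leftover: a numeric variable named r
msg = '';
try
    r{1} = 'some line'; % brace assignment into a non-cell variable errors
catch err
    msg = err.message;  % wording depends on the MATLAB release
end

r = {};                 % explicitly start from an empty cell array
r{1} = 'some line';     % now the assignment succeeds
```

Initialising `r = {};` at the top of the script has the same effect as `clear` for this variable without wiping the rest of the workspace.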
Ok. It works, but does this produce a new file with only the lines of code that are <21? If not, how would I transfer this information into a new text file?
Unfortunately, I transferred the result to a text file and it does not produce what I was looking for. It runs, but produces the following numbers:
4715229 084 13.57178 -34.07085 49.667 10.927
which do not mean anything in this case and was not what I was looking for.
fid=fopen('fic.txt');
fid1=fopen('fic1.txt','w');
l=fgetl(fid);
k=1;
kk=0; % initialise the counter before the loop uses it
while ischar(l)
r{k}=l;
a=str2double(regexp(r{k},'-?\d+(\.\d+)?','match')); % every number found in the line
if a(5)<21 % a(5) is the fifth number matched in the line
kk=kk+1;
out{kk}=r{k};
fprintf(fid1,'%s\r\n',out{kk});
end
l=fgetl(fid);
k=k+1;
end
fclose(fid);
fclose(fid1);
This again does not do anything. Just reiterates the values inputted.
This is not correct. In your previous comment you said it works and asked how to save the result in a new text file; that is what this code does.
Overall, the code does not achieve what I want to do. It doesn't transfer the lines that have a value <21 in the 15th column to another file.
No, they are saved in the file 'fic1.txt'
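One possible reason the two sides disagree, sketched on the record quoted in the question: the regular expression collects every number in the line, including the digits inside 'PGC000013', and skips the '~' placeholders, so the fifth match is not the 15th '|'-separated column. The variable names below are illustrative only.

```matlab
rec = '13|PGC000013|0.00370|33.13420|~|15.41|0.675|0.217|0.587|~|0.87|0.102|~|-18.94|72.722|10.908|0.40|0.41|';

% Every number in the line, in order: 13, 000013, 0.00370, 33.13420, 15.41, ...
nums = str2double(regexp(rec, '-?\d+(\.\d+)?', 'match'));
fifthMatch = nums(5);               % 15.41, which is column 6 of the record

% Splitting on the delimiter keeps the column positions intact
fields = strsplit(rec, '|', 'CollapseDelimiters', false);
col15 = str2double(fields{15});     % 72.722, the value the question refers to
```

Because 15.41 is below 21, a test on `a(5)` keeps this record even though its 15th column is 72.722. Replacing that test with one on `fields{15}` would apply the cutoff to the intended column.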

Asked: 17 Jun 2015

Commented: 18 Jun 2015
