getting the nth term out of a sequence

SANGBIN LEE, February 29, 2024
Edited: John D'Errico, February 29, 2024
% Define the input and output file names
inputFileName = 'KIF11.txt';
outputFileName = 'CDS.txt';
% Read the sequence from the input file
fid = fopen(inputFileName, 'r');
sequence = fscanf(fid, '%c');
fclose(fid);
% Define the start and end positions of the CDS
cdsStart = 155;
cdsEnd = 3358;
% Extract the CDS from the sequence
cdsSequence = sequence(cdsStart:cdsEnd);
% Write the CDS sequence to a new file
fid = fopen(outputFileName, 'w');
fprintf(fid, '%s', cdsSequence);
fclose(fid);
I have the code above, which is supposed to pull out the 155th through 3358th characters of my text file. For some reason, when I run it, I get the 153rd through 3356th characters instead. Is something wrong with the code?
  Comments: 3
SANGBIN LEE, February 29, 2024
thank you
Walter Roberson, February 29, 2024
sequence = fscanf(fid, '%c');
Beware: the character codes returned in sequence include any end-of-line characters present in the file (possibly carriage return and line feed). Linear indexing into that array is unreliable, because you cannot be sure whether carriage returns are present.
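One way to check which end-of-line characters are present is to count them directly. The sketch below is a minimal illustration on a made-up two-line string; raw is a hypothetical stand-in, not the contents of KIF11.txt, and the CR+LF line endings are an assumption. With the real file, raw would be the output of fscanf(fid, '%c').

```matlab
% Hypothetical stand-in for the text returned by fscanf(fid, '%c'):
% two lines of four bases, each terminated by CR+LF (assumed for illustration).
raw = ['ACGT', char(13), char(10), 'TTGA', char(13), char(10)];

numCR = sum(raw == char(13));   % carriage returns (character code 13)
numLF = sum(raw == char(10));   % line feeds (character code 10)
numBases = numel(raw) - numCR - numLF;

fprintf('CR: %d, LF: %d, bases: %d, total characters: %d\n', ...
        numCR, numLF, numBases, numel(raw));
```

If either count is nonzero, positions in the raw text no longer line up with positions in the sequence itself.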


Answers (1)

Dyuman Joshi, February 29, 2024
Edited: Dyuman Joshi, February 29, 2024
As @Walter warned, a carriage return character (\r) is being read along with the data:
% Define the input and output file names
inputFileName = 'KIF11.txt';
outputFileName = 'CDS.txt';
% Read the sequence from the input file
fid = fopen(inputFileName, 'r');
sequence = fscanf(fid, '%c');
fclose(fid);
size(sequence)
ans = 1×2
1 3736
% Expected: the last character of the 1st line and the first character of the 2nd line.
% The actual output differs: the second character is a carriage return (code 13).
y = sequence(70:71)
y =
'T '
double(y)
ans = 1×2
84 13
Alternatively, you can use textscan here. As the output shows, it returns 3682 characters, without the line endings:
Fid = fopen(inputFileName, 'r');
out = textscan(Fid, '%c')
out = 1×1 cell array
{3682×1 char}
seq = out{1};
y = seq(70:71)
y = 2×1 char array
    'T'
    'G'
fclose(Fid);
% Define the start and end positions of the CDS
cdsStart = 155;
cdsEnd = 3358;
% Extract the CDS from the cleaned sequence (seq), not the raw text (sequence)
cdsSequence = seq(cdsStart:cdsEnd);
% Write the CDS sequence to a new file
fid = fopen(outputFileName, 'w');
fprintf(fid, '%s', cdsSequence);
fclose(fid);
  Comments: 1
John D'Errico, February 29, 2024
Edited: John D'Errico, February 29, 2024
+1. I was going to point this out:
find(~ismember(sequence,'CAGT'))
ans =
Columns 1 through 8
71 142 213 284 355 426 497 568
Columns 9 through 16
639 710 781 852 923 994 1065 1136
Columns 17 through 24
1207 1278 1349 1420 1491 1562 1633 1704
Columns 25 through 32
1775 1846 1917 1988 2059 2130 2201 2272
Columns 33 through 40
2343 2414 2485 2556 2627 2698 2769 2840
Columns 41 through 48
2911 2982 3053 3124 3195 3266 3337 3408
Columns 49 through 54
3479 3550 3621 3692 3735 3736
So there are two invisible characters before position 155, at indices 71 and 142. They fall exactly where the carriage return characters lie. That explains why the extracted sequence appears to be off by exactly 2 characters.
If you delete those elements first, an index into the repaired string will work.
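The repair described above can be sketched as follows. This is a minimal illustration on a synthetic string rather than the actual KIF11.txt contents; the names raw, isBase, and clean, the CR-only line endings, and the small start and end positions are all assumptions made for the example.

```matlab
% Synthetic stand-in for the raw file contents: 3 lines of 4 bases,
% each followed by a carriage return (char 13). Hypothetical data.
raw = ['ACGT', char(13), 'GGCA', char(13), 'TTAC', char(13)];

% Keep only the base letters, dropping CR/LF and any other stray characters
isBase = ismember(raw, 'ACGT');
clean  = raw(isBase);

% Positions now count bases only, so the requested range is exact
cdsStart = 6;
cdsEnd   = 10;
cdsSequence = clean(cdsStart:cdsEnd);

disp(cdsSequence)
```

Note that filtering on 'ACGT' also discards lowercase letters and ambiguity codes such as N. If the file may contain those, removing only char(13) and char(10) is the more conservative choice.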

Release: R2023b