Read text file lines and analyze

Question

I would appreciate help with reading and analyzing a text file. The text file (rosalind_gc1.txt) is in this format:

>Rosalind_4949
ACTTCTATGTAGCGCGCTATTTCAAGGGATCGGCCAATAGTACGACGTGTTTCATCTAGT
GCGACAAATGTATATACCGTTTTCATTACGTACCACGATAAGTTGAAGCCCGTATTC
AGACGCGGGAGCCGTCTGCTGGACAAGTACTAGCTGGTCCATCCTCCCCACCAAAGGGAA
>Rosalind_7490
AACTGGGAATTTCTATATTGGGCGGTAAGCTCGGGGCAATCTATTAGTTGAATGCAACAG
TAACAAACTTGCCGTCGGTCGCTGTTCGCGCAGCATTAATAATAACTCTGGCGAGTAGAT
>Rosalind_8337
CCTTGTTGTCTACCCACCAAGTCAGATAGACAGTTGGCTGTCTCCAACGCAGATTTTCTA
CGCTTCATGCTCTTGCGACTCATGTCGCCTGGGTTTATTGCTTCTCTACGGGATAACCGC
CCGGGCTCACTCTACCCGCGGGAAGGCCGCCCTCTCTCCCGTGTGCCTACATAA

I would like to determine the %GC for each data set, i.e. the sequence lines between consecutive “>Rosalind” headings. The example above contains 3 data sets: the %GC for the text between “>Rosalind_4949” and “>Rosalind_7490” is 48.5876%, and between “>Rosalind_7490” and “>Rosalind_8337” it is 45.0000%.
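For reference, the %GC of a data set is the share of G and C bases in its concatenated sequence, where n_G and n_C are the G and C counts and n is the total sequence length:

```latex
\%\mathrm{GC} = 100 \cdot \frac{n_G + n_C}{n}
```

For example, the first data set above is 60 + 57 + 60 = 177 bases long, so 48.5876% corresponds to 86 G or C bases: 100 · 86/177 ≈ 48.5876.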

I’m trying to use the following code, but I don’t know how to read the lines between consecutive “>” headers as one block, or how to concatenate the lines as I read them. I would appreciate any help.

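fid = fopen('rosalind_gc1.txt');
while ~feof(fid)
    templine = fgetl(fid);
    a = strcmp(templine, '>');   % true only if the whole line is exactly '>'
    if a == 0
        G = length(strfind(templine,'G'));   % G count of this line only
        C = length(strfind(templine,'C'));   % C count of this line only
        z = length(templine);                % length of this line only
        %Per = (G+C)*100/z
    end
end
fclose(fid);
Per = (G+C)*100/z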
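A minimal sketch of the line-by-line approach described in the question, assuming every header line starts with ">" and every other line belongs to the most recent header. It concatenates the sequence lines of each data set and computes the %GC when the next header (or the end of the file) is reached. The demo file and its records (seq_A, seq_B) are hypothetical; set fname to 'rosalind_gc1.txt' to use real data.

```matlab
% Sketch: %GC per data set, reading the file one line at a time.
% Assumption: header lines start with '>' and every other line belongs
% to the most recent header.
gcPercent = @(s) 100 * sum(s == 'G' | s == 'C') / numel(s);

% Demo setup only: write a small hypothetical FASTA file to a temp path.
% With real data, set fname = 'rosalind_gc1.txt' instead.
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, '>seq_A\nGGCC\nATAT\n>seq_B\nGGGA\n');
fclose(fid);

headers = {};   % header text of each data set (without '>')
perGC = [];     % %GC of each data set
seq = '';       % concatenated sequence of the current data set

fid = fopen(fname, 'r');
tline = fgetl(fid);             % fgetl returns -1 (not char) at end of file
while ischar(tline)
    if strncmp(tline, '>', 1)
        % New header: finish the previous data set, if there is one
        if ~isempty(headers)
            perGC(end+1) = gcPercent(seq);
        end
        headers{end+1} = tline(2:end);
        seq = '';
    else
        % Sequence line: append it to the current data set
        seq = [seq, upper(strtrim(tline))];
    end
    tline = fgetl(fid);
end
fclose(fid);

% The last data set has no header after it, so finish it here
if ~isempty(headers)
    perGC(end+1) = gcPercent(seq);
end

for k = 1:numel(headers)
    fprintf('%s: %.4f%%\n', headers{k}, perGC(k));
end
```

With the demo file this prints seq_A: 50.0000% and seq_B: 75.0000%. The loop tests ischar(tline), the pattern used in the fgetl documentation, so the -1 that fgetl returns at the end of the file is never treated as a line.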

Answer 1

Lmm3, 9 September 2017

The following code is what I used to read from the data file and determine %GC:

fid = fopen('rosalind_gc.txt');
G = 0;   % running count of G in the current data set
C = 0;   % running count of C in the current data set
z = 0;   % running length of the current data set
templine = fgetl(fid);
while ischar(templine)   % fgetl returns -1 at the end of the file
    if isempty(strfind(templine, '>'))
        % Sequence line: add its counts to the running totals
        G = G + length(strfind(templine, 'G'));
        C = C + length(strfind(templine, 'C'));
        z = z + length(templine);
    else
        % Header line: report the previous data set (if any), then reset
        if z > 0
            Per = (G + C)*100/z
        end
        disp(templine)
        G = 0;
        C = 0;
        z = 0;
    end
    templine = fgetl(fid);
end
fclose(fid);
% The last data set is not followed by a header, so report it here
Per = (G + C)*100/z

Answer 2

KSSV, 24 July 2017

Edited: KSSV, 24 July 2017

Let data.txt be your text file. You can count the number of G characters in the file as follows:

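fid = fopen('data.txt');
S = textscan(fid, '%s', 'delimiter', '\n');   % 1x1 cell holding a cell array of lines
fclose(fid);
S = S{1};                                     % unwrap: one line of the file per cell
N = 0;
for i = 1:length(S)
    N = N + length(strfind(S{i}, 'G'));       % add the number of G in line i
end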

Without a loop:

fid = fopen('data.txt');
S = textscan(fid, '%s', 'delimiter', '\n');
fclose(fid);
S = S{1};
Ni = strfind(S, 'G');              % cell array of G positions, one cell per line
N = sum(cellfun(@numel, Ni));      % total number of G in the file

Comment from Lmm3, 25 July 2017:

KSSV, thank you for your response. Could you explain what the line S = S{1} is doing? The code returns the total number of "G" occurrences in the data file, but do you have a suggestion for getting the "G" occurrences between each of the headers that begin with ">Rosalind"? For example, for the data set above I would like to get 3 values: the number of G occurrences between “>Rosalind_4949” and “>Rosalind_7490”, between “>Rosalind_7490” and “>Rosalind_8337”, and below “>Rosalind_8337”.
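One way to get the per-header counts asked about in this comment is to build on the textscan cell array S: flag the header lines, turn the flags into a running data-set number with cumsum, and total the per-line counts with accumarray. This is a sketch, assuming every header starts with ">" and comes before its sequence lines; the demo lines (seq_A, seq_B) are hypothetical stand-ins for the file contents.

```matlab
% Sketch: G count and %GC per header, reusing the textscan cell array S.
% Assumption: every header line starts with '>' and comes before its
% sequence lines. Demo data only (hypothetical); with a real file use
%   fid = fopen('data.txt'); S = textscan(fid, '%s', 'delimiter', '\n');
%   fclose(fid); S = S{1};
S = {'>seq_A'; 'GGCC'; 'ATAT'; '>seq_B'; 'GGGA'};

isHeader = strncmp(S, '>', 1);      % true on header lines
recordId = cumsum(isHeader);        % data-set number of every line
isSeq = ~isHeader & recordId > 0;   % sequence lines after the first header

% Per-line counts
nG  = cellfun(@(s) sum(s == 'G'), S);
nGC = cellfun(@(s) sum(s == 'G' | s == 'C'), S);
len = cellfun(@numel, S);

% Add up the per-line counts within each data set
nRec = sum(isHeader);
Gcount  = accumarray(recordId(isSeq), nG(isSeq),  [nRec 1]);
GCcount = accumarray(recordId(isSeq), nGC(isSeq), [nRec 1]);
total   = accumarray(recordId(isSeq), len(isSeq), [nRec 1]);
perGC   = 100 * GCcount ./ total;

headers = S(isHeader);
```

Gcount(k) is the number of G in the k-th data set; with the demo lines, Gcount is [2; 3] and perGC is [50; 75]. Because cumsum assigns every line to its data set, no explicit loop over the lines is needed.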

Answer 3

OCDER, 9 September 2017

readFasta.m

If you deal with a lot of FASTA files, look into fastaread (MATLAB Bioinformatics Toolbox) or readFasta (a function I wrote for another project).

cellfun and regexp are also handy tools here.

To get the %GC:

[Header, Seq] = readFasta('Seq.txt');
PercGC = cellfun(@(S)length(regexpi(S, 'G|C'))/length(S)*100, Seq)
PercGC =
   48.5876
   45.0000
   55.1724
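If neither the Bioinformatics Toolbox nor a separate reader file is available, a minimal sketch that splits the whole text with regexp can fill Header and Seq cell arrays instead. It assumes each record is a ">" header line followed by sequence lines; the demo text is hypothetical, and with a real file you would read it with fileread. This technique is swapped in for illustration and is not taken from readFasta or fastaread.

```matlab
% Sketch: %GC per record without fastaread or a separate reader file.
% Assumption: each record is a '>' header line followed by sequence lines.
% Demo text only (hypothetical); with a real file use txt = fileread('Seq.txt');
txt = sprintf('>seq_A\nGGCC\nATAT\n>seq_B\nGGGA\n');

% Each match gives two tokens: the header text and the sequence block
records = regexp(txt, '>([^\r\n]*)[\r\n]+([^>]*)', 'tokens');
Header = cellfun(@(t) t{1}, records, 'UniformOutput', false);
% Remove line breaks and other whitespace inside each sequence block
Seq = cellfun(@(t) regexprep(t{2}, '\s', ''), records, 'UniformOutput', false);

% Same %GC expression as above, one value per record
PercGC = cellfun(@(S) numel(regexpi(S, 'G|C')) / numel(S) * 100, Seq);
```

With the demo text, Header is {'seq_A', 'seq_B'} and PercGC is [50 75].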
