Full amino acid composition code + datasets + user manual (feature extraction + classification, ARFF)
What is the fastest way to do amino acid composition feature extraction in MATLAB? My code works fine, but I need to simplify it further.
% TRAIN TG dataset feature extraction

%% Import the data
[~, ~, raw0_0] = xlsread('C:\Users\Amindra\Desktop\EE361\TG dataset-20170922\Train1,taguchi.xlsx','Sheet1','A1:A978');
[~, ~, raw0_1] = xlsread('C:\Users\Amindra\Desktop\EE361\TG dataset-20170922\Train1,taguchi.xlsx','Sheet1','D1:D978');
raw = [raw0_0,raw0_1];
raw(cellfun(@(x) ~isempty(x) && isnumeric(x) && isnan(x),raw)) = {''};
cellVectors = raw(:,[1,2]);

%% Create table
Train1taguchi = table;

%% Allocate imported array to column variable names
FOLDS = cellVectors(:,1);
sequence = cellVectors(:,2);

%% Print the ARFF header
fprintf('@RELATION TESTtg\n');
fprintf('@ATTRIBUTE one NUMERIC\n');
fprintf('@ATTRIBUTE two NUMERIC\n');
fprintf('@ATTRIBUTE three NUMERIC\n');
fprintf('@ATTRIBUTE four NUMERIC\n');
fprintf('@ATTRIBUTE five NUMERIC\n');
fprintf('@ATTRIBUTE six NUMERIC\n');
fprintf('@ATTRIBUTE seven NUMERIC\n');
fprintf('@ATTRIBUTE eight NUMERIC\n');
fprintf('@ATTRIBUTE nine NUMERIC\n');
fprintf('@ATTRIBUTE ten NUMERIC\n');
fprintf('@ATTRIBUTE eleven NUMERIC\n');
fprintf('@ATTRIBUTE twelve NUMERIC\n');
fprintf('@ATTRIBUTE thirteen NUMERIC\n');
fprintf('@ATTRIBUTE fourteen NUMERIC\n');
fprintf('@ATTRIBUTE fifteen NUMERIC\n');
fprintf('@ATTRIBUTE sixteen NUMERIC\n');
fprintf('@ATTRIBUTE seventeen NUMERIC\n');
fprintf('@ATTRIBUTE eighteen NUMERIC\n');
fprintf('@ATTRIBUTE nineteen NUMERIC\n');
fprintf('@ATTRIBUTE twenty NUMERIC\n');
fprintf('@ATTRIBUTE class {fold1,fold2,fold3,fold4,fold5,fold6,fold7,fold8,fold9,fold10,fold11,fold12,fold13,fold14,fold15,fold16,fold17,fold18,fold19,fold20,fold21,fold22,fold23,fold24,fold25,fold26,fold27,fold28,fold29,fold30}\n');
fprintf('@DATA\n');
for i=1:978 % in this case there are 978 protein sequences
    FOLDS = cellVectors(i,1);
    fold = char(FOLDS);
    sequence = cellVectors(i,2); % take row i of the table
    seq = char(sequence); % NOTE: convert the cell to a char array
    %AA=aa2int(seq)
    AA = aacount(seq); % count every amino acid (AA) in the protein sequence
    A=AA.A; % number of A's in the protein sequence
    R=AA.R; % number of R's
    N=AA.N; % number of N's
    D=AA.D; % number of D's
    C=AA.C; % number of C's
    Q=AA.Q; % number of Q's
    E=AA.E; % number of E's
    G=AA.G; % number of G's
    H=AA.H; % number of H's
    I=AA.I; % number of I's
    L=AA.L; % number of L's
    K=AA.K; % number of K's
    M=AA.M; % number of M's
    F=AA.F; % number of F's
    P=AA.P; % number of P's
    S=AA.S; % number of S's
    T=AA.T; % number of T's
    W=AA.W; % number of W's
    Y=AA.Y; % number of Y's
    V=AA.V; % number of V's
    seqLength = (A+R+N+D+C+Q+E+G+H+I+L+K+M+F+P+S+T+W+Y+V); % length of the protein sequence (20 standard residues only)
    %fprintf('\nlength of PROTEIN SEQUENCE = %d\n',seqLength) % display the sequence length to the user

    %% FEATURE EXTRACTION
    % Each feature is the fraction of one amino acid in the sequence.
    f1=(A/seqLength);  % feature for amino acid A
    f2=(R/seqLength);  % feature for amino acid R
    f3=(N/seqLength);  % feature for amino acid N
    f4=(D/seqLength);  % feature for amino acid D
    f5=(C/seqLength);  % feature for amino acid C
    f6=(Q/seqLength);  % feature for amino acid Q
    f7=(E/seqLength);  % feature for amino acid E
    f8=(G/seqLength);  % feature for amino acid G
    f9=(H/seqLength);  % feature for amino acid H
    f10=(I/seqLength); % feature for amino acid I
    f11=(L/seqLength); % feature for amino acid L
    f12=(K/seqLength); % feature for amino acid K
    f13=(M/seqLength); % feature for amino acid M
    f14=(F/seqLength); % feature for amino acid F
    f15=(P/seqLength); % feature for amino acid P
    f16=(S/seqLength); % feature for amino acid S
    f17=(T/seqLength); % feature for amino acid T
    f18=(W/seqLength); % feature for amino acid W
    f19=(Y/seqLength); % feature for amino acid Y
    f20=(V/seqLength); % feature for amino acid V

    % Write one ARFF data row: 20 comma-separated features, then the class label
    fprintf('%f,',f1,f2,f3,f4,f5,f6,f7,f8,f9,f10,f11,f12,f13,f14,f15,f16,f17,f18,f19,f20)
    fprintf('%s',fold)
    fprintf('\n')
end
Answers (1)
Luuk van Oosten
20 December 2018
Your question is poorly formulated, and I agree with Image Analyst about the formatting of your code. Still, here is my answer to "fastest way of amino acid composition", as it might help someone else as well.
For example, let's assume your protein sequence is the following:
yoursequence = 'YURPRTEINSEQENCEYUCANPUTHERE'
You can use
compositionstruct = aacount(yoursequence)
This returns the amino acid counts of your protein sequence in the struct
compositionstruct
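To go from counts to the 20 composition features the question asks about, one possible simplification is to keep the counts in a matrix instead of in 20 separate variables. The snippet below is only a sketch under some assumptions: `sequences` and `folds` are hypothetical stand-ins for the two spreadsheet columns, the counting is done with a plain character comparison rather than `aacount` so that it runs without the Bioinformatics Toolbox, and it relies on implicit expansion (R2016b or later). Letters outside the 20 standard amino acids, such as the U in the example sequence, are left out of the total, which matches how the question computes the sequence length.

```matlab
% Sketch: amino acid composition without one variable per residue.
aminoAcids = 'ARNDCQEGHILKMFPSTWYV';   % same order as f1..f20 in the question

% Hypothetical stand-ins for the fold and sequence columns of the spreadsheet.
sequences = {'YURPRTEINSEQENCEYUCANPUTHERE', 'ACDA'};
folds     = {'fold1', 'fold2'};

% counts(i, j) = number of times aminoAcids(j) occurs in sequences{i}
counts = zeros(numel(sequences), numel(aminoAcids));
for i = 1:numel(sequences)
    seq = upper(sequences{i});
    % Compare a column of residues against a row of letters (N-by-20 logical),
    % then sum down the columns to get one count per amino acid.
    counts(i, :) = sum(seq(:) == aminoAcids, 1);
end

% Divide each row by its own total to get fractions that sum to 1.
composition = counts ./ sum(counts, 2);

% One ARFF data row per sequence: 20 features, then the class label.
for i = 1:numel(sequences)
    fprintf('%f,', composition(i, :));
    fprintf('%s\n', folds{i});
end
```

If the toolbox is available, the same matrix could presumably be filled from the `aacount` struct fields instead of the comparison; the division and the printing loop would stay the same.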