tf weighting in docs

John on 2 Dec 2017
How do I compute term frequency (how many times each term occurs in a document) from a text file that contains multiple documents? Each document starts with a document ID tag <P ID=xxx> and ends with the delimiter </P>. I need separate statistics for each document.
I have been able to load the text, but my usual approach to identifying the document ID won't work here: the IDs are not contiguous, so a counter running from 1 to 'n' cannot be used as the doc ID.
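% The text file has already been loaded into variable C (loading code not shown)
C = C{1};
fclose(fid);
% Count the lines that contain a closing </P> delimiter, i.e. the number of documents
idx = strfind(C,'</P>');
n = nnz(~cellfun(@isempty, idx));
% Write one tag per document. kk is sequential, so it does not match the real, non-contiguous IDs
fileName = 'DTags.txt';
fid = fopen(fileName,'w');   % open once instead of reopening inside the loop
for kk = 1:n
    fprintf(fid,'<p id=%d>\r\n',kk);
end
fclose(fid);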
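
One possible direction, as a rough sketch only: rather than counting delimiters and numbering the documents 1..n, the ID could be read directly out of each <P ID=xxx> tag with regexp, and the terms counted per document. The sample text, the variable names (txt, tok, tf) and the crude letters-and-digits tokenizer below are all assumptions made for illustration; the real file may need a different pattern or tokenizer.

```matlab
% Made-up sample standing in for the contents of the real file (assumption).
txt = sprintf('<P ID=7>\nthe cat sat on the mat\n</P>\n<P ID=42>\nthe dog\n</P>\n');

% Capture each document's ID and body. The lazy .*? stops at the first
% closing </P>, and in MATLAB the dot also matches newlines by default.
tok = regexp(txt, '<P ID=([^>]+)>(.*?)</P>', 'tokens', 'ignorecase');

% Map from the real document ID (as text) to a table of term counts.
tf = containers.Map('KeyType','char','ValueType','any');
for k = 1:numel(tok)
    docID = strtrim(tok{k}{1});
    % Crude tokenizer (assumption): lowercase runs of letters and digits.
    words = regexp(lower(tok{k}{2}), '[a-z0-9]+', 'match');
    if isempty(words)
        continue;                      % skip documents with no terms
    end
    [terms, ~, j] = unique(words);     % j maps each word to its unique term
    counts = accumarray(j(:), 1);      % occurrences of each unique term
    tf(docID) = table(terms(:), counts, 'VariableNames', {'Term','Count'});
end

disp(tf('7'))   % term counts for the document tagged <P ID=7>
```

Because the map is keyed by the ID text taken from the tag itself, gaps in the ID sequence do not matter. If the file is read line by line into a cell array, as C is above, joining the lines first (for example with strjoin(C, newline) in recent releases) would give a single character vector like txt.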

Answers (0)
