With extractHTMLText I have harvested a news article. How can I write paragraph-long blocks to a text file?
The text analysis function created a clean ASCII file out of a very complex newspaper article using the following code (which worked well!):
url = "https://www.staradvertiser.com/2021/08/22/editorial/on-politics/on-politics-gov-david-iges-handling-of-covid-19-hobbled-by-indecision-inadequate-staffers/";
code = webread(url);
str = extractHTMLText(code)
Each paragraph became a line of text. How can I write these to an ASCII file for import into a text processing program? One paragraph per line of the output file (txt or xlsx) would be best.
Answers (1)
Vatsal
21 Feb 2024
Hi,
To output the extracted text to an ASCII file, formatting each paragraph as a separate line, the text must first be divided into paragraphs. This can be achieved in MATLAB by utilizing the "split" function, which divides a string into a cell array of strings using designated delimiters.
Here is the modified code to write each paragraph to a text file:
url = "https://www.staradvertiser.com/2021/08/22/editorial/on-politics/on-politics-gov-david-iges-handling-of-covid-19-hobbled-by-indecision-inadequate-staffers/";
code = webread(url);
str = extractHTMLText(code);
% newline returns the actual line-feed character; a quoted '\n' would be
% treated by split as a literal backslash followed by the letter n
str_split = split(str, newline); % Split the string into paragraphs
fileID = fopen('output.txt','w'); % Open a file named 'output.txt'. Change the name as required.
for i = 1:numel(str_split)
    fprintf(fileID,'%s\n',str_split{i}); % Write each paragraph on its own line
end
fclose(fileID); % Close the file when you are done
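If a loop feels heavy, the same idea can be sketched without fopen/fprintf. This is only a minimal sketch under a few assumptions: it uses a short made-up sample string in place of the extractHTMLText output so it runs without a network connection, it assumes R2022a or later for writelines (writecell needs R2019a or later), and the file names output.txt and output.xlsx are placeholders. It also drops blank lines, which may or may not be what you want.

```matlab
% Hypothetical stand-in for the text returned by extractHTMLText(code)
str = "First paragraph." + newline + newline + ...
      "Second paragraph." + newline + "Third paragraph.";

% Split on line breaks; splitlines handles \n, \r\n and \r
paragraphs = splitlines(str);

% Trim surrounding whitespace and drop empty lines between paragraphs
paragraphs = strtrim(paragraphs);
paragraphs(paragraphs == "") = [];

% One paragraph per line of a plain text file (requires R2022a or later)
writelines(paragraphs, "output.txt");

% One paragraph per row of a spreadsheet (column A)
writecell(cellstr(paragraphs), "output.xlsx");
```

To use it on the article, replace the sample string with str = extractHTMLText(webread(url)).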
I hope this helps!