Traversing Text Document Matlab

xRobot on 17 Nov 2019
Edited: Adam Danz on 19 Nov 2019
Please provide guidance on this particular inquiry. All responses are highly valued and will be used to further my knowledge (I am not just looking for a copy-and-paste solution). I am attempting to read a Microsoft Word dictionary into MATLAB. From there I would like to traverse it, extract words of a specific length, say four-letter words, and put them into an array. Then I would like to select random words from the array and put them into a matrix.

Answers (1)

Adam Danz on 17 Nov 2019
Edited: Adam Danz on 17 Nov 2019
Reading from a Word document
Here's the general approach to reading a Microsoft Word document.
directory = 'C:\Users\AOC\Documents\MATLAB';
file = 'myDocFile.docx';
% Full path to the MS Word file
filePath = fullfile(directory,file);
% Read MS Word file using actxserver function
word = actxserver('Word.Application');
wdoc = word.Documents.Open(filePath);
txt = wdoc.Content.Text;
Quit(word)
delete(word)
The variable txt is a char array containing the text in your document.
Extracting 4-letter words
There are several approaches you could use. This one is fast and doesn't require segmenting the text into words and counting the length of each one. Instead, it uses a regular expression to search for this pattern:
[non-letter],[4-letters],[non-letter]
It also uses strtrim() to remove the leading and trailing white space.
% Extract 4-letter words.
s = strtrim(regexp(txt, '([^a-zA-Z])[a-zA-Z]{4}([^a-zA-Z])', 'match'));
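s is a 1xn cell array of 4-letter words as character arrays. Each match includes the non-letter character on either side of the word, so strtrim() only cleans it up when that character is white space; punctuation such as a comma stays attached.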
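Because the pattern above consumes the non-letter on each side of a word, it may skip a 4-letter word that directly follows another match, as well as words at the very start or end of the text. A minimal sketch of one alternative, assuming lookaround assertions are acceptable here; the sample text is made up for illustration and is not from the original document:

```matlab
% Hypothetical sample text standing in for the document contents.
txt = 'This list has some good words, like tree and bark.';

% (?<![a-zA-Z]) : the match must not be preceded by a letter
% (?![a-zA-Z])  : the match must not be followed by a letter
% Neither assertion consumes characters, so the match holds only the
% four letters and adjacent words can both be found.
s = regexp(txt, '(?<![a-zA-Z])[a-zA-Z]{4}(?![a-zA-Z])', 'match');

disp(s)
```

With this variant no strtrim() call is needed, since the neighboring characters are never part of the match.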
Randomly select words
You can't mix text into a numeric matrix, but you can store the words in a cell array. The example below chooses n random words from the extracted words.
n = 10;
if n > numel(s)
    error('There are only %d words available. You selected %d words.', numel(s), n)
end
% Note: randi samples with replacement, so a word may be chosen more than once
randIdx = randi(numel(s),1,n);
randWords = s(randIdx); % Here is your random selection
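If each word should appear at most once in the selection, one option is randperm instead of randi. The following is only a sketch under that assumption; the word list is a hypothetical stand-in for the extracted words. Since every word has the same length, the selection can also be stacked into a char array, which is one way to read "put them into a matrix".

```matlab
% Hypothetical word list standing in for the extracted 4-letter words.
s = {'This','list','some','good','like','tree','bark'};
n = 3;

% randperm(k, n) returns n unique integers from 1:k, so no word repeats.
% It raises its own error if n exceeds k.
randIdx = randperm(numel(s), n);
randWords = s(randIdx);          % 1-by-n cell array of words

% All words have 4 letters, so they fit into an n-by-4 char array,
% one word per row.
wordMatrix = char(randWords);

disp(wordMatrix)
```

Row k of wordMatrix is the k-th selected word, and wordMatrix(k,:) returns it as a character array.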
5 Comments
xRobot on 19 Nov 2019
fileID = fopen('mylist.odt','r');
formatSpec = '%s';
words = fscanf(fileID,formatSpec);
I have used the above code to read in the file. It read in as a 1x11102 char. What I would like to do is convert this to a string array.
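For context, fscanf with '%s' skips white space, so the words come back joined into one long char vector with the boundaries lost. A rough sketch of one way to get a string array instead, assuming the word list is available as a plain-text file with one word per line (an .odt file is a zipped archive rather than plain text, so it would need to be exported first). The temporary file and its contents are hypothetical, and string() requires R2016b or later:

```matlab
% Write a small plain-text word list to a temporary file (made-up data).
tmpFile = [tempname '.txt'];
fid = fopen(tmpFile, 'w');
fprintf(fid, 'tree\nbark\nleaf\nroot\n');
fclose(fid);

% Read the whole file as one char vector, keeping the line breaks.
raw = fileread(tmpFile);

% strsplit with no delimiter splits on runs of white space; strtrim
% avoids an empty entry caused by the trailing newline.
words = string(strsplit(strtrim(raw)));   % string array, one word per element

% Keep only the words that are exactly four characters long.
fourLetter = words(strlength(words) == 4);

delete(tmpFile);
```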
Adam Danz on 19 Nov 2019
Edited: Adam Danz on 19 Nov 2019
