allwords

Version 1.1.0.0 (2.8 KB) by John D'Errico
Parse a sentence or any string into distinct "words"
Downloads: 1.7K
Updated: 2010/4/7


A sentence can be parsed one word at a time using strtok. Sometimes, however, it is more useful to extract all of the words into a cell array in a single, efficient function call. The function allwords.m does exactly this.
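For comparison, the word-at-a-time strtok approach looks roughly like the loop below. The delimiter list is an assumption chosen to approximate the allwords defaults (space, tab, line feed, carriage return, and a few punctuation marks); it is not taken from allwords.m.

```matlab
% Collect all words with repeated strtok calls (one word per call).
str    = 'The quick brown fox jumped over the lazy dog.';
delims = [' .,;:!?' char(9) char(10) char(13)];  % assumed delimiter set

words  = {};
remain = str;
while true
    % strtok skips leading delimiters, returns the next token and the rest
    [tok, remain] = strtok(remain, delims);
    if isempty(tok)          % only delimiters (or nothing) were left
        break
    end
    words{end+1} = tok; %#ok<AGROW>
end
disp(words)
```

The loop grows the cell array one element at a time and makes one function call per word, which is the overhead a single-call parser avoids.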

By default, spaces, other white space such as tabs, carriage returns, and punctuation characters are all treated as separators. In this example, there is a period at the end, as well as multiple spaces between some words.

str = 'The quick brown fox jumped over the lazy dog.';
words = allwords(str)
words =
'The' 'quick' 'brown' 'fox' 'jumped' 'over' 'the' 'lazy' 'dog'
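A similar one-call split can be sketched with the built-in regexp function. This is not how allwords works internally, and its separator set differs slightly: the pattern '\w+' keeps letters, digits, and the underscore as word characters and treats everything else (spaces, tabs, line breaks, punctuation) as a separator.

```matlab
% Hypothetical regexp-based alternative, not allwords itself.
str   = 'The quick brown fox jumped over the lazy dog.';
words = regexp(str, '\w+', 'match');   % 1-by-N cell array of char words
disp(words)
```

Because runs of separators never produce a match, repeated spaces and the trailing period do not create empty words.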

This utility also works on any integer vector. The default separators for numeric vectors are [-inf inf NaN], but you can assign any separators you want. Here, a vector of integers is parsed with only NaN as the separator, so the Inf stays inside the first word.

str = [1 2 4 2 inf 3 3 5 nan 4 6 5];
words = allwords(str,nan);
words{1}
ans =
1 2 4 2 Inf 3 3 5

words{2}
ans =
4 6 5
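Allowing NaN as a separator needs care, because NaN never compares equal to anything, including itself, so ismember cannot find it. The sketch below builds a separator mask that handles this case; the variable names are illustrative and the code is an assumption about one way to do it, not an excerpt from allwords.m.

```matlab
% Separator mask that also recognizes NaN separators (illustrative sketch).
x    = [1 2 4 2 inf 3 3 5 nan 4 6 5];
seps = nan;                                % as in allwords(str,nan)

% ismember never matches NaN, because NaN == NaN is false
naive = ismember(x, seps);                 % all false, even at position 9

% Match ordinary separators with ismember, NaN separators with isnan
isSep = ismember(x, seps(~isnan(seps)));
if any(isnan(seps))
    isSep = isSep | isnan(x);
end
sepPositions = find(isSep)                 % 9
```

With the default separators [-inf inf NaN], the same mask would also flag the Inf at position 5, splitting 1 2 4 2 Inf 3 3 5 into two words; passing nan explicitly is what keeps it together in the example above.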

Finally, allwords is efficient. For example, it splits a random integer vector of length 1e6 into over 90000 separate "words" in under 0.5 seconds.

str = round(rand(1,1000000)*10);
tic
words = allwords(str,[0 10]);
toc
Elapsed time is 0.455194 seconds.

Over 90000 words were extracted:

numel(words)
ans =
90310

The longest word had length 104.

max(cellfun(@numel,words))
ans =
104
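One way to get this kind of speed is to avoid a per-element loop and locate every word boundary at once from a logical separator mask. The sketch below shows that vectorized technique under the assumption that the separators contain no NaN; it is not the actual allwords.m code, and the rng seed is added only for reproducibility.

```matlab
% Vectorized word extraction from a separator mask (sketch, not allwords.m).
rng(0);                                   % reproducible random data
x    = round(rand(1,1e6)*10);
seps = [0 10];

isSep = ismember(x, seps);                % true at separator positions

% Pad with separators at both ends; [1, logical, 1] concatenates to double.
% d(j) = isSep(j) - isSep(j-1): -1 where a word starts, +1 just after it ends
d      = diff([1, isSep, 1]);
starts = find(d == -1);
stops  = find(d == 1) - 1;

% Cut each run of non-separators out of x
words = arrayfun(@(a, b) x(a:b), starts, stops, 'UniformOutput', false);
numel(words)
max(cellfun(@numel, words))
```

The padding guarantees that starts and stops have the same length, even when x begins or ends with a word rather than a separator.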

Cite As

John D'Errico (2024). allwords (https://www.mathworks.com/matlabcentral/fileexchange/27184-allwords), MATLAB Central File Exchange. Retrieved .

MATLAB Release Compatibility
Created with R2010a
Compatible with any release
Platform Compatibility
Windows macOS Linux
Categories
String Parsing
Acknowledgements

Inspired: wordcount

Release Notes
1.1.0.0: Speed enhancement for character strings, plus I added a reference to the wordcount function.