Function to use Matlab BERT tokenizer in parallel
This function simply divides your text into batches, and tokenizes in parallel. As the Matlab tokenizer is very slow when run on a single processor for large data, this provides a significant speed-up. On an i7-10875H laptop with 8 logical units, tokenizing 76k sentences takes about 100 seconds.
Also note that providing the Matlab BERT model is important, as different BERT models use different encodings for the special BERT tokens like [SEP] etc.
인용 양식
Ralf Elsas (2025). fastBERTtokens: Tokenizing for BERT in parallel (https://kr.mathworks.com/matlabcentral/fileexchange/125295-fastberttokens-tokenizing-for-bert-in-parallel), MATLAB Central File Exchange. 검색 날짜: .
MATLAB 릴리스 호환 정보
개발 환경:
R2022b
R2021a 이상 릴리스와 호환
플랫폼 호환성
Windows macOS Linux태그
도움
도움 받은 파일: Transformer Models
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!버전 | 게시됨 | 릴리스 정보 | |
---|---|---|---|
1.0.0 |