Text Analytics Toolbox and MeCab

I would like to add some words to the MeCab dictionary, which I assume MATLAB's Text Analytics Toolbox uses behind the scenes.
The tokenization splits some words into pieces that are too short.
If you have any ideas for solving this problem, they would be appreciated.

Answers (1)

Christopher Creutzig, 9 March 2020

Text Analytics Toolbox does not ship the tooling to compile an extended MeCab dictionary. But if you have one for your field (I know there are such compiled dictionaries for medical purposes, for example), you can use mecabOptions to have tokenizedDocument use it.
Alternatively, if you only have a handful of words you want to preserve, and are not worried about inflections, you can use "CustomTokens" to pass them to the tokenizer:
tokenizedDocument("日本睡眠学会のガイドライン")
ans =
tokenizedDocument:
5 tokens: 日本 睡眠 学会 の ガイドライン
tokenizedDocument("日本睡眠学会のガイドライン","CustomTokens","日本睡眠学会")
ans =
tokenizedDocument:
3 tokens: 日本睡眠学会 の ガイドライン
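
The first option above, pointing tokenizedDocument at a precompiled domain dictionary through mecabOptions, might look roughly like the following. This is only a sketch: the dictionary path is hypothetical, and it assumes the dictionary has already been compiled outside MATLAB.

```matlab
% Hypothetical location of a precompiled, domain-specific MeCab dictionary.
% Replace it with the path to the dictionary you actually have.
modelPath = "C:\dict\medical-dic";

% Build MeCab options that use this dictionary instead of the default one.
opts = mecabOptions("Model", modelPath);

% Pass the options object as the tokenization method.
doc = tokenizedDocument("日本睡眠学会のガイドライン", "TokenizeMethod", opts);
```

How the text is split then depends entirely on the entries in that dictionary, so no output is shown here.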

Comments: 1

Shuichi Obuchi, 10 March 2020
Thank you for your reply. I have already solved the problem by using the UserModel option. Still, I am very glad to have this information.
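
The comment does not include the code that was used. For readers with the same problem, a minimal sketch of the UserModel approach could look like this, assuming a user dictionary has already been compiled with MeCab's own tooling outside MATLAB (the file name below is hypothetical):

```matlab
% Hypothetical user dictionary containing the extra words,
% compiled with MeCab's tools outside MATLAB.
userDic = "C:\dict\sleep_terms.dic";

% Keep the default system dictionary and add the user dictionary on top.
opts = mecabOptions("UserModel", userDic);

% Tokenize with the extended vocabulary.
doc = tokenizedDocument("日本睡眠学会のガイドライン", "TokenizeMethod", opts);
```

Unlike "CustomTokens", a user dictionary entry carries part-of-speech information, so this route is probably the better fit when more than a handful of words are involved.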


Release: R2019b

Asked: 25 December 2019

Commented: 10 March 2020
