Text Analytics Toolbox seems to make lots of mistakes recognizing language and part of speech
Hi,
My input is the list of VERY BASIC ENGLISH words shown below. I would like to find the part of speech of each one.
kid
killer
kind
king
kiss
kitchen
knee
knife
knowledge
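words = {'kid','killer','kind','king','kiss','kitchen','knee','knife','knowledge'};
words = string(words);                          % convert the cell array to a string array
documents = tokenizedDocument(words);           % one document per word; language is detected automatically
documents = addPartOfSpeechDetails(documents);  % tag each token with its part of speech
tdetails = tokenDetails(documents);             % table of per-token details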
The mistakes show up when I inspect tdetails (see below).
Why does MATLAB think these words are German (they should be 'en' for English) and adjectives (most of them should be nouns)?
tdetails =
  9×7 table
       Token       DocumentNumber    SentenceNumber    LineNumber     Type      Language    PartOfSpeech
    ___________    ______________    ______________    __________    _______    ________    ____________
    "kid"                1                 1               1         letters       de        adjective
    "killer"             2                 1               1         letters       de        adjective
    "kind"               3                 1               1         letters       de        adjective
    "king"               4                 1               1         letters       de        adjective
    "kiss"               5                 1               1         letters       de        adjective
    "kitchen"            6                 1               1         letters       de        adjective
    "knee"               7                 1               1         letters       de        adjective
    "knife"              8                 1               1         letters       de        adjective
    "knowledge"          9                 1               1         letters       de        adjective
Answers (1)
Christopher Creutzig
9 March 2020
Language detection works much better on longer text. It does not do a dictionary lookup (and several of your words are valid German anyway). Instead, it uses statistical information about letter distributions, and a single short word carries very little of that information.
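If you already know the input is English, you can state the language explicitly instead of relying on detection. The sketch below assumes a Text Analytics Toolbox release in which tokenizedDocument accepts the 'Language' name-value argument. This should fix the Language column, but the part-of-speech tags for isolated words may still be unreliable, for the reason given next.

```matlab
% Sketch: skip automatic language detection by declaring the language up front.
% Assumes tokenizedDocument supports the 'Language' name-value argument.
words = ["kid" "killer" "kind" "king" "kiss" "kitchen" "knee" "knife" "knowledge"];
documents = tokenizedDocument(words,'Language','en');  % one document per word, forced to English
documents = addPartOfSpeechDetails(documents);         % tags may still be poor without context
tdetails = tokenDetails(documents);
disp(tdetails(:,{'Token','Language','PartOfSpeech'}))
```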
Part-of-speech detection relies heavily on the surrounding context in a sentence, for example:
documents = tokenizedDocument("My kid is a king");
documents = addPartOfSpeechDetails(documents);
tokenDetails(documents)
ans =
  5×7 table
    Token     DocumentNumber    SentenceNumber    LineNumber     Type      Language     PartOfSpeech
    ______    ______________    ______________    __________    _______    ________    ______________
    "My"            1                 1               1         letters       en       pronoun
    "kid"           1                 1               1         letters       en       noun
    "is"            1                 1               1         letters       en       auxiliary-verb
    "a"             1                 1               1         letters       en       determiner
    "king"          1                 1               1         letters       en       noun
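Because the tagger depends on context, one hypothetical workaround for a bare word list is to place every word in the same short carrier sentence and then keep only the row for that word. The template "This is the ..." is an arbitrary choice for illustration and is not a documented toolbox technique. It also pushes every word toward a noun reading, so words that are mainly verbs or adjectives (such as "kiss" or "kind") can still come out wrong. The sketch again assumes the 'Language' name-value argument is available.

```matlab
% Hypothetical workaround: give each isolated word some sentence context.
words = ["kid" "killer" "kind" "king" "kiss" "kitchen" "knee" "knife" "knowledge"]';  % column
sentences = "This is the " + words + ".";              % illustrative carrier sentence (assumption)
documents = tokenizedDocument(sentences,'Language','en');
documents = addPartOfSpeechDetails(documents);
tdetails = tokenDetails(documents);

% Keep only the token that matches the original word of each document.
isTarget = lower(tdetails.Token) == words(tdetails.DocumentNumber);
result = tdetails(isTarget,{'Token','PartOfSpeech'});
disp(result)
```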