textrankScores

Score documents using the TextRank algorithm

Syntax

scores = textrankScores(documents)

scores = textrankScores(bag)

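Description

scores = textrankScores(documents) scores the specified documents for importance according to their pairwise similarity values using the TextRank algorithm. The function uses the BM25 algorithm to compute the similarity scores and the PageRank algorithm to compute the importance scores.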

scores = textrankScores(bag) scores the documents encoded by the bag-of-words or bag-of-n-grams model bag.
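The two stages named in the description, BM25 similarity followed by PageRank, can be approximated by hand with bm25Similarity and the centrality function of a graph object. The following is a minimal sketch under stated assumptions, not the internal implementation of textrankScores: omitting self-loops and using the similarity values as edge importance are illustrative choices, so the resulting values may differ from those that the function returns.

```matlab
% Illustrative sketch only: approximate the two TextRank stages by hand.
documents = tokenizedDocument([
    "the quick brown fox jumped over the lazy dog"
    "the fast brown fox jumped over the lazy dog"
    "the lazy dog sat there and did nothing"
    "the other animals sat there watching"]);

% Stage 1: pairwise BM25 similarity (N-by-N matrix, one row per document).
S = bm25Similarity(documents);

% Stage 2: PageRank on a directed graph whose edge weights are the similarities.
% Assumption: self-loops are omitted so that a document does not vote for itself.
G = digraph(S,'omitselfloops');
approxScores = centrality(G,'pagerank','Importance',G.Edges.Weight);

% Show the hand-built scores next to the scores from the built-in function.
scores = textrankScores(documents);
disp([approxScores scores])
```

A directed graph is used here because BM25 similarity is generally not symmetric: the similarity of document i to document j need not equal that of j to i.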

Examples

Document Importance

Create an array of tokenized documents.

str = [
    "the quick brown fox jumped over the lazy dog"
    "the fast brown fox jumped over the lazy dog"
    "the lazy dog sat there and did nothing"
    "the other animals sat there watching"];
documents = tokenizedDocument(str)

documents = 
  4×1 tokenizedDocument:

    9 tokens: the quick brown fox jumped over the lazy dog
    9 tokens: the fast brown fox jumped over the lazy dog
    8 tokens: the lazy dog sat there and did nothing
    6 tokens: the other animals sat there watching

Calculate the TextRank scores.

scores = textrankScores(documents);

Visualize the scores in a bar chart.

figure
bar(scores)
xlabel("Document")
ylabel("Score")
title("TextRank Scores")

[Figure: bar chart titled "TextRank Scores"; x-axis: Document, y-axis: Score.]
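Because scores(i) belongs to the ith document, the scores can also be used to order the documents instead of only plotting them. The following sketch repeats the setup so that it runs on its own; the variable names sortedScores, idx, and rankedDocuments are chosen here for illustration.

```matlab
% Rebuild the documents from the example above.
str = [
    "the quick brown fox jumped over the lazy dog"
    "the fast brown fox jumped over the lazy dog"
    "the lazy dog sat there and did nothing"
    "the other animals sat there watching"];
documents = tokenizedDocument(str);
scores = textrankScores(documents);

% Sort the scores from highest to lowest and reorder the documents to match.
[sortedScores,idx] = sort(scores,'descend');
rankedDocuments = documents(idx)
```

The first element of rankedDocuments is the document that TextRank considers most important in this collection.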

Scores Using Bag-of-Words Model

Create a bag-of-words model from the text data in sonnets.csv.

filename = "sonnets.csv";
tbl = readtable(filename,'TextType','string');
textData = tbl.Sonnet;
documents = tokenizedDocument(textData);
bag = bagOfWords(documents)

bag = 
  bagOfWords with properties:

        NumWords: 3527
          Counts: [154×3527 double]
      Vocabulary: ["From"    "fairest"    "creatures"    "we"    "desire"    "increase"    ","    "That"    "thereby"    "beauty's"    "rose"    "might"    "never"    "die"    "But"    "as"    "the"    "riper"    "should"    "by"    …    ] (1×3527 string)
    NumDocuments: 154

Calculate the TextRank scores.

scores = textrankScores(bag);

Visualize the scores in a bar chart.

figure
bar(scores)
xlabel("Document")
ylabel("Score")
title("TextRank Scores")

[Figure: bar chart titled "TextRank Scores"; x-axis: Document, y-axis: Score.]

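Input Arguments

`documents` — Input documents
`tokenizedDocument` array | string array | cell array of character vectors

Input documents, specified as a tokenizedDocument array, a string array of words, or a cell array of character vectors. If documents is not a tokenizedDocument array, then it must be a row vector that represents a single document, where each element is a word. To specify multiple documents, use a tokenizedDocument array.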
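The distinction between a single document and several documents can be sketched as follows. The example data and the variable names words and singleScore are made up for illustration.

```matlab
% A string row vector of words is treated as ONE document.
words = ["the" "quick" "brown" "fox" "jumped" "over" "the" "lazy" "dog"];
singleScore = textrankScores(words);

% To score several documents, convert them to a tokenizedDocument array first.
documents = tokenizedDocument([
    "the quick brown fox jumped over the lazy dog"
    "the lazy dog sat there and did nothing"]);
scores = textrankScores(documents);   % one score per document
```

Scoring a single document in isolation says little, because TextRank ranks documents relative to one another; the tokenizedDocument form is the one to use for comparisons.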

`bag` — Input model
`bagOfWords` object | `bagOfNgrams` object

Input bag-of-words or bag-of-n-grams model, specified as a bagOfWords object or a bagOfNgrams object. If bag is a bagOfNgrams object, then the function treats each n-gram as a single word.
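A bag-of-n-grams model is passed in the same way as a bag-of-words model. The following minimal sketch uses made-up example sentences and the default settings of bagOfNgrams, which counts bigrams.

```matlab
% Made-up example documents.
documents = tokenizedDocument([
    "the quick brown fox jumped over the lazy dog"
    "the fast brown fox jumped over the lazy dog"
    "the lazy dog sat there and did nothing"]);

% Build a bag-of-n-grams model (bigrams by default).
bag = bagOfNgrams(documents);

% Each bigram is treated as a single word when scoring.
scores = textrankScores(bag);
```

With bigrams, two documents count as similar only when they share adjacent word pairs, so the scores can differ from those computed on the individual words.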

Output Arguments

`scores` — TextRank scores
vector

TextRank scores, returned as an N-by-1 vector, where scores(i) is the score of the ith input document and N is the number of input documents.

References

[1] Mihalcea, Rada, and Paul Tarau. "TextRank: Bringing Order into Text." In Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing, pp. 404-411. 2004.

Version History

Introduced in R2020a

See Also

Topics

Sequence-to-Sequence Translation Using Attention
