Cosine Similarity using BERT

Question

Votes: 0

I am using BERT to calculate similarities in question answering. I encoded my question data using

data.Tokens = encode(mdl.Tokenizer,data.Questions)

which returns a cell array.

Next, I encoded new text to test its similarity against the already encoded questions in the database:

testTokens = encode(mdl.Tokenizer,text)

However, I am unable to call cosineSimilarity(data.Tokens,testTokens); I receive an error that says:

Input must be a matrix, a tokenizedDocument array, a bagOfWords model, a bagOfNgrams model, a string array of words, or a cell array of character vectors.

Do I need to pad or reshape my cell vectors here?

Answer 1

Divyam Gupta, 30 June 2021

Votes: 1

Hi Nicholas, it looks like you are having trouble computing cosine similarity on the output of a text encoder. According to the documentation at https://www.mathworks.com/help/textanalytics/ref/cosinesimilarity.html#d123e8335 the cosineSimilarity function takes a matrix to compute the similarity between documents. A cell array of numeric token vectors, which is what encode returns, is not one of the input types listed in the error message.

Since the encoded vectors have a different length for each question, constructing a matrix might be difficult. Instead, you can do a pairwise comparison between data.Tokens and testTokens: run a nested loop over both cell arrays and store each similarity score as you go.
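A minimal sketch of that nested loop might look like the following. It makes several assumptions that are not in the original post: the token vectors are made-up numbers standing in for the output of encode, the shorter vector of each pair is zero-padded so that both have the same length, and the cosine similarity is computed by hand with dot and norm from base MATLAB rather than with cosineSimilarity.

```matlab
% Hypothetical token-code vectors standing in for the output of
% encode(mdl.Tokenizer, ...); real values depend on the tokenizer.
dataTokens = {[102 2055 2004 104], [102 2055 104], [102 3000 3500 3600 104]};
testTokens = {[102 2055 2004 104]};

numData = numel(dataTokens);
numTest = numel(testTokens);
scores = zeros(numData, numTest);   % scores(i,j): data item i vs. test item j

for i = 1:numData
    for j = 1:numTest
        a = double(dataTokens{i}(:));
        b = double(testTokens{j}(:));

        % Assumption: zero-pad the shorter vector so both have equal length.
        n = max(numel(a), numel(b));
        a(end+1:n) = 0;
        b(end+1:n) = 0;

        % Cosine similarity: dot product divided by the product of the norms.
        scores(i,j) = dot(a, b) / (norm(a) * norm(b));
    end
end

% Index of the stored question most similar to the first test text.
[bestScore, bestIdx] = max(scores(:,1));
```

With real data, dataTokens and testTokens would be replaced by data.Tokens and testTokens from the question. Note that this compares token codes position by position, so the padding choice affects the scores.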

Hope this helps.

Comments: 1

Nicholas Ang, 30 June 2021

Thank you! This worked!
