How to cluster similar strings?

Serbring, 26 Jan 2020
Commented: Serbring, 29 Jan 2020
Hi all,
I have long lists of strings that I collected automatically with a brute-force web-scraping routine. Many of the strings are very similar, and I would like to shorten the list by keeping only the names that are really different. Is there a way to cluster the strings together? A sample of the list is below.
Thank you so much.
Best regards.
{'microbiologia agraria' }
{'microbiologia forestale e ambientale' }
{'microbiologia generale' }
{'microbiologia agraria' }
{'microbiologia generale e ambientale' }
{'microbiologia del suolo e del sottosuolo' }
{'nutrition and health: the functional foods'}
{'microbiologia generale e ambientale' }
{'microbial biotechnologies in agroforestry' }
{'microbiologia generale ed ambientale' }
{'microbiologia agraria e forestale' }

Answers (1)

Image Analyst, 26 Jan 2020
  Comments: 1
Serbring, 29 Jan 2020
Thanks for your reply. I already knew about those distances; the real problem is what to do with the resulting numbers. I will try to be more specific so that you can see the basic idea of the algorithm I have developed.
Assume I have three strings A, B and C. I compute the pairwise distances between them (A-B, A-C and B-C), and then, for each string, I sum its distances to the other two (A-B and A-C for A). I don't know what to do with those numbers from there. Any suggestion is appreciated.
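One possible way to use those pairwise numbers, sketched here in Python rather than MATLAB and only as an illustration: treat two strings as "the same name" when their normalized edit distance is below a threshold, merge everything that is linked this way into one group (single-linkage style), and then use the sum of distances within each group to pick one representative string. The function names (`levenshtein`, `normalized_distance`, `cluster_strings`, `representative`) and the threshold of 0.3 are assumptions made for this example, not something from the thread, and the threshold would need tuning on the real list.

```python
from itertools import combinations


def levenshtein(a, b):
    """Edit distance: minimum number of single-character insertions,
    deletions and substitutions needed to turn a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete from a
                            curr[j - 1] + 1,      # insert into a
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def normalized_distance(a, b):
    """Edit distance divided by the longer length, so it lies in [0, 1]."""
    longest = max(len(a), len(b))
    return levenshtein(a, b) / longest if longest else 0.0


def cluster_strings(strings, threshold=0.3):
    """Group strings so that any pair closer than `threshold` ends up in
    the same group (links are transitive). The threshold is an assumption."""
    unique = sorted(set(strings))  # exact duplicates collapse here
    parent = list(range(len(unique)))

    def find(i):
        # Follow parent links to the root of i's group.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(unique)), 2):
        if normalized_distance(unique[i], unique[j]) <= threshold:
            parent[find(i)] = find(j)  # merge the two groups

    groups = {}
    for i, s in enumerate(unique):
        groups.setdefault(find(i), []).append(s)
    return list(groups.values())


def representative(group):
    """Pick the string with the smallest summed distance to the rest of
    its group (the row-sum idea, applied inside one group)."""
    return min(group, key=lambda s: sum(normalized_distance(s, t) for t in group))


names = [
    "microbiologia agraria",
    "microbiologia generale e ambientale",
    "microbiologia generale ed ambientale",
    "microbiologia agraria",
    "nutrition and health: the functional foods",
]
for group in cluster_strings(names):
    print(representative(group), "<-", group)
```

In this sketch the row sums are only used after grouping: summing distances over the whole list mostly tells you which string is "central" overall, whereas summing inside a group tells you which spelling best stands in for that group. For instance, 'microbiologia generale e ambientale' and 'microbiologia generale ed ambientale' differ by a single character, so they fall into the same group at any reasonable threshold. Because links are transitive, a threshold that is too high can chain loosely related names together, so it is worth inspecting the groups on a sample first.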
