Problem with Text analysis

Question

Hi, I'm trying to clean a table containing both Latin and non-Latin strings so that I can plot a word cloud. I used the regexprep function, but without success: I can't remove the Korean strings. Any ideas? Here is an example of the code and the output:

pathName = 'Keyword Aug. 2020 to Oct. 2021_MatlabSmall.xlsx';
T = readtable(pathName,'Range','A:B');
% Convert all Character Vector to Lowercase
T.Keyword = lower(T.Keyword);
% Remove not useful keywords
T(strcmp(T.Keyword, '(not provided)'), :)=[];
T(strcmp(T.Keyword, '(not set)'), :)=[];
% Set lower case
T.Keyword = lower(T.Keyword);
% Remove links
T(contains(T.Keyword, 'http'), :)=[];
T(contains(T.Keyword, '.'), :)=[];
T.Keyword = strrep(T.Keyword, ' ', '_');
display(head(T));
% Replace non alphanumerics
T.Keyword = regexprep(T.Keyword,'^a-z','');
 
8×2 table
                 Keyword                 Sessions
    _________________________________    ________
    'stuff'                                390   
    'forum'                                128   
    'student'                               76   
    '재료'                                  59   
    'stuff'                                 56   
    'uninstall_stuff_license_manager'       52   
    'stuff_resource_center'                 43   
    'stuff_student_community'               34   

Answer 1

DGM, 19 October 2021

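I'm terrible with regex, but this might get you somewhere. The pattern '^a-z' in your code has no square brackets, so it is not a character class: it only matches the literal text "a-z" at the start of a string. The example below uses the negated class '[^a-z_]' instead, which removes everything except lowercase letters and underscores.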

A = {'9.banana' 'orange-123_juice' 'ン戦国時' 'apple_sauce' 'abcクルミ' 'peach' 'pear' 'ピラミッド' 'cherry'}.'
A = 9×1 cell array
    {'9.banana'        }
    {'orange-123_juice'}
    {'ン戦国時'         }
    {'apple_sauce'     }
    {'abcクルミ'        }
    {'peach'           }
    {'pear'            }
    {'ピラミッド'       }
    {'cherry'          }
B = regexprep(A,'[^a-z_]','')
B = 9×1 cell array
    {'banana'      }
    {'orange_juice'}
    {0×0 char      }
    {'apple_sauce' }
    {'abc'         }
    {'peach'       }
    {'pear'        }
    {0×0 char      }
    {'cherry'      }
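
To connect this back to the table in the question, here is a minimal sketch of the same pattern applied to a table column. The small table is a hypothetical stand-in built from a few rows of the output shown in the question, and dropping the emptied rows is an assumption about what is wanted for the word cloud.

```matlab
% Hypothetical stand-in for the asker's table (a few rows from the question's output)
Keyword  = {'stuff'; 'forum'; '재료'; 'stuff_resource_center'};
Sessions = [390; 128; 59; 43];
T = table(Keyword, Sessions);

% Remove every character that is not a lowercase letter or an underscore
T.Keyword = regexprep(T.Keyword, '[^a-z_]', '');

% Keywords that were entirely non-Latin are now empty; drop those rows
% (assumption: empty keywords are not wanted in the word cloud)
T(cellfun(@isempty, T.Keyword), :) = [];

disp(T)
```

Mixed strings such as 'abcクルミ' are kept with only their Latin part, as in the output above. If such rows should be discarded entirely, one option would be to test for non-matching characters before stripping them, rather than checking for empty results afterwards.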
