Removing commas between columns in text data

Question

Kim Maria Damiani, 16 October 2021

I have a txt file which is the output of a lemmatizer, in the form

Sometimes, ,, I, use, commas, .
I, like, writing, ,, I, like, reading

How can I read it into a tokenizedDocument while deleting the unnecessary commas between tokens? A simple approach would be

test=readlines('/path/to/file.txt')
test=strrep(test,',','')
test=tokenizedDocument(test)

but it would also remove the commas that were already present in the original text, and I'd like to preserve punctuation.

Answer 1 (accepted answer)

Walter Roberson, 16 October 2021

test = {'Sometimes, ,, I, use, commas, .'
    'I, like, writing, ,, I, like, reading'};
test = regexprep(test, {'(?<=[^,]),\s', '\s*,,', '\s+\.'}, {' ', ',', '.'})
test = 2×1 cell array
    {'Sometimes, I use commas.'      }
    {'I like writing, I like reading'}

Notice that periods need a special rule. You have 'use, commas', which should almost certainly translate to 'use commas' (so comma-space becomes a space), but after that 'commas, .' should not become 'commas .'.

To put it another way, we cannot simply delete every comma-space pair. That would work for the pair between the word 'commas' and the period, but it does not work for the pair between 'use' and 'commas': applying that rule would merge 'use, commas' into 'usecommas'.
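To make the reasoning concrete, here is a minimal sketch that applies the same three patterns one at a time, so the intermediate strings are visible, and then tries the rejected "delete every comma-space pair" rule for comparison. The variable names step1, step2, step3 and naive are only illustrative.

```matlab
test = {'Sometimes, ,, I, use, commas, .'
        'I, like, writing, ,, I, like, reading'};

% Rule 1: a separator comma (one not preceded by another comma) plus
% the whitespace after it becomes a single space.
step1 = regexprep(test, '(?<=[^,]),\s', ' ');
% {'Sometimes ,, I use commas .'; 'I like writing ,, I like reading'}

% Rule 2: a doubled comma is a real comma token; keep one comma and
% drop the whitespace in front of it.
step2 = regexprep(step1, '\s*,,', ',');
% {'Sometimes, I use commas .'; 'I like writing, I like reading'}

% Rule 3: remove the whitespace left in front of a period.
step3 = regexprep(step2, '\s+\.', '.');
% {'Sometimes, I use commas.'; 'I like writing, I like reading'}

% The rejected rule: deleting every comma-space pair glues words together.
naive = regexprep(test, ',\s', '');
% {'Sometimes,Iusecommas.'; 'Ilikewriting,Ilikereading'}

disp(step3)
disp(naive)
```

Passing the three patterns to a single regexprep call as cell arrays, as in the answer, applies them in the same order.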

Comments: 1

Kim Maria Damiani, 16 October 2021

Thank you!

Answer 2

Chunru, 16 October 2021

test = {'Sometimes, ,, I, use, commas, .'
    'I, like, writing, ,, I, like, reading'};
test = regexprep(test, ',\s', ' ')
test = 2×1 cell array
    {'Sometimes , I use commas .'     }
    {'I like writing , I like reading'}
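This result keeps every lemmatizer token separated by a single space, which amounts to splitting each line on the comma-space separator and rejoining with spaces. A minimal sketch of that equivalent view, assuming the separator is always exactly a comma followed by one space (the names tokens and rejoined are illustrative):

```matlab
test = {'Sometimes, ,, I, use, commas, .'
        'I, like, writing, ,, I, like, reading'};

tokens   = cell(size(test));   % one row cell array of tokens per line
rejoined = cell(size(test));   % tokens joined back with single spaces

for k = 1:numel(test)
    % Split on the literal separator ', '. A real comma appears as ',,'
    % followed by a space, so it survives as the one-character token ','.
    tokens{k}   = strsplit(test{k}, ', ', 'CollapseDelimiters', false);
    rejoined{k} = strjoin(tokens{k}, ' ');
end

disp(tokens{1})   % 'Sometimes'  ','  'I'  'use'  'commas'  '.'
disp(rejoined)

% Assumption, not run here (requires Text Analytics Toolbox): already
% split tokens can be passed on without re-tokenizing, for example
% docs = tokenizedDocument(cellfun(@string, tokens, 'UniformOutput', false), ...
%                          'TokenizeMethod', 'none');
```

Working from the token lists avoids having to decide where spaces belong around punctuation, since each token is kept exactly as the lemmatizer produced it.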
