Creating character codes from text file

Mohammad on 23 Oct 2015
Commented: dpb on 24 Oct 2015
Suppose that I have a text file that contains several lines such as the following:
My name is John, I studied Chemistry at Okawata University. I am a Canadian citizen but I lived in 123
I want to do the following:
  1. Remove all punctuation marks from the text.
  2. Convert the remaining characters, including the spaces and the numbers, into predefined codes such as (A: c01, B: c02, ..., a: c27, b: c28), and also the numbers, such as (0: c60, 1: c61, 2: c62). The results should be stored in another output text file that contains only the codes, one character per line. For example, suppose the codes of the characters of the string "My name" are M: c07, y: c30, space: c22, n: c35, ...; then the output file should contain:
c07
c30
c22
c35
..

Accepted Answer

Thorsten on 23 Oct 2015
Edited: Thorsten on 23 Oct 2015
Set up code table X:
chars = ['A':'Z' 'a':'z'];
for i = 1:numel(chars)
    X{chars(i)} = sprintf('c%02d', i);       % the cell index is the character code
end
digits = '0':'9';
for i = 1:numel(digits)
    X{digits(i)} = sprintf('c%02d', 59+i);   % '0' -> c60, ..., '9' -> c69
end
X{' '} = 'c22';
Encode string:
s = 'My name is John, I studied Chemistry at Okawata University.';
sc = strvcat(X{s})
Note that you don't have to remove the punctuation marks; they are mapped to empty and removed by strvcat.
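The question also asks for the codes to be read from one text file and written to another, one code per line. A minimal sketch of that part, assuming the same table as above; the file names are hypothetical, and a small sample input is written first only so that the example runs on its own:

```matlab
% Build the code table (same scheme as the answer above).
X = cell(1, 127);                          % one slot per 7-bit ASCII code
letters = ['A':'Z' 'a':'z'];
for i = 1:numel(letters)
    X{letters(i)} = sprintf('c%02d', i);
end
digs = '0':'9';
for i = 1:numel(digs)
    X{digs(i)} = sprintf('c%02d', 59 + i);
end
X{' '} = 'c22';

% Hypothetical file names; the sample input exists only for this demo.
inFile  = fullfile(tempdir, 'sample_input.txt');
outFile = fullfile(tempdir, 'sample_codes.txt');
fid = fopen(inFile, 'w');
fprintf(fid, 'My name is John, I am 23.\n');
fclose(fid);

txt = fileread(inFile);                    % whole file as one char row vector
txt = txt(txt >= 1 & txt <= numel(X));     % keep only characters the table can index
codes = X(double(txt));                    % cell array; unmapped characters give []
codes = codes(~cellfun(@isempty, codes));  % drops punctuation and line breaks

fid = fopen(outFile, 'w');
fprintf(fid, '%s\n', codes{:});            % one code per line
fclose(fid);
```

With this scheme 'V' and the space both map to c22, so a different code for the space may be preferable if the output ever has to be decoded again.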
  Comments: 3
Mohammad on 23 Oct 2015
Thanks, but how can I apply this solution to Arabic characters?
dpb on 24 Oct 2015
Every character set has its translation...
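One possible way to handle a non-ASCII alphabet is a map keyed by the character itself rather than an array indexed by its code. The sketch below is only an illustration: the code names (a01, a02, ...) and the choice of the Unicode range U+0621 to U+064A are assumptions, not something specified in the thread, and that range also contains a few non-letter characters such as the tatweel.

```matlab
% Hypothetical scheme: Arabic block U+0621..U+064A -> a01, a02, ...
arabic = char(hex2dec('0621'):hex2dec('064A'));
M = containers.Map('KeyType', 'char', 'ValueType', 'char');
for i = 1:numel(arabic)
    M(arabic(i)) = sprintf('a%02d', i);
end
M(' ') = 'c22';                                % space code taken from the question

% Sample text: five Arabic letters, a space, and a punctuation mark.
s = [char([1605 1585 1581 1576 1575]) ' !'];

codes = {};
for ch = s                                     % iterate character by character
    if isKey(M, ch)                            % unmapped characters are skipped
        codes{end+1} = M(ch);                  %#ok<AGROW>
    end
end
disp(char(codes))                              % one code per row
```

When the text comes from a file, the file would also have to be read with the encoding it was saved in, otherwise the characters will not match the keys.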


More Answers (1)

dpb on 23 Oct 2015
  1. It is pretty straightforward to replace characters with empty, thereby removing them entirely; regexprep is one way.
  2. Build a "lookup table" that holds the desired code for each character at the position given by its ASCII code. IOW, using your example above for a small subset, since
>> s='My name';
>> double(s)
ans =
77 121 32 110 97 109 101
>>
store the codes for each of those in those array elements.
Then, since Matlab will do an automagic conversion from character to numeric, you can do the conversion on any converted string simply by
codedstr=codearray(cleanedInputString).';
and you're done...
I'll leave the intimate details as "exercise for student"... :)
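For readers who want to see the two steps spelled out, here is one way they might look. This is a sketch under assumptions, not dpb's own code: the regular expression keeps only ASCII letters, digits and spaces, the code numbering follows the scheme in the question, and the sample string is made up.

```matlab
s = 'My name is John, I am 23.';               % made-up sample input

% Step 1: remove everything except letters, digits and spaces.
cleanedInputString = regexprep(s, '[^A-Za-z0-9 ]', '');

% Step 2: lookup table with one cell per ASCII code (1..127).
codearray = repmat({''}, 1, 127);
letters = ['A':'Z' 'a':'z'];
codearray(double(letters)) = arrayfun(@(k) sprintf('c%02d', k), ...
    1:numel(letters), 'UniformOutput', false);  % A..Z -> c01..c26, a..z -> c27..c52
codearray(double('0':'9')) = arrayfun(@(k) sprintf('c%02d', k), ...
    60:69, 'UniformOutput', false);             % 0..9 -> c60..c69
codearray{double(' ')} = 'c22';

% Character indices are converted to their numeric codes automatically.
codedstr = codearray(cleanedInputString).';    % column cell array, one code per row
disp(char(codedstr))
```

Because the cleaning step runs first, every remaining character is guaranteed to have an entry in the table, so no empty codes need to be filtered out afterwards.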
