Using HTML decimal codes, it is possible to write Unicode characters to a text file. For example,
writecell({['down ' char(8595) ' arrow']}, 'a_filename','Encoding','UTF-8')
creates a text file containing the string 'down ↓ arrow'.
I'm trying to do the same but with national flags instead of arrows.
For example, the HTML decimal codes I found for the Italian flag 🇮🇹 are 58639, 127481 and 127470, but when I plug them into the previous command, the flag is not saved in the text file.
Is this because flags are not supported by MATLAB, or because there is an error in my code?

Accepted Answer

Rik, 20 August 2020

MATLAB stores each character in 16 bits (as a uint16), so a single char can only hold values up to 65535. That means only your first value fits:
isvalidchar = double(uint16(inf)) > [58639, 127481, 127470]
% 1 0 0
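To check which of the values from the question fit in a single 16-bit char, and to see them in U+ notation, here is a quick sketch using only built-in functions (the variable names are illustrative). Note that 58639 is U+E50F, which lies in the Private Use Area (U+E000 to U+F8FF) and has no standard meaning. The Italian flag is the pair U+1F1EE U+1F1F9 (regional indicators I and T), in that order, i.e. 127470 followed by 127481.

```matlab
% Code points listed in the question (decimal)
cps = [58639 127481 127470];

% A MATLAB char is one 16-bit code unit, so only values up to 65535 fit in one char
fitsInOneChar = cps <= double(intmax('uint16'));   % -> 1 0 0

% Hexadecimal form (one row per value), i.e. the digits after "U+"
hexCodes = dec2hex(cps);
disp(hexCodes)
```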
As a workaround, you can write the raw UTF-8 bytes yourself. You can read the Wikipedia page on UTF-8 for full details, but essentially you need to pick the line below that results in the fewest bytes, and replace the x's with the binary digits of your character value (padded with leading zeros so that every x is filled).
%0xxxxxxx
%110xxxxx 10xxxxxx
%1110xxxx 10xxxxxx 10xxxxxx
%11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
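As a worked example of the 4-byte pattern, here is U+1F1EE (127470, regional indicator I) encoded by hand. This sketch uses bitshift/bitand on the number instead of slicing a dec2bin string; it is an illustration of the same bit layout, not the code from this answer.

```matlab
% Encode one code point in the 4-byte range (U+10000 to U+10FFFF) as UTF-8
cp = 127470;                                       % U+1F1EE

% 11110xxx: top 3 of the 21 bits
b1 = bitor(240, bitshift(cp, -18));
% 10xxxxxx: next 6 bits
b2 = bitor(128, bitand(bitshift(cp, -12), 63));
% 10xxxxxx: next 6 bits
b3 = bitor(128, bitand(bitshift(cp, -6), 63));
% 10xxxxxx: lowest 6 bits
b4 = bitor(128, bitand(cp, 63));

bytes = uint8([b1 b2 b3 b4]);
disp(dec2hex(double(bytes)))                       % F0, 9F, 87, AE (one per row)
```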
Here is a function you can use. Note that whatever program you use to read the file afterwards must also support these characters.
function b=char_to_UTF8_bin(c)
%equivalent to unicode2native(char(c),'UTF-8')
%for 55250:55295 and 57344:65535 (and 65536:2097151) the outputs don't match
c=double(c);
if ~isscalar(c)
    b=arrayfun(@char_to_UTF8_bin,c,'UniformOutput',0);
    b=horzcat(b{:});
    return
end
if c<128 %0xxxxxxx
    b=c;
elseif c<2048 %110xxxxx 10xxxxxx
    b=zeros(1,2);
    c=dec2bin(c,11);
    b(1)=bin2dec(['110' c(1:5)]);
    b(2)=bin2dec(['10' c(6:11)]);
elseif c<65536 %1110xxxx 10xxxxxx 10xxxxxx
    b=zeros(1,3);
    c=dec2bin(c,16);
    b(1)=bin2dec(['1110' c(1:4)]);
    b(2)=bin2dec(['10' c(5:10)]);
    b(3)=bin2dec(['10' c(11:16)]);
elseif c<2097152 %11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    b=zeros(1,4);
    c=dec2bin(c,21);
    b(1)=bin2dec(['11110' c(1:3)]);
    b(2)=bin2dec(['10' c(4:9)]);
    b(3)=bin2dec(['10' c(10:15)]);
    b(4)=bin2dec(['10' c(16:21)]);
else
    error('not a valid UTF-8 character')
end
b=uint8(b);
end
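The function returns bytes, not text, so they have to be written to the file as raw uint8 data. A sketch of that last step with fopen/fwrite, assuming the bytes are what char_to_UTF8_bin(hex2dec({'1F1EE','1F1F9'})) returns (hard-coded here so the snippet runs on its own); the temporary file name is only for illustration.

```matlab
% UTF-8 bytes of the Italian flag (U+1F1EE U+1F1F9)
flagBytes = uint8([240 159 135 174 240 159 135 185]);

% Plain ASCII text is identical in UTF-8, so uint8() of it gives valid bytes
payload = [uint8('this ') flagBytes uint8(' is the italian flag')];

% Write the bytes unchanged ('w' opens in binary mode, no text conversion)
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
if fid < 0, error('could not open %s', fname); end
fwrite(fid, payload, 'uint8');
fclose(fid);

% Read the raw bytes back to confirm nothing was altered
fid = fopen(fname, 'r');
readBack = fread(fid, Inf, 'uint8=>uint8').';
fclose(fid);
delete(fname);
```

Opening such a file in a UTF-8 aware editor should show the flag, provided the editor's font can render flag emoji.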

12 Comments

giannit, 15 December 2020
Thank you Rik! I'm using your function with the decimal code 58639 for the Italian flag:
flag = char_to_UTF8_bin('58639')
flag =
1×5 uint8 row vector
53 56 54 51 57
But if I run for example
clipboard('copy',['a' flag 'b'])
I don't get 'a🇮🇹b' but 'a58639b'. Did I make a mistake?
Thank you
Rik, 15 December 2020
You should pass the decimal value as a number, not as a char array: char_to_UTF8_bin(58639), not char_to_UTF8_bin('58639').
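The difference is easy to see at the command line: a quoted '58639' is five digit characters, and their character codes are exactly the 53 56 54 51 57 in the output above, while the number the function expects is 58639. A minimal illustration:

```matlab
% A char array of digits: each element is the character code of one digit
asText = double('58639');          % -> 53 56 54 51 57

% The numeric value the function expects
asNumber = str2double('58639');    % -> 58639
```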
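My earlier remark was also not quite correct: MATLAB stores characters as UTF-16 code units, so a character above U+FFFF takes up two elements of a char array (a surrogate pair). You will need the function below to convert UTF-16 to Unicode code points.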
function unicode=UTF16_to_unicode(UTF16)
%Convert UTF-16 to the code points stored as uint32
%
%See https://en.wikipedia.org/wiki/UTF-16
%
% 1 word (U+0000 to U+D7FF and U+E000 to U+FFFF):
% xxxxxxxx_xxxxxxxx
% 2 words (U+10000 to U+10FFFF):
% 110110xx_xxxxxxxx 110111xx_xxxxxxxx
persistent isOctave
if isempty(isOctave),isOctave = exist('OCTAVE_VERSION', 'builtin') ~= 0;end
UTF16=uint32(UTF16);
multiword= UTF16>55295 & UTF16<57344; %0xD7FF and 0xE000
if ~any(multiword)
    unicode=UTF16;return
end
word1= find( UTF16>=55296 & UTF16<=56319 );
word2= find( UTF16>=56320 & UTF16<=57343 );
try
    d=word2-word1;
    if any(d~=1)
        error('trigger error')
    end
catch
    error('input is not valid UTF-16 encoded')
end
%Binary header:
% 110110xx_xxxxxxxx 110111xx_xxxxxxxx
% 00000000 01111111 11122222 22222333
% 12345678 90123456 78901234 56789012
header_bits='110110110111';header_locs=[1:6 17:22];
multiword=UTF16([word1.' word2.']);
multiword=unique(multiword,'rows');
S2=mat2cell(multiword,ones(size(multiword,1),1),2);
unicode=UTF16;
for n=1:numel(S2)
    bin=dec2bin(double(S2{n}))';
    if ~strcmp(header_bits,bin(header_locs))
        error('input is not valid UTF-16 encoded')
    end
    bin(header_locs)='';
    if ~isOctave
        S3=uint32(bin2dec(bin ));
    else
        S3=uint32(bin2dec(bin.'));%Octave needs an extra transpose.
    end
    S3=S3+65536;% 0x10000
    %Perform actual replacement.
    unicode=PatternReplace(unicode,S2{n},S3);
end
end
function out=PatternReplace(in,pattern,rep)
%Functionally equivalent to strrep, but extended to more data types.
out=in(:)';
if numel(pattern)==0
    L=false(size(in));
elseif numel(rep)>numel(pattern)
    error('not implemented (padding required)')
else
    L=true(size(in));
    for n=1:numel(pattern)
        k=find(in==pattern(n));
        k=k-n+1;k(k<1)=[];
        %Now k contains the indices of the beginning of each match.
        L2=false(size(L));L2(k)=true;
        L= L & L2;
        if ~any(L),break,end
    end
end
k=find(L);
if ~isempty(k)
    for n=1:numel(rep)
        out(k+n-1)=rep(n);
    end
    if numel(rep)==0,n=0;end
    if numel(pattern)>n
        k=k(:);%Enforce direction.
        remove=(n+1):numel(pattern);
        idx=bsxfun_plus(k,remove-1);
        out(idx(:))=[];
    end
end
end
function out=bsxfun_plus(in1,in2)
%Implicit expansion for plus(), but without any input validation.
try
    out=in1+in2;
catch
    try
        out=bsxfun(@plus,in1,in2);
    catch
        sz1=size(in1); sz2=size(in2);
        in1=repmat(in1,max(1,sz2./sz1)); in2=repmat(in2,max(1,sz1./sz2));
        out=in1+in2;
    end
end
end
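The core arithmetic of UTF16_to_unicode can also be written without binary strings. A minimal loop-based sketch, assuming the input is a numeric row vector of UTF-16 code units; unlike the function above, it passes unpaired surrogates through unchanged instead of raising an error. It also shows that numel of a UTF-16 char counts code units, not characters: 5 units decode to 3 code points here.

```matlab
% UTF-16 code units: 'd' followed by the Italian flag (two surrogate pairs)
units = [100 55356 56814 55356 56825];

cps = [];
k = 1;
while k <= numel(units)
    u = units(k);
    isPair = u >= 55296 && u <= 56319 && k < numel(units) ...
        && units(k+1) >= 56320 && units(k+1) <= 57343;
    if isPair
        % high surrogate (0xD800..0xDBFF) + low surrogate (0xDC00..0xDFFF)
        cps(end+1) = (u - 55296) * 1024 + (units(k+1) - 56320) + 65536; %#ok<AGROW>
        k = k + 2;
    else
        % single unit (or unpaired surrogate): copy it unchanged
        cps(end+1) = u; %#ok<AGROW>
        k = k + 1;
    end
end
disp(cps)    % 5 code units decode to 3 code points
```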
giannit, 15 December 2020
Thank you very much, Rik, for your further help! I'm trying your new function UTF16_to_unicode. Sorry to bother you again, but I'm struggling to understand what to give it as input.
The Unicode code points for the Italian flag are U+1F1EE and U+1F1F9 (for 🇮 and 🇹 respectively), so what should I use as input? I tried
UTF16_to_unicode('U+1F1EE_U+1F1F9')
UTF16_to_unicode({'U+1F1EE' 'U+1F1F9'})
but neither works as expected. Thank you again.
Rik, 16 December 2020
UTF16_to_unicode decodes a char array into the actual Unicode code points. So once some text is stored in a char (between single quotes), you can use this function to convert it to code points, then use char_to_UTF8_bin to convert those to UTF-8 bytes and write them as binary data to a file. File readers that support UTF-8 (and the characters you used) will then display the file content exactly as MATLAB displays it between the single quotes.
You can use hex2dec to convert the hexadecimal part of the U+ notation to decimal:
v=hex2dec({'1F1EE','1F1F9'});
char_to_UTF8_bin(v)
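If the code points are available as strings in U+ notation, the prefix has to be removed before calling hex2dec. A small sketch (the labels variable is illustrative):

```matlab
% Code points as written in the question, in U+ notation
labels = {'U+1F1EE', 'U+1F1F9'};

% Strip the 'U+' prefix; strrep works element-wise on a cell array of char
hexDigits = strrep(labels, 'U+', '');

% hex2dec accepts a cell array and returns a column vector
cps = hex2dec(hexDigits);          % -> [127470; 127481]
```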
giannit, 16 December 2020
Thank you very much, Rik, I really appreciate your help.
Something is happening now when I run your code
v = hex2dec({'1F1EE','1F1F9'});
str = ['this ' char_to_UTF8_bin(v) ' is the italian flag']
The output is
'this ð®ð¹ is the italian flag'
which is not correct, but I guess we are on the right track since some symbols have appeared. What do you think? Thank you very much.
P.S. The correct output would be
'this 🇮🇹 is the italian flag'
Rik, 16 December 2020
The char data type in MATLAB is UTF-16, so you only need to convert to UTF-8 if you want to write to a file. The output of char_to_UTF8_bin is a list of UTF-8 bytes, and MATLAB displays each byte as a separate character. If you want the flag to show up in a char, you will have to encode the Unicode code points in UTF-16.
function str=unicode_to_UTF16(unicode)
%Convert a single Unicode code point to UTF-16 code units (one or two values).
%
%The value of the input is converted to binary and padded with 0 bits at the front of the string to
%fill all 'x' positions in the scheme.
%See https://en.wikipedia.org/wiki/UTF-16
%
% 1 word (U+0000 to U+D7FF and U+E000 to U+FFFF):
% xxxxxxxx_xxxxxxxx
% 2 words (U+10000 to U+10FFFF):
% 110110xx_xxxxxxxx 110111xx_xxxxxxxx
if unicode<65536
    str=unicode;return
end
U=double(unicode)-65536;%Convert to double for ML6.5.
U=dec2bin(U,20);
str=bin2dec(['110110' U(1:10);'110111' U(11:20)]).';
end
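The dec2bin string manipulation above amounts to splitting the 20-bit offset (code point minus 0x10000) into a high and a low 10-bit half. A sketch of the same computation with plain arithmetic, for the first flag letter (variable names are illustrative):

```matlab
cp = 127470;                   % U+1F1EE, regional indicator I
v  = cp - 65536;               % 20-bit offset (0x10000 subtracted)
hi = 55296 + floor(v / 1024);  % 0xD800 + top 10 bits  -> 55356 (0xD83C)
lo = 56320 + mod(v, 1024);     % 0xDC00 + low 10 bits  -> 56814 (0xDDEE)
units = [hi lo];
```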
giannit, 16 December 2020
Rik, thank you for the fast support! I'm running this code with your new function:
v = hex2dec({'1F1EE','1F1F9'});
str = ['this ' unicode_to_UTF16(v) ' is the italian flag'];
clipboard('copy',str)
but what gets copied to my clipboard is
this 𐿰 is the italian flag
that is, the 🇮🇹 symbol is not displayed correctly. Am I doing something wrong?
Thank you very much
Rik, 17 December 2020
Yes: as the help text of unicode_to_UTF16 states, it converts a single character at a time, so you need to call it once per code point:
v = hex2dec({'1F1EE','1F1F9'});
b=arrayfun(@unicode_to_UTF16,v,'UniformOutput',0);b=horzcat(b{:});
str = ['this ' b ' is the italian flag'];
disp(str)
this 🇮🇹 is the italian flag
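If you prefer to avoid arrayfun, the same conversion can be vectorized with logical indexing. A sketch assuming the input is a row vector of valid code points; it returns numeric UTF-16 code units, which char() turns into text in MATLAB.

```matlab
cps = [100 127470 127481];            % 'd' followed by the Italian flag
isSupp = cps >= 65536;                % these need a surrogate pair

v  = cps - 65536;
hi = 55296 + floor(v / 1024);         % high surrogates (used only where isSupp)
lo = 56320 + mod(v, 1024);            % low surrogates  (used only where isSupp)

% Row 1: the single unit or the high surrogate; row 2: the low surrogate or NaN
pairs = [cps; NaN(1, numel(cps))];
pairs(1, isSupp) = hi(isSupp);
pairs(2, isSupp) = lo(isSupp);

% Read column by column and drop the NaN placeholders
units = pairs(:).';
units = units(~isnan(units));         % -> 100 55356 56814 55356 56825
```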
The functions below are all part of the readfile function. The normal (non-minified) version can be found on the File Exchange (FEX).
% The functions below were minified to make them more compact.
%unicode_to_UTF16 - converts a single Unicode code point to UTF-16
%unicode_to_UTF8 - converts a single Unicode code point to UTF-8
%UTF16_to_unicode - converts a Matlab char array to Unicode code points
function v000=unicode_to_UTF16(v001),...
if v001<65536,v000=v001;return,end,v002=double(v001)-65536;v002=dec2bin(v002,20);v000=bin2dec(['110110' v002(1:10);'110111' v002(11:20)]).';end
function v000=unicode_to_UTF8(v001),...
if v001<128,v000=v001;return,end,persistent v002,if isempty(v002),v002=struct;v002.limits.lower=hex2dec({'0000','0080','0800',...
'10000'});v002.limits.upper=hex2dec({'007F','07FF','FFFF','10FFFF'});v002.scheme{2}='110xxxxx10xxxxxx';v002.scheme{2}=reshape(v002.scheme{2}.',8,2);
v002.scheme{3}='1110xxxx10xxxxxx10xxxxxx';v002.scheme{3}=reshape(v002.scheme{3}.',8,3);v002.scheme{4}='11110xxx10xxxxxx10xxxxxx10xxxxxx';
v002.scheme{4}=reshape(v002.scheme{4}.',8,4);for v003=2:4,v002.scheme_pos{v003}=find(v002.scheme{v003}=='x');
v002.bits(v003)=numel(v002.scheme_pos{v003});end,end,v004=find(v002.limits.lower<=v001 & v001<=v002.limits.upper);
v000=v002.scheme{v004};v005=v002.scheme_pos{v004};v003=dec2bin(v001,v002.bits(v004));v000(v005)=v003;v000=bin2dec(v000.').';end
function v000=UTF16_to_unicode(v001),persistent v002,if isempty(v002),v002 = exist('OCTAVE_VERSION', 'builtin') ~= 0;
end,v001=uint32(v001);v003= v001>55295 & v001<57344;if ~any(v003),v000=v001;return,end,v004= find( v001>=55296 & v001<=56319 );v005= ...
find( v001>=56320 & v001<=57343 );try v006=v005-v004;if any(v006~=1),error('trigger error'),end,catch,error('input is not valid UTF-16 encoded'),...
end,v007='110110110111';v008=[1:6 17:22];v003=v001([v004.' v005.']);v003=unique(v003,'rows');v009=mat2cell(v003,ones(size(v003,1),1),2);v000=v001;
for v010=1:numel(v009),v011=dec2bin(double(v009{v010}))';if ~strcmp(v007,v011(v008)),error('input is not valid UTF-16 encoded'),end,v011(v008)='';
if ~v002,v012=uint32(bin2dec(v011 ));else,v012=uint32(bin2dec(v011.'));end,v012=v012+65536;v000=PatternReplace(v000,v009{v010},v012);end,end
function v000=PatternReplace(v001,...
v002,v003),v000=v001(:)';if numel(v002)==0,v004=false(size(v001));elseif numel(v003)>numel(v002),error('not implemented (padding required)'),else,...
v004=true(size(v001));for v005=1:numel(v002),v006=find(v001==v002(v005));v006=v006-v005+1;v006(v006<1)=[];v007=false(size(v004));v007(v006)=true;
v004= v004 & v007;if ~any(v004),break,end,end,end,v006=find(v004);if ~isempty(v006),for v005=1:numel(v003),v000(v006+v005-1)=v003(v005);end,...
if numel(v003)==0,v005=0;end,if numel(v002)>v005,v006=v006(:);v008=(v005+1):numel(v002);v009=bsxfun_plus(v006,v008-1);v000(v009(:))=[];end,end,end
function v000=bsxfun_plus(v001,v002),try v000=v001+v002;catch,try v000=bsxfun(@plus,...
v001,v002);catch,v003=size(v001); v004=size(v002);v001=repmat(v001,max(1,v004./v003)); v002=repmat(v002,max(1,v003./v004));v000=v001+v002;end,end,end
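Whichever implementation you use, the number of UTF-8 bytes is determined by inclusive code point ranges (U+0000 to U+007F, U+0080 to U+07FF, U+0800 to U+FFFF, U+10000 to U+10FFFF). A quick sketch that computes the byte count and checks the boundary values, which are the cases a strict < comparison on both ends would miss:

```matlab
% Boundary code points of each UTF-8 length class
cps = [0 127 128 2047 2048 65535 65536 1114111];

% 1 byte, plus one more for each threshold reached
nBytes = 1 + (cps >= 128) + (cps >= 2048) + (cps >= 65536);
% -> 1 1 2 2 3 3 4 4
```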
giannit, 17 December 2020
Thank you very much, Rik, it finally works!!
You have been very kind to help. Thank you, and Merry Christmas!
giannit, 17 December 2020
P.S. Did you know that instead of
horzcat(b{:})
you can use the shorter notation
[b{:}]
:)
Rik, 17 December 2020
You're welcome.
And yes, I was aware; I just prefer to make it explicit with a direct call to either horzcat or vertcat. Not that it will actually matter, but that is what [] calls under the hood, so there may be an imperceptible speed increase.
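Both spellings produce the same array; a quick check on the surrogate pairs of the flag (the cell contents are illustrative):

```matlab
% Two cells, each holding the UTF-16 code units of one flag letter
b = {[55356 56814], [55356 56825]};

viaHorzcat  = horzcat(b{:});   % explicit call
viaBrackets = [b{:}];          % bracket shorthand

same = isequal(viaHorzcat, viaBrackets);   % -> true
```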
giannit, 17 December 2020
Thank you, Rik, for the explanation. You really have been very kind.

๋Œ“๊ธ€์„ ๋‹ฌ๋ ค๋ฉด ๋กœ๊ทธ์ธํ•˜์‹ญ์‹œ์˜ค.

์ถ”๊ฐ€ ๋‹ต๋ณ€ (0๊ฐœ)

Category: Cell Arrays
Release: R2019b
Asked: 28 May 2020
Last commented: 17 December 2020
