Treatment of special characters and ASCII codes
MATLAB has nice support for special characters, but the usage is complicated and incomplete. I have been using MATLAB extensively for over 10 years now, and I have also introduced it to many of my students. I see two major problems with the way special characters are handled:
1) The subset of available TeX characters is much too small. In particular, I am trying to include a "per mil" (or "per mille") symbol in a plot. This symbol is widely used. There are many long blog posts about how to get this symbol into plots, but none of the approaches has been successful. Is there a reason to support so few characters?
2) One can use ASCII character codes to get special symbols, but MATLAB seems to interpret these codes differently than other programs on the same computer. This is apparently because, with Unicode, there are many pages of alternate ASCII equivalents. For instance, I am using Mac OS X 10.7.4. When I look up the ASCII code for the per mil symbol in MS Word, it is listed for Times New Roman (and many other fonts as well) as ASCII 228 (Unicode 2030). When I use this ASCII code in MATLAB, I get the character ä. One of my students, who is using MATLAB on Windows, gets the per mil symbol with the same code.
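The mismatch described above can be reproduced outside MATLAB: a single byte value means different characters depending on which 8-bit encoding is used to interpret it. The following is a minimal Python sketch. It assumes Python's standard `latin_1` codec (ISO 8859-1, whose 256 positions coincide with the first 256 Unicode code points) and its `mac_roman` codec (Mac OS Roman) as stand-ins for the two interpretations. It is not a description of how MATLAB itself decodes `char` codes, and `decode_byte` is a hypothetical helper.

```python
def decode_byte(code: int, encoding: str) -> str:
    """Interpret a single byte value (0-255) under the given 8-bit encoding."""
    return bytes([code]).decode(encoding)

# The same code, 228, read under two legacy 8-bit encodings.
for enc in ("latin_1", "mac_roman"):
    ch = decode_byte(228, enc)
    print(f"{enc:>10}: {ch!r} (U+{ord(ch):04X})")
```

Under Latin-1, 228 is ä (U+00E4). Under Mac OS Roman, it is ‰ (U+2030), which matches what MS Word reports on the Mac.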
After several hours of searching, I found nothing in the documentation or on the web that describes this situation. This important issue should be documented. Can you provide an explanation or point me to some information?
Walter Roberson, July 6, 2012
ASCII is a 7-bit code, 0 to 127, so any code from 128 upwards is not ASCII.
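A quick way to see the 7-bit limit is to ask a strict ASCII decoder to handle the code in question. This Python sketch uses the standard `ascii` codec; `is_ascii_code` and `try_ascii` are hypothetical helper names used only for illustration.

```python
def is_ascii_code(code: int) -> bool:
    """True if code is a valid 7-bit ASCII code (0-127)."""
    return 0 <= code <= 127

def try_ascii(code: int) -> str:
    """Decode one byte strictly as ASCII; codes 128-255 are rejected."""
    try:
        return bytes([code]).decode("ascii")
    except UnicodeDecodeError:
        return "<not ASCII>"

print(is_ascii_code(65), try_ascii(65))    # True A
print(is_ascii_code(228), try_ascii(228))  # False <not ASCII>
```

Any table that assigns a character to 228 is therefore describing some larger encoding, not ASCII itself.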
The first major generalization of ASCII was a series of codes known as ISO 8859-1 through 8859-16. 8-bit tables that claim to be ASCII are usually referring to ISO 8859-1. Code point 228 is ä in 8859-1 through -4, -9, -10, and -13 through -16; in the other ISO 8859 character sets that position holds a variety of other characters, such as δ in 8859-7 (Greek).
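The claim about position 228 can be checked against the ISO 8859 codecs that ship with Python (`iso8859_1` through `iso8859_16`; there is no codec for 8859-12, which was never published). This is a sketch for inspection only, and `iso8859_table` is a hypothetical helper.

```python
def iso8859_table(code: int) -> dict:
    """Map each ISO 8859 part Python has a codec for to the character at `code`."""
    result = {}
    for part in range(1, 17):
        try:
            ch = bytes([code]).decode(f"iso8859_{part}")
        except LookupError:
            continue      # no codec for this part (8859-12)
        except UnicodeDecodeError:
            ch = None     # position undefined in this part
        result[part] = ch
    return result

for part, ch in iso8859_table(228).items():
    print(f"ISO 8859-{part}: {ch!r}")
```

The parts that print `'ä'` are exactly the ones listed above; the others show Cyrillic, Arabic, Greek, Hebrew, or Thai letters at the same position.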
I happen to have MS Word on my OS X system, and if I choose Insert Symbol and switch the character set in the resulting drop-down menu to "(normal text)", "per mille" does appear at the position MS Word numbers 228. As best I can tell, that 228 refers only to the position in the Mac Times Roman font from before Apple switched to Unicode (the Mac OS Roman encoding). I could not find 228 used for per mille anywhere else, although Windows code page 10000 apparently refers to this encoding.
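To see how specific that 228 is, one can ask where ‰ lives in a few legacy single-byte encodings. Python names the Mac OS Roman encoding `mac_roman`; `cp1252` is the Windows Western European code page. The list below is a small sample rather than an exhaustive search, and `position_of` is a hypothetical helper.

```python
PER_MILLE = "\u2030"  # per mille sign

def position_of(ch: str, encoding: str):
    """Return the single-byte position of ch in encoding, or None if absent."""
    try:
        encoded = ch.encode(encoding)
    except UnicodeEncodeError:
        return None   # the encoding has no slot for this character
    return encoded[0] if len(encoded) == 1 else None

for enc in ("mac_roman", "cp1252", "latin_1", "iso8859_15"):
    print(f"{enc:>10}: {position_of(PER_MILLE, enc)}")
```

Only Mac OS Roman puts ‰ at 228. Windows-1252 puts it at 137, and Latin-1 and Latin-9 (ISO 8859-15) have no per mille sign at all.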
What you should do is use the Unicode code points (for per mille, U+2030, which is 8240 decimal) and set the MS Windows machine to use an international standard. This might require setting it to code page 1200 or 1201 (UTF-16 little-endian or big-endian); I think 65001 (UTF-8) would probably not be correct.
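Unlike the legacy code pages, a Unicode code point identifies the same character on every machine; only its byte serialization differs between encodings. In MATLAB the corresponding call would be `char(8240)`. The Python sketch below starts from the code point and shows the bytes it becomes under the encodings behind Windows code pages 1200 (UTF-16LE), 1201 (UTF-16BE), and 65001 (UTF-8). The mapping of code-page numbers to Python codec names is written out by hand, and `serializations` is a hypothetical helper.

```python
PER_MILLE = chr(0x2030)  # Unicode code point U+2030 = 8240 decimal

# Windows code page number -> Python codec name for the same encoding.
CODE_PAGES = {1200: "utf-16-le", 1201: "utf-16-be", 65001: "utf-8"}

def serializations(ch: str) -> dict:
    """Hex bytes of ch under each code page's encoding."""
    return {cp: ch.encode(codec).hex(" ") for cp, codec in CODE_PAGES.items()}

print(ord(PER_MILLE))
for cp, hex_bytes in serializations(PER_MILLE).items():
    print(cp, hex_bytes)
```

The code point stays 8240 throughout, while the byte sequences differ, which is why the code point, not a code-page position, is the portable way to name the character.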
Unfortunately, MATLAB handles only 16-bit code points; it cannot handle code points of 65536 and above.
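The 16-bit limit matters because a code point above 65535 cannot fit in a single 16-bit element. UTF-16 represents such code points as a surrogate pair of two 16-bit units. The following Python sketch performs that split by hand. It uses U+1D11E (MUSICAL SYMBOL G CLEF) as an arbitrary example, and `utf16_code_units` is a hypothetical helper. It illustrates UTF-16 in general and does not describe what MATLAB does with such characters.

```python
def utf16_code_units(cp: int) -> list:
    """Split a Unicode code point into UTF-16 code units (16-bit values)."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError("not a Unicode code point")
    if cp < 0x10000:
        return [cp]                    # fits in one 16-bit unit
    offset = cp - 0x10000              # 20-bit value
    high = 0xD800 + (offset >> 10)     # high (lead) surrogate: top 10 bits
    low = 0xDC00 + (offset & 0x3FF)    # low (trail) surrogate: bottom 10 bits
    return [high, low]

for cp in (0x2030, 0x1D11E):
    print(f"U+{cp:04X} ->", [f"{u:04X}" for u in utf16_code_units(cp)])
```

U+2030 stays a single unit, while U+1D11E becomes the pair D834 DD1E. A 16-bit character type can store the two halves, but it cannot hold the character as one element.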
The above discussion applies only to char() and to any HTML entity numbers you use. It does not apply to TeX- or LaTeX-coded strings. TeX and LaTeX were designed before Unicode. There is a LaTeX library that adds Unicode code points to full LaTeX implementations, but as best I can tell, the MATLAB LaTeX implementation lacks enough operations that it cannot import that library. (And I worry that even if you found a way to import it, you would have to import it in each string you wanted to use it with.)