Treatment of special characters and ASCII codes

Mark Brandon, July 6, 2012
Matlab has nice support for special characters, but the usage is complicated and incomplete. I have been using Matlab extensively for over 10 years now, and I have also introduced it to many of my students. I see two major problems with the way special characters are handled:
1) The subset of available TeX characters is much too small. In particular, I am trying to include a "per mil" or "per mille" symbol (‰) in a plot. This symbol is widely used. There are many long blog posts about how to get this symbol into plots, but none of the suggested approaches has worked. Is there a reason to support such a small number of characters?
2) One can use ASCII character codes to get special symbols, but Matlab seems to interpret these codes differently than other programs on the same computer. The issue is apparently that, with Unicode, there are many pages of alternate ASCII equivalents. For instance, I am using Mac OS X 10.7.4. When I look up the ASCII code for the per mil symbol in MS Word, I find that it is listed for Times New Roman (and many other fonts as well) as ASCII 228 (Unicode 2030). When I use this ASCII code in Matlab, I get the character ä. One of my students, who is using Matlab on Windows, gets the per mil symbol with the same code.
There is nothing in the documentation or on the web that describes this situation (I searched for several hours). This important issue should be documented. Can you provide an explanation or point me to some information?

Answers (1)

Walter Roberson, July 6, 2012
ASCII is a 7-bit code covering 0 to 127, so any code from 128 upward is not ASCII.
The first major generalization of ASCII was the series of 8-bit codes known as ISO 8859-1 through 8859-16. 8-bit tables that claim to be ASCII are usually referring to ISO 8859-1. Code point 228 is ä in parts -1 through -4, -9 and -10, and -13 through -16; in the other ISO 8859 character sets that position is occupied by a variety of other characters, such as δ in 8859-7 (Greek).
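A quick way to check the point about code point 228 is to decode that single byte under several ISO 8859 parts. The sketch below uses Python rather than MATLAB, only because Python's standard codec tables make the byte-to-character mapping easy to inspect; the helper name `decode_byte` is hypothetical.

```python
# Decode one 8-bit value under several ISO 8859 parts to show that
# "code 228" only has a meaning once the character set is fixed.

def decode_byte(value: int, codec: str) -> str:
    """Interpret a single byte (0-255) using the named codec."""
    return bytes([value]).decode(codec)

# Values 0-127 are plain ASCII and identical in every ISO 8859 part.
assert decode_byte(65, "iso8859_1") == decode_byte(65, "iso8859_7") == "A"

# Value 228 depends on which part of ISO 8859 is in use.
for codec in ["iso8859_1", "iso8859_2", "iso8859_5", "iso8859_7", "iso8859_15"]:
    ch = decode_byte(228, codec)
    print(f"{codec:11} 228 -> {ch!r} (U+{ord(ch):04X})")

# Strict ASCII has no character 228 at all.
try:
    decode_byte(228, "ascii")
except UnicodeDecodeError:
    print("228 is not a valid ASCII code")
```

Parts 1, 2 and 15 all give ä, while 8859-5 (Cyrillic) gives ф and 8859-7 (Greek) gives δ; decoding 228 as strict ASCII fails because ASCII stops at 127.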
I happen to have MS Word running on my OS X system, and I do see that if I choose Insert Symbol and switch the character set in the resulting drop-down menu to "(normal text)", "per mille" occurs at a position that MS Word numbers 228. As best I can tell, the only thing that 228 refers to is the position in the Mac Times Roman font from before Apple switched to Unicode (the legacy Mac OS Roman encoding). I could not dig up any other place where 228 is used for per mille (though Windows Code Page 10000 apparently refers to this encoding).
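The Mac Times Roman numbering can be checked against the legacy Mac OS Roman table, which Python ships as the `mac_roman` codec (Windows calls the same table code page 10000). This is a Python sketch for illustration, not MATLAB code: byte 228 decodes to the per mille sign under Mac OS Roman but to ä under Latin-1 (ISO 8859-1), which is consistent with the ä seen in the question.

```python
# Byte 228 (0xE4) in the legacy Mac OS Roman encoding versus Latin-1.
# Python names Mac OS Roman "mac_roman"; Windows calls it code page 10000.
code = 228
mac = bytes([code]).decode("mac_roman")
latin1 = bytes([code]).decode("latin_1")

print(mac, hex(ord(mac)))        # per mille sign, U+2030
print(latin1, hex(ord(latin1)))  # a with diaeresis, U+00E4

# Encoding the per mille sign back to Mac OS Roman recovers byte 228.
print(list("\u2030".encode("mac_roman")))  # [228]
```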
What you should do is use the Unicode position numbers, and adjust the MS Windows machine to use an international standard. This might require setting it to code page 1200 or 1201 (UTF-16, little-endian or big-endian); I think 65001 (UTF-8) would probably not be correct.
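To make the advice concrete: the per mille sign has the single Unicode code point U+2030 (decimal 8240) on every platform, and code pages 1200, 1201 and 65001 differ only in how that code point is serialized to bytes. The Python sketch below (illustrative, not MATLAB code) encodes it under the three corresponding codecs; the dictionary name `CODE_PAGES` is introduced here only for illustration. In MATLAB the number to use would be the code point itself, for example char(8240), rather than a legacy 8-bit position.

```python
# The per mille sign has one Unicode code point on every platform.
PER_MILLE = "\u2030"
assert ord(PER_MILLE) == 0x2030 == 8240

# Windows code pages 1200, 1201 and 65001 correspond to UTF-16LE,
# UTF-16BE and UTF-8. They store the same code point as different bytes.
CODE_PAGES = {
    1200: "utf_16_le",
    1201: "utf_16_be",
    65001: "utf_8",
}

for page, codec in CODE_PAGES.items():
    data = PER_MILLE.encode(codec)
    # Round-trip check: decoding the bytes gives back the same character.
    assert data.decode(codec) == PER_MILLE
    print(page, codec, data.hex(" "))
```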
Unfortunately, MATLAB can handle only 16-bit code points; it cannot handle code points of 65536 and above.
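The reason a 16-bit limit matters is that characters above U+FFFF do not fit in one 16-bit unit; UTF-16 stores them as a surrogate pair of two units. The Python sketch below (an illustration, not MATLAB code; `utf16_units` is a hypothetical helper) lists the 16-bit units for the per mille sign and for U+1D400 MATHEMATICAL BOLD CAPITAL A.

```python
# Split a character into its UTF-16 code units (16-bit values).
def utf16_units(ch: str) -> list[int]:
    data = ch.encode("utf_16_be")  # big-endian, no byte-order mark
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]

bmp = "\u2030"         # per mille, U+2030, inside the 16-bit range
astral = "\U0001D400"  # MATHEMATICAL BOLD CAPITAL A, U+1D400

print([hex(u) for u in utf16_units(bmp)])     # ['0x2030']
print([hex(u) for u in utf16_units(astral)])  # ['0xd835', '0xdc00']
```

A char type that holds one 16-bit value per element can store each half of the pair, but not the character as a single element.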
The above discussion applies only to char() and to any HTML entity numbers you use; it does not apply to TeX or LaTeX coded strings. TeX and LaTeX were designed before Unicode, and although there is a LaTeX package that adds Unicode code points to full LaTeX implementations, as best I can tell the MATLAB LaTeX implementation is missing too many operations to be able to import that package. (And I worry that even if you did manage to find a way to import it, you would have to import it in every string you wanted to use it with.)
2 Comments
Mark Brandon, July 8, 2012 (edited July 8, 2012)
Walter, thanks for your quick answer. All of this brings back old memories. Further searching indicates that your summary is right on all counts, with one significant caveat: Matlab at present has no easy way to change encodings or to use the Unicode character scheme. This is a sad situation, but there is some indication that future versions of Matlab may include this capability.
One other matter: the Matlab documentation gives the impression that the LaTeX interpreter provides a full implementation of LaTeX. My experience is that the TeX and LaTeX interpreters are restricted to the same limited set of special characters. The list of special characters is also difficult to find: click the link "TeX Character Sequence Table", located under the String heading for the Text property at http://www.mathworks.com/help/techdoc/ref/text_props.html.
Walter Roberson, July 8, 2012
Adjusting the MS Windows machine is a matter of changing its regional settings, or something like that. I do not know whether MATLAB on MS Windows supports the LANG environment variable; I have seen hints that it might, but I do not have access to a test system.
Simulink can support changing encodings through Simulink-specific variables (or is it function calls? I do not recall now). I believe that discussion was one that I tagged with "unicode".
Per mille, LaTeX, and add-on packages were discussed over here.