- I download interactively with Firefox without being asked about text encoding.
- I copy&pasted the file name with Windows Explorer, e.g. right-click, rename, copy.
- Notepad++ says that both the pasted file name string and the file content, "test", are ANSI-encoded
- Matlab double says that "é" of the file name is character 233
- Here (Sweden) "é" is an "extended ascii" character(?)
Working with unicode paths
조회 수: 4 (최근 30일)
이전 댓글 표시
The following is a followup to:
This question however is a bit more specific. I have a file which was created using a program on Windows. I can browse to the file in Windows Explorer (Win 7). I am however unable to:
- Open the file in Matlab (using fopen)
- If I create a directory with the same name, I am unable to cd to the directory. cd(directory)
I have uploaded the file to a public folder on my dropbox account. https://www.dropbox.com/sh/d2mghr9xyb426lz/ZEM4DH8XTp
The files are: v. Békésy - 1957.txt v. Békésy - 1957.zip
I am currently unable to provide instructions as to how one would create such a file in Matlab (hence providing them for download). For handling naming, I have also included the file in a zip, so that even if the zip is renamed on download, the file inside should maintain the same name. Incidentally, it was by exporting the zip to a folder with the same name that created the folder which I cannot cd to with Matlab.
Thus, the question is how do I get around issues #1 and #2 (without renaming them using manually using a windows interface). I am assuming this might mean using a custom library (mex and/or Java code).
The ideal solution is to provide a generic class of code that actually works for path/file manipulation instead of needing manual interference any time this problem is encountered.
Thanks, Jim
댓글 수: 4
per isakson
2013년 9월 4일
편집: per isakson
2013년 9월 4일
Now, I'm on a different computer (same installation: R2013a 64bit on Windows 7). I read your file without problems here too. "Standard Swedish" installation, I guess. And:
a = get(0, 'Language')
import java.nio.charset.Charset
b = Charset.defaultCharset()
c = feature('DefaultCharacterSet')
returns
a =
sv_se
b =
windows-1252
c =
windows-1252
채택된 답변
추가 답변 (2개)
Malcolm Lidierth
2013년 9월 6일
편집: Malcolm Lidierth
2013년 9월 6일
Jim
MATLAB/Java need to talk to an OS and a file system beneath so this is likely to vary across FAT12/16/32, NTFS etc as well as OS or MATLAB/Java.
From @Pers comments: the windows-1252 charset is proprietary, not unicode, and to convert a Java String to the originating byte[] requires the CharSet.
So, telling the difference between "'v. Békésy" on this screen to the byte[] that it was created from requires information that the string does not contain and, AFAIK, neither does the directory entry of any file system.
On my Mac:
>> java.nio.charset.Charset.availableCharsets.size()
ans =
166
The answer then is that there is no answer beyond "don't use special characters in file names" as suggested by Jan on your first post. But, on the assumption that nobody is likely to have used anything but 8 bit encoding:
>> java.lang.String('v. Békésy').getBytes()
ans =
118
46
32
66
-23
107
-23
115
121
But compare that with the MATLAB char array:
>> char(java.lang.String('v. Békésy').getBytes())
ans =
v .
B
k
s y
and with:
>> uint8(java.lang.String('v. Békésy').getBytes())
ans =
118
46
32
66
0
107
0
115
121
For this problem, MATLAB's uint arithmetic rules may not be the most useful.
>> java.lang.String(java.lang.String('v. Békésy').getBytes(),java.nio.charset.Charset.defaultCharset())
ans =
v. Békésy
>> java.nio.charset.Charset.defaultCharset()
ans =
ISO-8859-1
but:
>> java.lang.String(java.lang.String('v. Békésy').getBytes(), 'US-ASCII')
ans =
v. Bks
Regards ML
댓글 수: 5
Malcolm Lidierth
2013년 9월 6일
@Jim Not a Java issue. On my Mac e.g. with both a Mac HD and a FAT32 drive
>> java.io.File('v. Békésy')
ans =
v. Békésy
>> ans.exists()
ans =
1
Its a MATLAB issue.
P.S. SciLab works fine while R gives the name as v. Be\314\201ke\314\201sy but works happily with that.
per isakson
2013년 9월 12일
편집: per isakson
2013년 9월 13일
Googling taught me
- NTFS stores file names in Unicode.
- Not all zip-tools are Unicode-aware.
- The name of files transferred in zip-files between systems with different default character sets may be "corrupted".
Links
댓글 수: 0
참고 항목
카테고리
Help Center 및 File Exchange에서 Characters and Strings에 대해 자세히 알아보기
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!