Working with Non-ASCII Characters in HDF5 Files

To enable sharing of HDF5 files across multiple locales, MATLAB® supports the use of non-ASCII characters in HDF5 files. This example shows you how to:

  • Create HDF5 files containing dataset and attribute names that have non-ASCII characters using the high-level functions.

  • Create variable-length string datasets containing non-ASCII characters using the low-level functions.

Create Dataset and Attribute Names Containing Non-ASCII Characters

Create an HDF5 file containing a dataset name and an attribute name that contain non-ASCII characters. To check whether the names appear as expected, write data to the dataset and display the file information.

Create a dataset with a name (/数据集) that includes non-ASCII characters.

dsetName = ['/' char([25968 25454 38598])];
dsetDims = [5 2];
h5create('outfile.h5',['/grp1' dsetName],dsetDims,...
                                'TextEncoding','UTF-8');

Write data to the file.

dataToWrite = rand(dsetDims);
h5write('outfile.h5',['/grp1' dsetName],dataToWrite);
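
As a quick sanity check, the values can be read back through the same non-ASCII path. The following is a minimal sketch; it uses a temporary file so that it does not disturb outfile.h5, and the names tmpFile and dsetPath are illustrative rather than part of the example above.

```matlab
% Sketch: round-trip numeric data through a dataset whose name is non-ASCII.
tmpFile  = [tempname '.h5'];                        % hypothetical scratch file
dsetPath = ['/grp1/' char([25968 25454 38598])];    % '/grp1/数据集'
h5create(tmpFile,dsetPath,[5 2],'TextEncoding','UTF-8');

data = rand(5,2);
h5write(tmpFile,dsetPath,data);
dataBack = h5read(tmpFile,dsetPath);

% Doubles are stored as H5T_IEEE_F64LE, so the comparison can be exact.
roundTripOK = isequal(dataBack,data);
delete(tmpFile);
```

If roundTripOK is false, the dataset path was most likely not resolved to the same name on write and on read.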

Create an attribute name (屬性名稱) that includes non-ASCII characters and assign a value to the attribute.

attrName = char([23660 24615 21517 31281]);
h5writeatt('outfile.h5','/',attrName,'I am an attribute',...
                                      'TextEncoding','UTF-8');

Display information about the file and check if the attribute name and dataset name appear correctly.

h5disp('outfile.h5')
HDF5 outfile.h5 
Group '/' 
    Attributes:
        '屬性名稱':  'I am an attribute'
    Group '/grp1' 
        Dataset '数据集' 
            Size:  5x2
            MaxSize:  5x2
            Datatype:   H5T_IEEE_F64LE (double)
            ChunkSize:  []
            Filters:  none
            FillValue:  0.000000
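
Instead of inspecting the h5disp output by eye, the names can also be compared programmatically. The sketch below assumes that h5info returns the names in its Attributes and Groups.Datasets fields, as its documentation describes. It works on a temporary file, and the variable names are illustrative.

```matlab
% Sketch: verify non-ASCII names with h5info instead of reading h5disp output.
tmpFile  = [tempname '.h5'];                          % hypothetical scratch file
dsetName = char([25968 25454 38598]);                 % 数据集
attrName = char([23660 24615 21517 31281]);           % 屬性名稱

h5create(tmpFile,['/grp1/' dsetName],[5 2],'TextEncoding','UTF-8');
h5writeatt(tmpFile,'/',attrName,'I am an attribute','TextEncoding','UTF-8');

info = h5info(tmpFile);                               % metadata for the root group
attrOK = strcmp(info.Attributes(1).Name,attrName);    % attribute on '/'
grpOK  = strcmp(info.Groups(1).Name,'/grp1');         % group names are full paths
dsetOK = strcmp(info.Groups(1).Datasets(1).Name,dsetName);  % dataset name only
delete(tmpFile);
```

All three flags should be true when the names survive the write unchanged.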

Create Variable-Length String Data Containing Non-ASCII Characters

Using the low-level functions, create a variable-length string dataset for data containing non-ASCII characters, write the data to the dataset, and then check that the data was written correctly.

Create data containing non-ASCII characters.

dataToWrite = {char([12487 12540 12479]) 'hello' ...
                   char([1605 1585 1581 1576 1575]); ...
               'world' char([1052 1080 1088])    ...
                   char([954 972 963 956 959 962])};
disp(dataToWrite)
    'データ'    'hello'    'مرحبا' 
    'world'    'Мир'      'κόσμος'
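
The strings above have similar character counts, but their UTF-8 encodings differ in length. That is one plausible reason a variable-length string type is convenient here. The sketch below compares the two counts using unicode2native; the variable names are illustrative.

```matlab
% Sketch: character count versus UTF-8 byte count for each string.
words = {char([12487 12540 12479]), 'hello', ...
         char([1605 1585 1581 1576 1575]), 'world', ...
         char([1052 1080 1088]), char([954 972 963 956 959 962])};

numChars = cellfun(@numel,words);
numBytes = cellfun(@(s) numel(unicode2native(s,'UTF-8')),words);

% Row 1: characters per string; row 2: bytes after UTF-8 encoding.
disp([numChars; numBytes])
```

ASCII strings take one byte per character, the Arabic, Cyrillic, and Greek strings take two, and the Katakana string takes three.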

To write this data to a file, create an HDF5 file, then create a group and a dataset within that group.

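Create the HDF5 file. Opening it with 'H5F_ACC_TRUNC' overwrites any existing file of the same name, including the outfile.h5 created in the previous section.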

fileName = 'outfile.h5';
fileID = H5F.create(fileName,'H5F_ACC_TRUNC',...
                     'H5P_DEFAULT', 'H5P_DEFAULT');

To create a group with non-ASCII characters in its name, first configure a link creation property list that uses UTF-8 character encoding.

lcplID = H5P.create('H5P_LINK_CREATE'); 
H5P.set_char_encoding(lcplID,H5ML.get_constant_value('H5T_CSET_UTF8'));
plist = 'H5P_DEFAULT';

Then, create the group (グループ).

grpName = char([12464 12523 12540 12503]);
grpID = H5G.create(fileID,grpName,lcplID,plist,plist);
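
To confirm that a link creation property list carries the intended encoding, it can be queried with H5P.get_char_encoding. This is a small stand-alone sketch; it assumes the HDF5 library default of ASCII for a newly created list, and the variable names are illustrative.

```matlab
% Sketch: inspect the character encoding stored in a link creation property list.
utf8  = H5ML.get_constant_value('H5T_CSET_UTF8');
ascii = H5ML.get_constant_value('H5T_CSET_ASCII');

lcpl = H5P.create('H5P_LINK_CREATE');
wasAscii = (H5P.get_char_encoding(lcpl) == ascii);   % assumed library default

H5P.set_char_encoding(lcpl,utf8);
isUtf8 = (H5P.get_char_encoding(lcpl) == utf8);      % encoding after the change

H5P.close(lcpl);
```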

Create a dataset that contains variable-length string data with non-ASCII characters. First, configure its data type.

typeID = H5T.copy('H5T_C_S1');
H5T.set_size(typeID,'H5T_VARIABLE');
H5T.set_cset(typeID,H5ML.get_constant_value('H5T_CSET_UTF8'));
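
The settings on the data type can be verified in the same spirit before it is used. The following sketch builds an identical type and queries it with H5T.is_variable_str and H5T.get_cset; the variable names are illustrative.

```matlab
% Sketch: confirm that a string type is variable-length and UTF-8 encoded.
strType = H5T.copy('H5T_C_S1');
H5T.set_size(strType,'H5T_VARIABLE');
H5T.set_cset(strType,H5ML.get_constant_value('H5T_CSET_UTF8'));

isVarLen = H5T.is_variable_str(strType) > 0;          % positive for variable-length strings
isUtf8   = H5T.get_cset(strType) == ...
           H5ML.get_constant_value('H5T_CSET_UTF8');  % character set of the type

H5T.close(strType);
```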

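Now create the dataset by specifying its name, data type, and dimensions. The low-level functions take dimensions in HDF5 (row-major) order, so reverse the MATLAB dimensions with fliplr.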

dsetName = 'datasetUtf8';
dataDims = [2 3];
h5DataDims = fliplr(dataDims);
h5MaxDims = h5DataDims;
spaceID = H5S.create_simple(2,h5DataDims,h5MaxDims);
dsetID = H5D.create(grpID,dsetName,typeID,spaceID,...
             'H5P_DEFAULT','H5P_DEFAULT','H5P_DEFAULT');
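
The dimension reversal can be seen by querying a dataspace directly. This sketch creates a dataspace for a 2-by-3 MATLAB array and reads its extent back with H5S.get_simple_extent_dims; the variable names are illustrative.

```matlab
% Sketch: a 2-by-3 MATLAB array corresponds to a 3-by-2 extent in HDF5 order.
matlabDims = [2 3];
h5Dims     = fliplr(matlabDims);                 % reverse for the low-level API

space = H5S.create_simple(2,h5Dims,h5Dims);
[~,storedDims] = H5S.get_simple_extent_dims(space);
H5S.close(space);

storedDims    = storedDims(:).';                 % force a row vector before flipping
recoveredDims = fliplr(storedDims);              % back to MATLAB order
```

Flipping the stored extent a second time recovers the original MATLAB size.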

Write the data to the dataset.

H5D.write(dsetID,'H5ML_DEFAULT','H5S_ALL',...
               'H5S_ALL','H5P_DEFAULT',dataToWrite);

Read the data back.

dataRead = h5read('outfile.h5',['/' grpName '/' dsetName])
dataRead =

  2×3 cell array

    {'データ'}    {'hello'}    {'مرحبا' }
    {'world'}    {'Мир'  }    {'κόσμος'}

Check that the data read from the file matches the data that was written.

isequal(dataRead,dataToWrite)
ans =

  logical

   1
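
The same check can be done entirely with the low-level interface by calling H5D.read on the open dataset. The sketch below is a reduced, stand-alone variant of the steps above: it uses a temporary file, a 2-by-2 cell array, and an ASCII dataset name, all of which are illustrative choices rather than part of the original example.

```matlab
% Sketch: write and read back variable-length UTF-8 strings with H5D only.
tmpFile = [tempname '.h5'];                               % hypothetical scratch file
words   = {char([12487 12540 12479]) 'hello'; ...
           'world' char([1052 1080 1088])};               % 2-by-2 cell array

fileID = H5F.create(tmpFile,'H5F_ACC_TRUNC','H5P_DEFAULT','H5P_DEFAULT');

typeID = H5T.copy('H5T_C_S1');
H5T.set_size(typeID,'H5T_VARIABLE');
H5T.set_cset(typeID,H5ML.get_constant_value('H5T_CSET_UTF8'));

h5Dims  = fliplr(size(words));                            % HDF5 dimension order
spaceID = H5S.create_simple(2,h5Dims,h5Dims);
dsetID  = H5D.create(fileID,'words',typeID,spaceID, ...
              'H5P_DEFAULT','H5P_DEFAULT','H5P_DEFAULT');

H5D.write(dsetID,'H5ML_DEFAULT','H5S_ALL','H5S_ALL','H5P_DEFAULT',words);
wordsRead = H5D.read(dsetID,'H5ML_DEFAULT','H5S_ALL','H5S_ALL','H5P_DEFAULT');

% Release identifiers in reverse order of creation, then remove the file.
H5D.close(dsetID);
H5S.close(spaceID);
H5T.close(typeID);
H5F.close(fileID);
delete(tmpFile);

sameData = isequal(wordsRead,words);
```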

Close the identifiers.

H5D.close(dsetID);
H5S.close(spaceID);
H5T.close(typeID);
H5G.close(grpID);
H5P.close(lcplID);
H5F.close(fileID);
