Import part of dataset in a HDF5 file, by 'member' and/or 'logical array'

한범 on 21 Jun 2022
Answered: Walter Roberson on 22 Jun 2022
Dear friends,
I am trying to open fairly large (~3 GB) HDF5 files in MATLAB and process them in parallel. The files are so big that loading takes a very long time and the workspace fills up until MATLAB runs out of RAM, so I am hoping there is a way to open just a small part of the dataset.
For example, if I do h5disp('Data.h5'), I get:
Group '/'
    Dataset 'data'
        Size: 12000
        MaxSize: Inf
        Datatype: H5T_COMPOUND
            Member 'A': H5T_STD_U32LE (uint32)
            Member 'B': H5T_STD_U32LE (uint32)
            Member 'C': H5T_STD_U64LE (uint64)
            Member 'D': H5T_ARRAY
                Size: 15
                Base Type: H5T_STD_U16LE (uint16)
            Member 'E': H5T_ARRAY
                Size: 30000x15
                Base Type: H5T_STD_U32LE (uint32)
        ChunkSize: 1
        Filters: deflate(1)
        FillValue: H5T_COMPOUND
It seems that with the high-level function h5read() I can import the data chunk by chunk. However, each chunk contains all of the members A, B, C, D and E. Member E accounts for most of the size and is the reason for the long import time. Is there any way to import only A, B, or C without loading D and E?
I also have a second problem. I know that h5read() can import just some of the chunks in the file, in the form h5read(filename,ds,start,count,stride). However, it seems that stride can only be a single integer. Can I import the portion of the data defined by an index array such as [1,100,121,400,3254,...], or by a logical array such as [1 0 0 1 0 1 0 ...]?
I tried to solve this myself and even looked into the low-level functions, but it is beyond me. Similar questions seem to have been asked in this community before, but I found no satisfying answer. If anyone can help, please answer.

Answers (2)

MJFcoNaN on 21 Jun 2022
Hello,
The "start, count, stride" arguments are suitable for slicing a huge matrix. For example, the calls below read only a single vector from a 2-D matrix, so much less RAM is needed.
% Fix the 2nd dimension: read the whole first column
% ('yourfile' and '/needed_dataset' are placeholders)
data = h5read('yourfile', '/needed_dataset', [1 1], [Inf 1]);
% Or fix the 1st dimension: read the whole first row
data = h5read('yourfile', '/needed_dataset', [1 1], [1 Inf]);
Then you can process the result in MATLAB.
  3 Comments
MJFcoNaN on 22 Jun 2022
I may have misunderstood. Don't you want to limit RAM consumption?
PS: "Can I import the portion of data defined by indexing array, such as [1,100,121,400,3254,...] or [1 0 0 1 0 1 0 ...]?" There is no direct way, but you can of course read the elements one by one in a loop by setting count to 1.
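The loop suggested above could look roughly like the following. This is only a minimal sketch: the temporary file and the dataset name '/demo' are made up for illustration, and a plain numeric 1-D dataset stands in for the real one. A logical mask is first converted to an index list with find().

```matlab
% Hypothetical demo file: a plain 1-D numeric dataset stands in for the real data.
fname = [tempname '.h5'];
h5create(fname, '/demo', 1000);
h5write(fname, '/demo', (1:1000)' * 10);

% A logical mask is converted to a list of one-based indices with find().
mask = false(1000, 1);
mask([1 100 121 400]) = true;
idx = find(mask);

% Read one element per call: start = idx(k), count = 1.
vals = zeros(numel(idx), 1);
for k = 1:numel(idx)
    vals(k) = h5read(fname, '/demo', idx(k), 1);
end

delete(fname);   % remove the demo file
```

Each h5read call opens and closes the file, so this is slow for long index lists. Reading runs of consecutive indices with a larger count should cut down the number of calls.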
한범 on 22 Jun 2022
The datatype of the dataset is H5T_COMPOUND and it has only one dimension, so your slicing method does not work.
What I want to do is read and import just a specific member of these H5T_COMPOUND chunks.



Walter Roberson on 22 Jun 2022
The approach seems to be to use the H5T functions to create a prototype (a compound datatype) containing only the members that you want to read, and then pass that prototype to the HDF5 read routine.
This is not convenient, but it does appear to be possible.
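A minimal sketch of that idea with the low-level interface, under stated assumptions: the demo file, the two-member layout ('A' and 'C' only) and all variable names are made up for illustration, and I have not tried it on the original Data.h5. It relies on HDF5 matching compound members by name, so a memory type that lists only 'A' should cause only that member to be returned. The last part additionally restricts the read to a range of rows with a hyperslab.

```matlab
% --- Build a small demo file (hypothetical) with a 1-D compound dataset ---
fname = [tempname '.h5'];
wdata.A = uint32([10; 20; 30; 40]);
wdata.C = uint64([1; 2; 3; 4]);

fullType = H5T.create('H5T_COMPOUND', 12);        % 4 bytes for A + 8 bytes for C
H5T.insert(fullType, 'A', 0, 'H5T_STD_U32LE');
H5T.insert(fullType, 'C', 4, 'H5T_STD_U64LE');

fid   = H5F.create(fname, 'H5F_ACC_TRUNC', 'H5P_DEFAULT', 'H5P_DEFAULT');
space = H5S.create_simple(1, 4, []);
dset  = H5D.create(fid, 'data', fullType, space, 'H5P_DEFAULT');
H5D.write(dset, fullType, 'H5S_ALL', 'H5S_ALL', 'H5P_DEFAULT', wdata);
H5D.close(dset); H5S.close(space); H5T.close(fullType); H5F.close(fid);

% --- Read only member 'A' ---
% The "prototype": a compound memory type naming just the wanted member.
memType = H5T.create('H5T_COMPOUND', 4);
H5T.insert(memType, 'A', 0, 'H5T_STD_U32LE');

fid   = H5F.open(fname, 'H5F_ACC_RDONLY', 'H5P_DEFAULT');
dset  = H5D.open(fid, '/data');
onlyA = H5D.read(dset, memType, 'H5S_ALL', 'H5S_ALL', 'H5P_DEFAULT');

% --- Read member 'A' for elements 2..3 only (hyperslab start is zero-based) ---
fileSpace = H5D.get_space(dset);
H5S.select_hyperslab(fileSpace, 'H5S_SELECT_SET', 1, [], 2, []);
memSpace  = H5S.create_simple(1, 2, []);
partA = H5D.read(dset, memType, memSpace, fileSpace, 'H5P_DEFAULT');

% --- Clean up ---
H5S.close(memSpace); H5S.close(fileSpace);
H5T.close(memType); H5D.close(dset); H5F.close(fid);
delete(fname);
```

For the file in the question, the memory type would instead list whichever of 'A', 'B' and 'C' are needed, with offsets and a total size matching their byte widths (4, 4 and 8). Because the dataset is chunked and compressed, the library presumably still has to decompress each chunk it touches, so the main saving is likely to be in memory rather than in I/O time.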

Release: R2021a
