Using fwrite for multiple data types at once

Eric Alexander on 22 Jan 2021
Commented: Walter Roberson on 26 Jan 2021
I am trying to write a vector of data of various data types (all numeric) to a .binary file all at once instead of having to loop through the data set. I have a vector that contains the numbers:
myData = [1 3.5 400 75 2.5 6 1];
and then each number has an associated data type:
myDataType = {'single','double','single','single','double','single','single'};
Is there a way to use the fwrite function to write all the data of varying types at once? Such as:
fwrite(fid,myData,myDataType)
This command doesn't work. I know that for the example I am giving it would be just as fast to loop through the data and write the values to the file one at a time using:
for n = 1:length(myData)
    fwrite(fid,myData(n),myDataType{n})
end
However, the data vector I am writing contains thousands of values, and the sequence of data types does not follow any set pattern. Typically, when all the data is the same type, such as a vector containing 10,000 double values, it is far faster to write all of the data at once like this:
fwrite(fid,myData,'double')
than it is to use a for loop. But for my specific case, the data types change. Any help would be greatly appreciated.

Accepted Answer

Walter Roberson on 22 Jan 2021
memmapfile() can be used to read and write fixed-length records https://www.mathworks.com/help/matlab/ref/memmapfile.html
Alternately:
myData = [1 3.5 400 75 2.5 6 1];
myDataType = {'single','double','single','single','double','single','single'};
temp = arrayfun(@(val, type) typecast(cast(val, type{1}),'uint8'), myData, myDataType, 'uniform', 0);
bytes = horzcat(temp{:})
bytes = 1×36
0 0 128 63 0 0 0 0 0 0 12 64 0 0 200 67 0 0 150 66 0 0 0 0 0 0 4 64 0 0
Now bytes is a stream of uint8 that can be written.
On x64 architectures, the entries will be in little-endian order. If the target system expects big-endian order, you might want to swap the bytes of each value first:
temp = arrayfun(@(val, type) typecast(swapbytes(cast(val, type{1})),'uint8'), myData, myDataType, 'uniform', 0);
bytes = horzcat(temp{:})
bytes = 1×36
63 128 0 0 64 12 0 0 0 0 0 0 67 200 0 0 66 150 0 0 64 4 0 0 0 0 0 0 64 192
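Since the question is about avoiding a per-element loop, here is a minimal sketch of the same byte-packing idea that loops once per distinct type rather than once per value. It is only an illustration under stated assumptions: it handles just 'single' and 'double', and the names widths, startPos and swapOrder are made up for this example rather than taken from the answer above.

```matlab
myData     = [1 3.5 400 75 2.5 6 1];
myDataType = {'single','double','single','single','double','single','single'};
swapOrder  = false;   % assumption: set true to reverse each value's bytes via swapbytes

% Byte width of each entry (this sketch only knows single and double).
widths = zeros(1, numel(myData));
widths(strcmp(myDataType, 'single')) = 4;
widths(strcmp(myDataType, 'double')) = 8;
assert(all(widths > 0), 'unsupported type in myDataType');

% 1-based position of the first byte of each entry in the output stream.
startPos = cumsum([1, widths(1:end-1)]);
bytes = zeros(1, sum(widths), 'uint8');

types = unique(myDataType);
for t = 1:numel(types)
    sel = strcmp(myDataType, types{t});
    w = widths(find(sel, 1));
    vals = cast(myData(sel), types{t});      % all values of this type at once
    if swapOrder
        vals = swapbytes(vals);
    end
    raw = typecast(vals, 'uint8');           % w bytes per value, back to back
    % One column of destination indices per value (implicit expansion, R2016b+).
    idx = startPos(sel) + (0:w-1).';
    bytes(idx) = reshape(raw, w, []);
end
```

With thousands of values and only a handful of distinct types, the loop body runs a handful of times, and the resulting bytes vector can be written with a single fwrite call.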
  4 Comments
Walter Roberson on 26 Jan 2021
fwrite(fileID,bytes,'uint8')
Yes. And because I constructed bytes to be uint8 you can abbreviate that to
fwrite(fileID, bytes);
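A small round-trip sketch of that write, in case it helps: it writes a uint8 stream to a scratch file and reads it back unchanged. The file name and sample data are made up for the illustration; the '*uint8' precision on fread is what keeps the result as uint8 instead of double.

```matlab
bytes = typecast(single([1 400 75]), 'uint8');   % sample 12-byte uint8 stream
fname = [tempname '.bin'];                       % hypothetical scratch file

fileID = fopen(fname, 'w');
assert(fileID ~= -1, 'could not open %s for writing', fname);
count = fwrite(fileID, bytes);                   % default precision is 'uint8'
fclose(fileID);

fileID = fopen(fname, 'r');
readBack = fread(fileID, [1 Inf], '*uint8');     % row vector, class uint8
fclose(fileID);
delete(fname);
```

The value returned by fwrite is the number of elements written, so comparing count with numel(bytes) is a cheap check that the whole stream reached the file.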
Walter Roberson on 26 Jan 2021
if instead i would have to read the binary file like this: values = fread(fileId,7,'uint8');
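That is a valid step, but it is only one step, and the count has to cover every byte in the file rather than the number of values. Reading with '*uint8' keeps the result as uint8, which typecast needs in order to reinterpret the bytes. You then have to divide the byte array into pieces and typecast each piece to its original data type, such as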
val1 = typecast(bytes(1:4), 'single');
val2 = typecast(bytes(5:12), 'double');
val3 = typecast(bytes(13:16), 'single');
You can create a function that takes a list of types and does the interpretation:
function [datacell, nextpos] = freadstream(bytes, nextpos, typelist)
% bytes is expected to be a uint8 vector; nextpos is the 1-based position to start reading at
knowntypes = {'single', 'double', 'uint8', 'int8', 'uint16', 'int16', 'uint32', 'int32', 'uint64', 'int64'};
typebytes = [4, 8, 1, 1, 2, 2, 4, 4, 8, 8];   % size in bytes of each entry of knowntypes
nitems = length(typelist);
datacell = cell(1, nitems);
buffsize = length(bytes);
for K = 1 : nitems
    thistype = typelist{K};
    [isknown, typeidx] = ismember(thistype, knowntypes);
    assert(isknown, 'freadstream: unknown item type "%s"', thistype);
    thisbytes = typebytes(typeidx);
    lastpos = nextpos+thisbytes-1;
    if lastpos > buffsize
        error('ran out of buffer reading entry #%d', K);
    end
    % reinterpret the next thisbytes bytes as one value of the requested type
    thisvalue = typecast(bytes(nextpos:lastpos), thistype);
    nextpos = lastpos+1;
    datacell{K} = thisvalue;
end
end
The return value datacell is a cell array of scalar values, and nextpos is updated to point to the next available position. When first reading from the stream, use a nextpos value of 1.
This has obvious extensions to support vectors or arrays, either by allowing typelist entries to be pairs of values that include a size, or by encoding the size in characters like '5*double' or '[17,11]*double'. As usual, "to end of buffer" could be inf. You could also adjust the behavior at end of stream with an option such as 'eos', 'error' (what the code does now) vs 'eos', 'whole' (stops processing and returns what it can if the stream does not have enough data to satisfy an entire request) vs 'eos', 'split' (when an entire request cannot be satisfied, returns as much as is available from it, using the underlying type requested, which might require returning a vector instead of reshaping according to the requested size).
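As a rough illustration of the character-encoded size idea, the sketch below splits specs such as '5*double' or '[17,11]*double' into a type name and a size vector. The spec syntax follows the examples above, but the parsing rules, the parsed struct and its field names are assumptions for this sketch; it does not handle inf or the end-of-stream options.

```matlab
specs = {'single', '5*double', '[17,11]*double'};
parsed = struct('type', cell(1, numel(specs)), 'dims', []);

for k = 1:numel(specs)
    spec = specs{k};
    star = find(spec == '*', 1, 'last');   % size part, if any, precedes the '*'
    if isempty(star)
        parsed(k).type = spec;             % bare type name means one scalar
        parsed(k).dims = [1 1];
    else
        % Pull every run of digits out of the size part, e.g. '[17,11]' -> [17 11].
        dims = str2double(regexp(spec(1:star-1), '\d+', 'match'));
        if isscalar(dims)
            dims = [1 dims];               % '5*double' is treated as a 1-by-5 row
        end
        parsed(k).type = spec(star+1:end);
        parsed(k).dims = dims;
    end
end
```

A reader built on this would consume prod(dims) values of the given type from the byte stream and reshape them to dims.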


More Answers (1)

dpb on 22 Jan 2021
The precision, skip, machinefmt optional inputs to fwrite have not been vectorized (as you have discovered), so you can't intermix the types with array syntax for the output array.
If you write a file in this fashion, however, not only do you have a problem writing it efficiently, you have also created a problem in reading it back efficiently: you'll have to read it record by record, too, because fread has the same limitation on vectorized inputs.
I would strongly suggest NOT doing this at all. I see some possible alternatives:
  1. Just write everything as double; if for some reason you really, really need a given variable to be single, then cast it after reading.
  2. Reorder the writing to put all of a given type in one call and the rest in another. This adds more complication than I can see it possibly being worth, since you have to sort things out in both directions again.
  3. SAVE the variables to a .mat file instead. This preserves type at the expense of some overhead, but is much less painful to code and probably at least as fast as the looping solution to do what is requested.
  3 Comments
Eric Alexander on 24 Jan 2021
Thank you for the tips. This is a legacy system and I have no control over how the system I'm sending the binary file to operates so changing the format of the binary or the binary to a .mat file is not an option that I have. I've considered writing data sets sequentially based on similar data types but again, the processing time I save doing that is lost in parsing the data into chunks to begin with.
dpb on 24 Jan 2021
Shoulda' known/guessed.
The memmapfile idea should work; I don't know about performance. The one time I tried using it was for a question here involving large file input, and it was much slower (by about 10X) than directly computing the desired location in the file and using fseek to move around. But that was a case needing to bring in pieces of a very large file; a sequential write operation might be pretty quick.
Alternatively, this is a place where a Fortran mex file might be a handy way to approach it: the Fortran I/O system would handle the type information transparently on an unformatted WRITE.
Which brings up the question of whether the legacy system happens to be a Fortran application and, if so, whether the file is actually a stream file (which required nonstandard compiler extensions before F2003) or a direct access file, which would also have the hidden record lengths in it.
