Saving data as binary
Basically, suppose I have k = [0 5 4]. I want it to be saved as [0 101 100] instead of [00000000 00000101 00000100] so that it takes the least space possible. How can I do that?
Answers (2)
k = [0 5 4];
% Convert each value to a binary string with no leading zeros
arrayfun(@(x)dec2bin(x,max(1,ceil(log2(x)))),k,'UniformOutput',false)
Comments: 11
Adel Hafri
May 14, 2022
Image Analyst
May 15, 2022
OK, so? Are you implying that is a problem or is unexpected?
In essence you have 3 strings with 7 characters in total. MATLAB stores each character in 2 bytes, so that is 14 bytes of character data. Plus there is some overhead for using a cell array, so it could be more than that.
k has 3 eight-byte double numbers, or 24 bytes.
Why do you want strings anyway? What's wrong with the numbers in their original form?
Adel Hafri
May 15, 2022
Maybe use uint8 (or uint16, uint32, uint64, depending on the range of your data, or possibly their signed counterparts int8, etc.) instead of character arrays.
Exploring the amount of storage used for various data types:
x = '00000001'; % 1-by-8 character array
whos x % 16 bytes (but could be made 8 using a different encoding)
x = '1'; % scalar character
whos x % 2 bytes (but could be 1 with different encoding)
x = false(1,8); % 1-by-8 logical array. you might think this
x(end) = true; % would be 8 bits, but in fact it's 8 bytes
x
whos x % 8 bytes
x = true % scalar logical
whos x % 1 byte
x = 1; % double-precision floating point number (8 bytes)
whos x % 8 bytes
x = uint8(1); % unsigned 8-bit integer (1 byte)
whos x % 1 byte
x = [0 5 4]; % 3 doubles
whos x % 24 bytes
x = uint8(x); % 3 uint8's
whos x % 3 bytes
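To see what this means on disk, here is a minimal sketch that writes the same three values as uint8 and checks the resulting file size. The file name is a hypothetical temporary file created with tempname; it is not from the thread.

```matlab
k = [0 5 4];
fname = [tempname '.bin'];          % hypothetical temporary file name

fid = fopen(fname, 'w');
fwrite(fid, uint8(k), 'uint8');     % one byte per value
fclose(fid);

info = dir(fname);
nBytes = info.bytes;                % size of the file in bytes

fid = fopen(fname, 'r');
kBack = fread(fid, Inf, 'uint8=>uint8')';   % read back as a uint8 row vector
fclose(fid);
delete(fname);                      % clean up the temporary file
```

With three values the file should come out at three bytes, and kBack should match the original values exactly.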
By the way, trying to get down to less than one byte, e.g., storing 1 as 1 bit and storing 4 = 100 as 3 bits will make the resulting file impossible to decode. For instance, if your file contains the sequence of bits 1100 somewhere, you would not know whether that should be interpreted as:
- 1100 (i.e., decimal 12), or
- 110, 0 (i.e., decimal 6, 0), or
- 11, 0, 0 (i.e., decimal 3, 0, 0), or
- 1, 100 (i.e., decimal 1, 4), or
- 1, 10, 0 (i.e., decimal 1, 2, 0), or
- 1, 1, 0, 0 (i.e., decimal 1, 1, 0, 0)
All six of those interpretations use the minimum number of bits required for each decimal number (i.e., no leading zeros).
[ The other two possible interpretations:
- 11, 00 (i.e., decimal 3, 0), and
- 1, 1, 00 (i.e., decimal 1, 1, 0)
do not meet the requirement that every number is encoded with the minimum number of bits (i.e., they have leading zeros: decimal 0 is bits 00 instead of bit 0), so they could be ruled out. ]
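The count of interpretations can be checked by brute force. The sketch below tries every way of cutting the bit string 1100 into pieces and keeps only the splits where each piece is a minimal encoding (either the single bit 0 or a piece that starts with 1). The variable names are illustrative only.

```matlab
bits = '1100';
n = numel(bits);
valid = {};                                   % one cell per valid interpretation

for mask = 0:2^(n-1)-1
    cuts   = find(bitget(mask, 1:n-1));       % cut after these bit positions
    starts = [1, cuts + 1];
    stops  = [cuts, n];
    pieces = arrayfun(@(a, b) bits(a:b), starts, stops, 'UniformOutput', false);

    % A piece is minimal if it is exactly '0' or has no leading zero
    isMinimal = cellfun(@(p) strcmp(p, '0') || p(1) == '1', pieces);
    if all(isMinimal)
        valid{end+1} = cellfun(@bin2dec, pieces);   % decimal values of the pieces
    end
end

nValid = numel(valid);    % number of interpretations without leading zeros
```

Of the eight possible splits of a 4-bit string, the two containing the piece 00 are rejected, which leaves the six interpretations listed above.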
It's an interesting problem to think about:
Adel Hafri
May 15, 2022
Voss
May 15, 2022
If 245, 2, 6, and 78 are the only possible numbers you need to encode, then sure, you could encode them like that. I think you'd have to write a MATLAB function to do the encoding yourself, but that wouldn't be too difficult. Is that correct, that you'll only ever have those four numbers?
In any case, I don't think there is a way to write less than 1 byte (e.g., write two bits at a time) to file. You'd have to combine your two-bit symbols in groups of four symbols. So you'd encode [245 2 6 78] to [01 10 11 00], then write to file the concatenation of those 4 two-bit symbols, which is the byte 01101100 (decimal 108, hex 6C).
That way, you could do four symbols per byte. If you have more symbols/numbers to encode, then you'd have to make do with fewer symbols per byte. For instance,
- more than 4, up to 16 symbols -> 4 bits per symbol -> 2 symbols per byte
- more than 16, up to 256 symbols -> 8 bits per symbol -> 1 symbol per byte -> use built-in type uint8 (no custom encoding function required)
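The four-symbols-per-byte case can be sketched as follows. The code assignment (245 -> 01, 2 -> 10, 6 -> 11, 78 -> 00) is the one from the example above; the variable names and the choice to put the first symbol in the most significant bits are assumptions made for illustration. The sketch assumes the number of values is a multiple of four.

```matlab
symbols = [245 2 6 78];          % the only values that can occur
codes   = [1 2 3 0];             % their 2-bit codes: 01 10 11 00

data = [245 2 6 78];             % values to encode
[~, idx] = ismember(data, symbols);
c = codes(idx);                  % 2-bit code for each value

% Pack four 2-bit codes per byte, first symbol in the top two bits
weights = [64 16 4 1];
packed = uint8(weights * reshape(c, 4, []));

% Unpack a single byte: shift each pair down and mask with binary 11
cBack = bitand(bitshift(double(packed(1)), [-6 -4 -2 0]), 3);
[~, j] = ismember(cBack, codes);
decoded = symbols(j);
```

Here packed holds the byte 01101100 (decimal 108), and decoding needs the same symbols/codes table that was used for encoding.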
Adel Hafri
May 15, 2022
Walter Roberson
May 15, 2022
You can fwrite with 'bit1'. All of the values that you fwrite() in a single call will be packed into consecutive bits, but at the end of the call if you do not happen to be positioned at the end of a byte then enough 0s will be added to reach the byte boundary.
Walter Roberson
May 15, 2022
You would typically use the Huffman decoding function to decode the stream of bits, and that decoding function needs to be passed the dictionary.
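To illustrate why the decoder needs the dictionary, here is a hand-rolled sketch of decoding a variable-length bit stream. It does not use the toolbox Huffman functions; the dictionary below is a hypothetical prefix-free code invented for this example, not one from the thread.

```matlab
% Hypothetical prefix-free dictionary: no code word is a prefix of another
symbols = [245 2 6 78];
codes   = {'1', '00', '011', '010'};

% Encode: concatenate the code words of the data values
data = [245 2 245 6];
[~, idx] = ismember(data, symbols);
stream = [codes{idx}];

% Decode: read bit by bit and emit a symbol whenever the buffer matches a code
decoded = [];
buf = '';
for b = stream
    buf(end+1) = b;
    hit = find(strcmp(buf, codes));
    if ~isempty(hit)
        decoded(end+1) = symbols(hit);
        buf = '';                 % start collecting the next code word
    end
end
```

Because the code is prefix-free, the decoder never has to guess where one number ends and the next begins, but it can only do so with the same symbols/codes table the encoder used.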
Adel Hafri
May 15, 2022
Walter Roberson
May 20, 2022
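bits = {[1] [0 0] [1] [0 1 1]};   % variable-length code words, one cell per symbol
Bitstream = [bits{:}];            % concatenate into a single row of bits
fid = fopen('test.bin','w');
fwrite(fid, Bitstream, 'bit1');   % write each element as a single bit
fclose(fid);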
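A round trip makes the padding visible. The sketch below writes the same 7 bits and reads the file back one bit at a time. It uses the unsigned precision 'ubit1' rather than 'bit1' (an assumption made here because the values are 0 and 1), and a hypothetical temporary file name.

```matlab
bitstream = [1 0 0 1 0 1 1];        % 7 bits: [1] [0 0] [1] [0 1 1]
fname = [tempname '.bin'];          % hypothetical temporary file name

fid = fopen(fname, 'w');
fwrite(fid, bitstream, 'ubit1');    % pack the values into consecutive bits
fclose(fid);

info = dir(fname);
nBytes = info.bytes;                % file size after padding to a byte boundary

fid = fopen(fname, 'r');
bitsBack = fread(fid, Inf, 'ubit1')';   % read every bit in the file
fclose(fid);
delete(fname);                      % clean up the temporary file

nPad = numel(bitsBack) - numel(bitstream);   % padding bits added at the end
```

The padding bits are indistinguishable from data when reading, so the decoder has to know how many bits are valid, for example by storing the bit count separately.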
Ilya Dikariev
May 20, 2022
k_new=str2num(dec2bin(k))' would do. But if you still want to reduce the size, just use dec2bin, which keeps the data as type char, which is 8 times smaller.
Comments: 1
Walter Roberson
May 20, 2022
Edited: Walter Roberson
May 20, 2022
Only 4 times smaller. Each character needs 16 bits.
If you use uint8(k_new), then that would need only one byte per value.
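The per-element sizes can be compared directly with whos. This is a small sketch; the variable names are illustrative.

```matlab
k = [0 5 4];
kChar = dec2bin(k);        % 3-by-3 char array: '000'; '101'; '100'
kU8   = uint8(k);

sDouble = whos('k');
sChar   = whos('kChar');
sU8     = whos('kU8');

bytesPerDouble = sDouble.bytes / numel(k);      % storage per double
bytesPerChar   = sChar.bytes / numel(kChar);    % storage per character
bytesPerU8     = sU8.bytes / numel(kU8);        % storage per uint8
```

Note that dec2bin(k) pads every row to the same width, so the char array holds 9 characters here. Its total size is closer to that of the double array than the per-element ratio suggests.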
