Audio compression using DCT - but I get the same file size after inverse DCT
3 views (last 30 days)
Mohamad
4 May 2018
Hi. I have a file (1.wav) and I'm trying to compress its first two seconds using the discrete cosine transform (DCT). I attached the code, but when I use the command whos on the original samples and on the reconstructed samples after the inverse DCT, I get the same size and number of bytes. Any explanation? And how do I get the compression ratio?
Accepted Answer
Walter Roberson
4 May 2018
Edited: Walter Roberson
4 May 2018
That is expected. You are writing out the re-expanded data as samples. There will be the same number of samples as before, so it is going to take the same output size (probably.)
See also my recent discussion at https://www.mathworks.com/matlabcentral/answers/398289-how-can-i-do-audio-compression-using-huffman-encoding#comment_563731 . For DCT you would not need to write out a dictionary, but you would not write out the coefficients you had zeroed out. You would, however, need to write out the original number of coefficients so when you read the values in, you knew how many zeros to pad with before reconstruction.
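The thread's code is MATLAB, but the storage idea above can be sketched compactly. Here is an illustrative Python/NumPy stand-in (the function names are mine, not from the thread) that keeps only the largest-magnitude coefficients, records their positions and the original count, and zero-pads on reconstruction:

```python
import numpy as np
from scipy.fft import dct, idct

def compress(signal, keep_fraction):
    """Keep only the largest-magnitude DCT coefficients."""
    X = dct(signal, norm='ortho')
    n_keep = max(1, int(round(len(X) * keep_fraction)))
    order = np.argsort(np.abs(X))[::-1]     # largest magnitude first
    idx = np.sort(order[:n_keep])           # positions of the kept coefficients
    # Only idx, X[idx], and len(X) need to be written to the compressed file;
    # the zeroed coefficients are not stored at all.
    return idx, X[idx], len(X)

def decompress(idx, coeffs, n):
    """Zero-pad the dropped coefficients, then invert the DCT."""
    X = np.zeros(n)
    X[idx] = coeffs
    return idct(X, norm='ortho')
```

With keep_fraction = 1.0 the round trip is exact (up to floating point); smaller fractions shrink what must be stored, at the cost of reconstruction error.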
28 Comments
Mohamad
4 May 2018
Edited: Mohamad
4 May 2018
Hi. I did a DCT of the first 5 seconds of the WAV file. I will keep only the DCT coefficients that contain 99.9% of the energy and set all remaining coefficients to zero. Now I need to create a Huffman code dictionary for those coefficients, so how do I do that? Do I need to quantize the DCT coefficients (i.e. to make symbols) for the Huffman encoding? How do I do this?
Walter Roberson
4 May 2018
The code in https://www.mathworks.com/matlabcentral/fileexchange/34958-jpeg-compression--dct- shows construction of dct coefficients as integer values. You would still set the extra coefficients to 0. And then you would use the set of integer values as the symbols while you follow the steps outlined in the post I linked to.
Mohamad
4 May 2018
Edited: Walter Roberson
4 May 2018
Sorry for the inconvenience, but the link https://www.mathworks.com/matlabcentral/fileexchange/34958-jpeg-compression--dct- shows image compression and uses a normalization matrix, so how do I do this on an audio file (one column)?
How do I construct a stream of 0's and 1's that encodes the samples using Huffman encoding?
Walter Roberson
4 May 2018
For samples to bits:
Use huffmandict() on the samples to build the encoding tables, and then use huffmanenco() to perform the encoding to a stream of 0 and 1 values.
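huffmandict() and huffmanenco() come from the Communications Toolbox. For readers without it, the same idea can be illustrated with a minimal heap-based Huffman coder (my own sketch, not the toolbox implementation):

```python
import heapq

def huffman_dict(symbols_with_prob):
    """Build a prefix-free code from (symbol, probability) pairs."""
    # Heap entries are (probability, tiebreak, tree); a tree is either a
    # symbol (leaf) or a pair of subtrees.
    heap = [(p, i, s) for i, (s, p) in enumerate(symbols_with_prob)]
    heapq.heapify(heap)
    count = len(heap)
    while len(heap) > 1:
        p1, _, t1 = heapq.heappop(heap)   # merge the two least likely trees
        p2, _, t2 = heapq.heappop(heap)
        heapq.heappush(heap, (p1 + p2, count, (t1, t2)))
        count += 1
    codes = {}
    def walk(tree, prefix):
        if isinstance(tree, tuple):
            walk(tree[0], prefix + '0')
            walk(tree[1], prefix + '1')
        else:
            codes[tree] = prefix or '0'   # single-symbol edge case
    walk(heap[0][2], '')
    return codes

def huffman_encode(data, codes):
    """Concatenate the per-symbol bit strings."""
    return ''.join(codes[s] for s in data)
```

High-probability symbols get short codes, and because the code is prefix-free the resulting 0/1 stream can be decoded unambiguously.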
Mohamad
4 May 2018
Edited: Mohamad
4 May 2018
I'm trying this, but I get the error: "The Huffman dictionary provided does not have the codes for all the input signals." I quantized the DCT coefficients to map them to 32 different levels and used hist to get the probability vector. The DCT coefficient vector is 55125 x 1. So how do I make the symbols for the Huffman dictionary?
Walter Roberson
4 May 2018
[filename, pathname] = uigetfile('*.wav', 'pick a file');
if ~ischar(filename); error('no file chosen'); end
filename = fullfile(pathname, filename);   % uigetfile returns name and path separately
[x1, Fs] = audioread(filename);
samples = [1, min(5*Fs, length(x1))];      % at most the first 5 seconds
[x1, Fs] = audioread(filename, samples);
L1 = length(x1)
X = dct(x1);
% Sort the coefficients from largest to smallest.
[XX, ind] = sort(abs(X), 'descend');
need = 1;
while norm(X(ind(1:need)))/norm(X) < 0.9999
    need = need + 1;
end
Coefficients_need = need
xpc = need/length(X)*100
% Set to zero the coefficients that contain the remaining 0.1% of the energy.
X(ind(need+1:end)) = 0;
partition = linspace(min(X), max(X), 32);
codebook = linspace(min(X)-1/32, max(X), 33);   % length 33, one entry for each interval
[index, quantized] = quantiz(X, partition, codebook);   % quantize
histogram(quantized, 33, 'Normalization', 'probability');
h2 = histc(index+1, 1:length(codebook));   % count how often each codebook entry is used
p = h2/length(X);
dict = huffmandict(codebook, p);
comp = huffmanenco(quantized, dict);
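As a side note, the while-loop that searches for `need` recomputes a norm on every iteration; the same count can be found in one pass with a cumulative sum. A Python/NumPy sketch of the identical criterion (assuming `norm` is the 2-norm, as in MATLAB's default):

```python
import numpy as np

def coeffs_needed(X, frac=0.9999):
    """Smallest number of largest-magnitude coefficients with
    norm(kept)/norm(all) >= frac (same criterion as the while-loop above)."""
    energy = np.sort(np.abs(np.asarray(X, dtype=float)))[::-1] ** 2
    cum = np.cumsum(energy)
    # norm(kept)/norm(all) >= frac  <=>  cumulative energy >= frac^2 * total
    return int(np.searchsorted(cum, frac**2 * cum[-1]) + 1)
```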
Mohamad
4 May 2018
I get "Warning: Data clipped when writing file". Also the compressed file 2_cc.wav, created with dsig = huffmandeco(A,dict); filename = '2_cc.wav'; audiowrite(filename,dsig,Fs); has the same size as the original file, and when I play it, it is very noisy.
Walter Roberson
4 May 2018
What you get back from huffmandeco is not the original sounds. What you get back is the DCT coefficients. You need to do inverse DCT.
Mohamad
5 May 2018
I closed the binary file after writing the encoded stream. Then I read the binary file back, used huffmandeco, then the inverse DCT, then audiowrite to make the .wav file. The reconstructed sound is OK, but again the original .wav file is the same size as the reconstructed .wav, and the binary file is larger than both .wav files. So where is the compression?
Walter Roberson
5 May 2018
With regards to the file size: you did not write using ubit1 like I said was needed.
With regards to the "Warning: Data clipped when writing file.":
Once you have quantized the DCT coefficients, if you were to then immediately idct() the quantized coefficients, without having removed any coefficients and without having gone through the huffman and file and huffman decode -- just straight dct, quantize, idct of quantized coefficients -- then it turns out that the range of reconstructed values is not -1 to +1 and instead can be like -2.7 to +3.7. This is a pure effect of quantization with dct, and you are going to need to account for it.
My tests show that the idct of the quantized value can be a factor of 10^4 or more higher than the original signal. The parts that seem to do especially poorly are the parts of the signal that have near silence: the reconstructed values can end up fairly large there (I do not know why that might be so.)
When you zero out the extra coefficients, the reconstructed values can be about -5 to +4.5. And remember that it is the places of near silence that are especially badly reconstructed (in relative terms), so this introduces noticeable noise into the reconstruction.
Mohamad
5 May 2018
Hi.
1. Do I need to normalize the audio samples to the range (-1 to 1) before doing the DCT?
2. Do I need to apply the DCT to blocks of audio samples instead of the whole signal?
Thanks
Walter Roberson
5 May 2018
The samples you get from audioread() are already in the range -1 to +1 before you dct(), and if you did not quantize you would recover the same data.
Testing with a sound sample I happened to have, I found that if I increased my dictionary size to 85 or larger that the reconstructed signal was within range.
You do need to ensure that your reconstructed signal is of the correct length: when you read with ubit1 format, you will always get a multiple of 8 samples (bits) back, and chances are that your huffman encoding was not an exact multiple of 8. Those extra bits will cause problems for decoding.
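The byte-rounding effect of ubit1 is easy to reproduce in any language. This small Python sketch (the helper names are mine) shows how packing a bit stream pads it to a whole number of bytes, so unpacking returns up to 7 extra zero bits:

```python
def pack_bits(bits):
    """Pack a string of '0'/'1' into bytes, zero-padding to a byte boundary."""
    padded = bits + '0' * (-len(bits) % 8)
    return bytes([int(padded[i:i+8], 2) for i in range(0, len(padded), 8)])

def unpack_bits(data):
    """Unpack bytes back into a '0'/'1' string; padding bits come back too."""
    return ''.join(format(b, '08b') for b in data)
```

Those trailing zero bits are exactly what can decode into spurious extra symbols.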
I experimented with adding an extra entry to the dictionary with value inf and with probability 1/(length(x1)+1), making sure that I normalized the other entries by (length(x1)+1) instead of length(x1) . Then on reconstruction I used isinf() to find the inf in the input stream, and I trim out everything from that point on. This turned out to work just fine.
Mohamad
5 May 2018
Hi. I wrote the binary file using ubit1 and no longer get the data clipping warning. I am still quantizing the DCT coefficients, and I increased the dictionary to 100 entries. The binary file size on disk is around 24 KB. When I play the sound there is still noise in the background. Why did you add an extra entry to the dictionary with value inf? How do I add this?
Mohamad
5 May 2018
I also tried dct, quantizing the DCT coefficients, then idct of the quantized coefficients. I get idct values in the range -0.4353 to 0.3361.
Walter Roberson
5 May 2018
160044/23522 is about 6.8 which is decent compression.
My tests show that the main way to reduce noise on playback is to use a higher number of dictionary entries.
A lot of the dictionary entries turn out to be unused or barely used, so the main effect of using more dictionary entries is to provide a higher resolution on the entries that are used.
Also, if you were properly handling the dictionary entries by writing them to the binary file and restoring them from it (the binary file should contain all of the information needed to recover the sound), then using more entries could raise the size of the compressed file. That is the standard trade-off in lossy compression: the better the quality you want, the larger the file needs to be.
Mohamad
6 May 2018
Edited: Mohamad
6 May 2018
Hi, please. I quantized using 512 levels, so now I have 512 entries in the dictionary, but I still have noise in the background. I noticed that using more quantization levels leads to a better compression ratio (i.e. an improvement). So is there any way to reduce the noise? Do I need to use even more quantization levels (processing becomes slower)? I'm only writing the 0's and 1's from huffmanenco, so how do I write these 0's and 1's along with all the information needed to reconstruct the audio? I'm using huffmandeco to decode the 0's and 1's, so doesn't this decoding have all the information needed to reconstruct the audio? And if I don't quantize the DCT coefficients, how do I make the Huffman dictionary table?
Walter Roberson
6 May 2018
The only way to avoid having any background noise is either have perfect reconstruction, or else to filter out the high frequency after reconstruction.
For perfect reconstruction you would not quantize and you would not zero out any coefficients. If you quantize or if you zero out coefficients (or both, as you do) then you are certain to get noise. The question becomes how much noise is acceptable. The more partition entries you use, the lower the noise.
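That trade-off can be checked numerically. Below is a rough Python/NumPy stand-in for the dct → quantiz → idct round trip (uniform codebook, nearest-entry quantization, no coefficient zeroing; a sketch of the idea, not the toolbox's exact behavior). The reconstruction error shrinks as the number of levels grows:

```python
import numpy as np
from scipy.fft import dct, idct

def quantize_roundtrip(signal, levels):
    """dct -> map each coefficient to the nearest codebook entry -> idct."""
    X = dct(signal, norm='ortho')
    codebook = np.linspace(X.min(), X.max(), levels)
    nearest = np.abs(X[:, None] - codebook[None, :]).argmin(axis=1)
    return idct(codebook[nearest], norm='ortho')
```

Comparing the round-trip error for 32 versus 512 levels on the same signal shows the noise floor dropping with the finer codebook.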
Mohamad
6 May 2018
Hi, please. I again got the data clipped warning, although I'm using 256 quantization levels and the inverse DCT values are in the range -1 to 1. How do I overcome this warning? If this warning is due to quantization, how do I make a Huffman dictionary with all these DCT coefficients, which is a very large number of coefficients? Thanks
Walter Roberson
6 May 2018
The greatest source of noise with that many coefficients is that you are doing the idct of the full dsig, which is the result of the huffmandeco on the data read in as ubit1 . As I described to you before, when you read using ubit1, a full byte is read at the end, leaving you with up to 7 extra 0 bits at the end. When you do the huffman decoding, those 7 extra 0 are likely to turn into one or more extra data samples in dsig. Those extra data samples affect the reconstruction audibly.
You need to figure out some way of ensuring that you extract the same length of signal from the huffman decoding as you put into the huffman encoding. I already described one method to you: add a distinct "end of stream" data element, and after decoding, detect that marker and remove from there onward. Another way to handle the situation is to write the length as part of the binary file.
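The second option (writing the length into the binary file) can be sketched like this in Python, with a hypothetical 4-byte big-endian bit-count header in front of the packed bytes; in MATLAB the count would be an extra fwrite before the ubit1 data:

```python
import struct

def write_stream(bits):
    """Prepend the bit count so the decoder can discard padding bits."""
    padded = bits + '0' * (-len(bits) % 8)
    body = bytes([int(padded[i:i+8], 2) for i in range(0, len(padded), 8)])
    return struct.pack('>I', len(bits)) + body   # 4-byte big-endian length header

def read_stream(blob):
    """Read the header, unpack the bytes, and trim to the stored bit count."""
    (nbits,) = struct.unpack('>I', blob[:4])
    bits = ''.join(format(b, '08b') for b in blob[4:])
    return bits[:nbits]
```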
The second greatest source of noise is the zeroing of the low-energy coefficients.
It takes a lot of dictionary entries to counteract the effect of zeroing the low-energy coefficients. There seems to be an RMS limit of about 1.86 when the coefficients are zeroed, whereas with the coefficients not zeroed, you can get down to about 0.38 with 512 coefficients.
I am still testing what you can do with more coefficients. It turns out that the internal routines that validate the dictionary are inefficient, involving operations proportional to the square of the number of entries, so there are practical limits in how far out you can test.
Mohamad
7 May 2018
Hi. I get "Error using huffmandeco: The encoded signal contains a code which is not present in the dictionary". I'm using all the DCT coefficients without zeroing. I checked the lengths: length_dict = 200, length_comp = 115845, count1 = 115845, length_A = 231696. Why is the length of A not equal to the length of comp? I get around double the length. How do I modify the code to extract the same length of signal from the Huffman decoding as I put into the Huffman encoding? Thanks
Walter Roberson
7 May 2018
I will look at this after I get up; it is my bedtime now (5 in the morning!)
Mohamad
7 May 2018
Hi, please, how do I add one distinct symbol at the end of the stream and detect it? I tried to add inf to the codebook with probability 1/length(x+1), but I got the error: "sum of probability must equal to one". Thanks
Mohamad
9 May 2018
Edited: Walter Roberson
9 May 2018
Hi, please. I added inf to the dict, but when I use isinf(A) I get 0, and I don't know why.
Also I get:
length(quantized) = 80000
length(comp) = 273938
length(A) = 273944
length(dsig) = 81935
So why is A not the same length as comp?
How do I make dsig have length 80000?
Sometimes I still get the data clipped warning.
Walter Roberson
9 May 2018
I do it like this:
p = h2/(L1+1);   % normalize by L1+1 to leave room for the end-of-file symbol
% code end of file as infinity
dict = huffmandict([codebook, inf], [p; 1/(L1+1)]);   % probabilities now sum to 1
comp = huffmanenco([quantized; inf], dict);           % quantized is a column vector
[...]
dsig = huffmandeco(A,dict);
eofpos = find(isinf(dsig), 1, 'first');
if ~isempty(eofpos); dsig(eofpos:end) = []; end
More Answers (1)