Audio compression using DCT - but I get the same file size after inverse DCT
3 views (last 30 days)
Mohamad
4 May 2018
Hi. I have a file (1.wav) and I'm trying to compress its first two seconds using the discrete cosine transform (DCT). I attached the code, but when I use the command whos on the original samples and on the reconstructed samples after the inverse DCT, I get the same size and number of bytes. Any explanation? And how do I get the compression ratio?
Accepted Answer
Walter Roberson
4 May 2018
Edited: Walter Roberson
4 May 2018
That is expected. You are writing out the re-expanded data as samples. There will be the same number of samples as before, so it is going to take the same output size (probably.)
See also my recent discussion at https://www.mathworks.com/matlabcentral/answers/398289-how-can-i-do-audio-compression-using-huffman-encoding#comment_563731 . For DCT you would not need to write out a dictionary, but you would not write out the coefficients you had zeroed out. You would, however, need to write out the original number of coefficients so when you read the values in, you knew how many zeros to pad with before reconstruction.
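The thread's code is MATLAB, but the storage idea above can be sketched compactly. Here is an illustrative Python/NumPy stand-in (the function names are mine, not from the thread) that keeps only the largest-magnitude coefficients, records their positions and the original count, and zero-pads on reconstruction:

```python
import numpy as np
from scipy.fft import dct, idct

def compress(signal, keep_fraction):
    """Keep only the largest-magnitude DCT coefficients."""
    X = dct(signal, norm='ortho')
    n_keep = max(1, int(round(len(X) * keep_fraction)))
    order = np.argsort(np.abs(X))[::-1]     # largest magnitude first
    idx = np.sort(order[:n_keep])           # positions of the kept coefficients
    # Only idx, X[idx], and len(X) need to be written to the compressed file;
    # the zeroed coefficients are not stored at all.
    return idx, X[idx], len(X)

def decompress(idx, coeffs, n):
    """Zero-pad the dropped coefficients, then invert the DCT."""
    X = np.zeros(n)
    X[idx] = coeffs
    return idct(X, norm='ortho')
```

With keep_fraction = 1.0 the round trip is exact (up to floating point); smaller fractions shrink what must be stored, at the cost of reconstruction error.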
28 Comments
Mohamad
4 May 2018
Edited: Mohamad
4 May 2018
Hi. I did a DCT of the first 5 seconds of the WAV file. I will keep only the DCT coefficients that contain 99.9% of the energy and set all remaining coefficients to zero. Now I need to create a Huffman code dictionary for those coefficients, so how do I do that? Do I need to quantize the DCT coefficients (i.e. to make symbols) for the Huffman encoding? How do I do this?
Walter Roberson
4 May 2018
The code in https://www.mathworks.com/matlabcentral/fileexchange/34958-jpeg-compression--dct- shows construction of dct coefficients as integer values. You would still set the extra coefficients to 0. And then you would use the set of integer values as the symbols while you follow the steps outlined in the post I linked to.
Mohamad
4 May 2018
Edited: Walter Roberson
4 May 2018
Sorry for the inconvenience, but the link https://www.mathworks.com/matlabcentral/fileexchange/34958-jpeg-compression--dct- shows image compression and uses a normalization matrix, so how do I do this on an audio file (one column)?
How do I construct a stream of 0's and 1's that encodes the samples using Huffman encoding?
Walter Roberson
4 May 2018
For samples to bits:
Use huffmandict() on the samples to build the encoding tables, and then use huffmanenco() to perform the encoding to a stream of 0 and 1 values.
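huffmandict() and huffmanenco() come from the Communications Toolbox. For readers without it, the same idea can be illustrated with a minimal heap-based Huffman coder (my own sketch, not the toolbox implementation):

```python
import heapq

def huffman_dict(symbols_with_prob):
    """Build a prefix-free code from (symbol, probability) pairs."""
    # Heap entries are (probability, tiebreak, tree); a tree is either a
    # symbol (leaf) or a pair of subtrees.
    heap = [(p, i, s) for i, (s, p) in enumerate(symbols_with_prob)]
    heapq.heapify(heap)
    count = len(heap)
    while len(heap) > 1:
        p1, _, t1 = heapq.heappop(heap)   # merge the two least likely trees
        p2, _, t2 = heapq.heappop(heap)
        heapq.heappush(heap, (p1 + p2, count, (t1, t2)))
        count += 1
    codes = {}
    def walk(tree, prefix):
        if isinstance(tree, tuple):
            walk(tree[0], prefix + '0')
            walk(tree[1], prefix + '1')
        else:
            codes[tree] = prefix or '0'   # single-symbol edge case
    walk(heap[0][2], '')
    return codes

def huffman_encode(data, codes):
    """Concatenate the per-symbol bit strings."""
    return ''.join(codes[s] for s in data)
```

High-probability symbols get short codes, and because the code is prefix-free the resulting 0/1 stream can be decoded unambiguously.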
Mohamad
4 May 2018
Edited: Mohamad
4 May 2018
I'm trying this, but I get the error: "The Huffman dictionary provided does not have the codes for all the input signals." I quantized the DCT coefficients to map them to 32 different levels and used hist to get the probability vector. The DCT coefficient vector is 55125 x 1. So how do I make the symbols for the Huffman dictionary?
Walter Roberson
4 May 2018
[filename, pathname] = uigetfile('*.wav', 'pick a file');
if ~ischar(filename); error('no file chosen'); end
filename = fullfile(pathname, filename);   % uigetfile returns name and path separately
[x1, Fs] = audioread(filename);
samples = [1, min(5*Fs, length(x1))];      % at most the first 5 seconds
[x1, Fs] = audioread(filename, samples);
L1 = length(x1)
X = dct(x1);
% Sort the coefficients from largest to smallest.
[XX, ind] = sort(abs(X), 'descend');
need = 1;
while norm(X(ind(1:need)))/norm(X) < 0.9999
    need = need + 1;
end
Coefficients_need = need
xpc = need/length(X)*100
% Set to zero the coefficients that contain the remaining 0.1% of the energy.
X(ind(need+1:end)) = 0;
partition = linspace(min(X), max(X), 32);
codebook = linspace(min(X)-1/32, max(X), 33);   % length 33, one entry for each interval
[index, quantized] = quantiz(X, partition, codebook);   % quantize
histogram(quantized, 33, 'Normalization', 'probability');
h2 = histc(index+1, 1:length(codebook));   % count how often each codebook entry is used
p = h2/length(X);
dict = huffmandict(codebook, p);
comp = huffmanenco(quantized, dict);
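As a side note, the while-loop that searches for `need` recomputes a norm on every iteration; the same count can be found in one pass with a cumulative sum. A Python/NumPy sketch of the identical criterion (assuming `norm` is the 2-norm, as in MATLAB's default):

```python
import numpy as np

def coeffs_needed(X, frac=0.9999):
    """Smallest number of largest-magnitude coefficients with
    norm(kept)/norm(all) >= frac (same criterion as the while-loop above)."""
    energy = np.sort(np.abs(np.asarray(X, dtype=float)))[::-1] ** 2
    cum = np.cumsum(energy)
    # norm(kept)/norm(all) >= frac  <=>  cumulative energy >= frac^2 * total
    return int(np.searchsorted(cum, frac**2 * cum[-1]) + 1)
```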
Mohamad
4 May 2018
I get "Warning: Data clipped when writing file". Also the compressed file 2_cc.wav, created with dsig = huffmandeco(A,dict); filename = '2_cc.wav'; audiowrite(filename,dsig,Fs); has the same size as the original file, and when I play it, it is very noisy.
Walter Roberson
4 May 2018
What you get back from huffmandeco is not the original sounds. What you get back is the DCT coefficients. You need to do inverse DCT.
Mohamad
5 May 2018
I closed the binary file after writing the encoded stream. Then I read the binary file back, used huffmandeco, then the inverse DCT, then audiowrite to make the .wav file. The reconstructed sound is OK, but again the original .wav file is the same size as the reconstructed .wav, and the binary file is larger than both .wav files. So where is the compression?
Walter Roberson
5 May 2018
With regards to the file size: you did not write using ubit1 like I said was needed.
With regards to the "Warning: Data clipped when writing file.":
Once you have quantized the DCT coefficients, if you were to then immediately idct() the quantized coefficients, without having removed any coefficients and without having gone through the huffman and file and huffman decode -- just straight dct, quantize, idct of quantized coefficients -- then it turns out that the range of reconstructed values is not -1 to +1 and instead can be like -2.7 to +3.7. This is a pure effect of quantization with dct, and you are going to need to account for it.
My tests show that the idct of the quantized value can be a factor of 10^4 or more higher than the original signal. The parts that seem to do especially poorly are the parts of the signal that have near silence: the reconstructed values can end up fairly large there (I do not know why that might be so.)
When you zero out the extra coefficients, the reconstructed values can be about -5 to +4.5. And remember that it is the places of near silence that are especially badly reconstructed (in relative terms), so this introduces noticeable noise into the reconstruction.
Mohamad
5 May 2018
Hi.
1. Do I need to normalize the audio samples to the range (-1 to 1) before doing the DCT?
2. Do I need to apply the DCT to blocks of audio samples instead of the whole signal?
Thanks
Walter Roberson
5 May 2018
The samples you get from audioread() are already in the range -1 to +1 before you dct(), and if you did not quantize you would recover the same data.
Testing with a sound sample I happened to have, I found that if I increased my dictionary size to 85 or larger that the reconstructed signal was within range.
You do need to ensure that your reconstructed signal is of the correct length: when you read with ubit1 format, you will always get a multiple of 8 samples (bits) back, and chances are that your huffman encoding was not an exact multiple of 8. Those extra bits will cause problems for decoding.
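The byte-rounding effect of ubit1 is easy to reproduce in any language. This small Python sketch (the helper names are mine) shows how packing a bit stream pads it to a whole number of bytes, so unpacking returns up to 7 extra zero bits:

```python
def pack_bits(bits):
    """Pack a string of '0'/'1' into bytes, zero-padding to a byte boundary."""
    padded = bits + '0' * (-len(bits) % 8)
    return bytes([int(padded[i:i+8], 2) for i in range(0, len(padded), 8)])

def unpack_bits(data):
    """Unpack bytes back into a '0'/'1' string; padding bits come back too."""
    return ''.join(format(b, '08b') for b in data)
```

Those trailing zero bits are exactly what can decode into spurious extra symbols.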
I experimented with adding an extra entry to the dictionary with value inf and with probability 1/(length(x1)+1), making sure that I normalized the other entries by (length(x1)+1) instead of length(x1) . Then on reconstruction I used isinf() to find the inf in the input stream, and I trim out everything from that point on. This turned out to work just fine.
Mohamad
5 May 2018
Hi. I wrote the binary file using ubit1 and no longer get the data clipping warning. I am still quantizing the DCT coefficients, and I increased the dictionary to 100 entries. The binary file size on disk is around 24 KB. When I play the sound there is still noise in the background. Why did you add an extra entry to the dictionary with value inf? How do I add this?
Mohamad
5 May 2018
I also tried dct, quantizing the DCT coefficients, then idct of the quantized coefficients. I get idct values in the range -0.4353 to 0.3361.
Walter Roberson
5 May 2018
160044/23522 is about 6.8 which is decent compression.
My tests show that the main way to reduce noise on playback is to use a higher number of dictionary entries.
A lot of the dictionary entries turn out to be unused or barely used, so the main effect of using more dictionary entries is to provide a higher resolution on the entries that are used.
Also, if you were properly handling the dictionary entries by writing them to the binary file and restoring them from it (the binary file should contain all of the information needed to recover the sound), then using more entries could raise the size of the compressed file. That is the standard trade-off in lossy compression: the better the quality you want, the larger the file needs to be.
Mohamad
6 May 2018
Edited: Mohamad
6 May 2018
Hi, please. I quantized using 512 levels, so now I have 512 entries in the dictionary, but I still have noise in the background. I noticed that using more quantization levels leads to a better compression ratio (i.e. an improvement). So is there any way to reduce the noise? Do I need to use even more quantization levels (processing becomes slower)? I'm only writing the 0's and 1's from huffmanenco, so how do I write these 0's and 1's along with all the information needed to reconstruct the audio? I'm using huffmandeco to decode the 0's and 1's, so doesn't this decoding have all the information needed to reconstruct the audio? And if I don't quantize the DCT coefficients, how do I make the Huffman dictionary table?
Walter Roberson
6 May 2018
The only way to avoid having any background noise is either have perfect reconstruction, or else to filter out the high frequency after reconstruction.
For perfect reconstruction you would not quantize and you would not zero out any coefficients. If you quantize or if you zero out coefficients (or both, as you do) then you are certain to get noise. The question becomes how much noise is acceptable. The more partition entries you use, the lower the noise.
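That trade-off can be checked numerically. Below is a rough Python/NumPy stand-in for the dct → quantiz → idct round trip (uniform codebook, nearest-entry quantization, no coefficient zeroing; a sketch of the idea, not the toolbox's exact behavior). The reconstruction error shrinks as the number of levels grows:

```python
import numpy as np
from scipy.fft import dct, idct

def quantize_roundtrip(signal, levels):
    """dct -> map each coefficient to the nearest codebook entry -> idct."""
    X = dct(signal, norm='ortho')
    codebook = np.linspace(X.min(), X.max(), levels)
    nearest = np.abs(X[:, None] - codebook[None, :]).argmin(axis=1)
    return idct(codebook[nearest], norm='ortho')
```

Comparing the round-trip error for 32 versus 512 levels on the same signal shows the noise floor dropping with the finer codebook.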
Mohamad
6 May 2018
Hi, please. I again got the data clipped warning, although I'm using 256 quantization levels and the inverse DCT values are in the range -1 to 1. How do I overcome this warning? If this warning is due to quantization, how do I make a Huffman dictionary with all these DCT coefficients, which is a very large number of coefficients? Thanks
Walter Roberson
6 May 2018
The greatest source of noise with that many coefficients is that you are doing the idct of the full dsig, which is the result of the huffmandeco on the data read in as ubit1 . As I described to you before, when you read using ubit1, a full byte is read at the end, leaving you with up to 7 extra 0 bits at the end. When you do the huffman decoding, those 7 extra 0 are likely to turn into one or more extra data samples in dsig. Those extra data samples affect the reconstruction audibly.
You need to figure out some way of ensuring that you extract the same length of signal from the huffman decoding as you put into the huffman encoding. I already described one method to you: add a distinct "end of stream" data element, and after decoding, detect that marker and remove from there onward. Another way to handle the situation is to write the length as part of the binary file.
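The second option (writing the length into the binary file) can be sketched like this in Python, with a hypothetical 4-byte big-endian bit-count header in front of the packed bytes; in MATLAB the count would be an extra fwrite before the ubit1 data:

```python
import struct

def write_stream(bits):
    """Prepend the bit count so the decoder can discard padding bits."""
    padded = bits + '0' * (-len(bits) % 8)
    body = bytes([int(padded[i:i+8], 2) for i in range(0, len(padded), 8)])
    return struct.pack('>I', len(bits)) + body   # 4-byte big-endian length header

def read_stream(blob):
    """Read the header, unpack the bytes, and trim to the stored bit count."""
    (nbits,) = struct.unpack('>I', blob[:4])
    bits = ''.join(format(b, '08b') for b in blob[4:])
    return bits[:nbits]
```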
The second greatest source of noise is the zeroing of the low-energy coefficients.
It takes a lot of dictionary entries to counteract the effect of zeroing the low-energy coefficients. There seems to be an RMS limit of about 1.86 when the coefficients are zeroed, whereas with the coefficients not zeroed, you can get down to about 0.38 with 512 coefficients.
I am still testing what you can do with more coefficients. It turns out that the internal routines that validate the dictionary are inefficient, involving operations proportional to the square of the number of entries, so there are practical limits in how far out you can test.
Mohamad
7 May 2018
Hi. I get "Error using huffmandeco: The encoded signal contains a code which is not present in the dictionary". I'm using all the DCT coefficients without zeroing. I checked the lengths: length_dict = 200, length_comp = 115845, count1 = 115845, length_A = 231696. Why is the length of A not equal to the length of comp? I get around double the length. How do I modify the code to extract the same length of signal from the Huffman decoding as I put into the Huffman encoding? Thanks
Walter Roberson
7 May 2018
I will look at this after I get up; it is my bedtime now (5 in the morning!)
Mohamad
7 May 2018
Hi, please, how do I add one distinct symbol at the end of the stream and detect it? I tried to add inf to the codebook with probability 1/length(x+1), but I got the error: "sum of probability must equal to one". Thanks
Mohamad
9 May 2018
Edited: Walter Roberson
9 May 2018
Hi, please. I added inf to the dict, but when I use isinf(A) I get 0, and I don't know why.
Also I get:
length(quantized) = 80000
length(comp) = 273938
length(A) = 273944
length(dsig) = 81935
So why is A not the same length as comp?
How do I make dsig have length 80000?
Sometimes I still get the data clipped warning.
Walter Roberson
9 May 2018
I do it like this:
p = h2/(L1+1);   % normalize by L1+1 to leave room for the end-of-file symbol
% code end of file as infinity
dict = huffmandict([codebook, inf], [p; 1/(L1+1)]);   % probabilities now sum to 1
comp = huffmanenco([quantized; inf], dict);           % quantized is a column vector
[...]
dsig = huffmandeco(A,dict);
eofpos = find(isinf(dsig), 1, 'first');
if ~isempty(eofpos); dsig(eofpos:end) = []; end
More Answers (1)