Live Direction of Arrival Estimation with a Linear Microphone Array
This example shows how to acquire and process live multichannel audio. It also presents a simple algorithm for estimating the Direction Of Arrival (DOA) of a sound source using multiple microphone pairs within a linear array.
Select and Configure the Source of Audio Samples
If a multichannel input audio interface is available, then modify this script to set sourceChoice
to 'live'
. In this mode the example uses live audio input signals. The example assumes all inputs (two or more) are driven by microphones arranged on a linear array. If no microphone array or multichannel audio card is available, then set sourceChoice
to 'recorded'
. In this mode the example uses prerecorded audio samples acquired with a linear array. For sourceChoice = 'live'
, the following code uses audioDeviceReader
to acquire 4 live audio channels through a Microsoft® Kinect™ for Windows®. To use another microphone array setup, ensure the installed audio device driver is one of the conventional types supported by MATLAB® and set the Device
property of audioDeviceReader
accordingly. You can query valid Device
assignments for your computer by calling the getAudioDevices
object function of audioDeviceReader
. Note that even when using Microsoft Kinect, the device name can vary across machines and may not match the one used in this example. Use tab completion to get the correct name on your machine.
sourceChoice = 'recorded';
Set the duration of live processing. Set how many samples per channel to acquire and process each iteration.
endTime = 20; audioFrameLength = 3200;
Create the source.
switch sourceChoice case 'live' fs = 16000; audioInput = audioDeviceReader( ... 'Device','Microphone Array (Microsoft Kinect USB Audio)', ... 'SampleRate',fs, ... 'NumChannels',4, ... 'OutputDataType','double', ... 'SamplesPerFrame',audioFrameLength); case 'recorded' % This audio file holds a 20-second recording of 4 raw audio % channels acquired with a Microsoft Kinect(TM) for Windows(R) in % the presence of a noisy source moving in front of the array % roughly from -40 to about +40 degrees and then back to the % initial position. audioFileName = 'AudioArray-16-16-4channels-20secs.wav'; audioInput = dsp.AudioFileReader( ... 'OutputDataType','double', ... 'Filename',audioFileName, ... 'PlayCount',inf, ... 'SamplesPerFrame',audioFrameLength); fs = audioInput.SampleRate; end
Define Array Geometry
The following values identify the approximate linear coordinates of the 4 built-in microphones of the Microsoft Kinect™ relative to the position of the RGB camera (not used in this example). For 3D coordinates use [[x1;y1;z1], [x2;y2;z2], ..., [xN;yN;zN]]
micPositions = [-0.088, 0.042, 0.078, 0.11];
Form Microphone Pairs
The algorithm used in this example works with pairs of microphones independently. It then combines the individual DOA estimates to provide a single live DOA output. The more pairs available, the more robust (yet computationally expensive) DOA estimation. The maximum number of pairs available can be computed as nchoosek(length(micPositions),2)
. In this case, the 3 pairs with the largest inter-microphone distances are selected. The larger the inter-microphone distance the more sensitive the DOA estimate. Each column of the following matrix describes a choice of microphone pair within the array. All values must be integers between 1 and length(micPositions)
.
micPairs = [1 4; 1 3; 1 2]; numPairs = size(micPairs, 1);
Initialize DOA Visualization
Create an instance of the helper plotting object DOADisplay. This displays the estimated DOA live with an arrow on a polar plot.
DOAPointer = DOADisplay();
Create and Configure the Algorithmic Building Blocks
Use a helper object to rearrange the input samples according to how the microphone pairs are selected.
bufferLength = 64; preprocessor = PairArrayPreprocessor( ... 'MicPositions',micPositions, ... 'MicPairs',micPairs, ... 'BufferLength',bufferLength); micSeparations = getPairSeparations(preprocessor);
The main algorithmic building block of this example is a cross-correlator. That is used in conjunction with an interpolator to ensure a finer DOA resolution. In this simple case it is sufficient to use the same two objects across the different pairs available. In general, however, different channels may need to independently save their internal states and hence to be handled by separate objects.
interpFactor = 8; b = interpFactor * fir1((2*interpFactor*8-1),1/interpFactor); groupDelay = median(grpdelay(b)); interpolator = dsp.FIRInterpolator('InterpolationFactor',interpFactor,'Numerator',b);
Acquire and Process Signals in a Loop
For each iteration of the following while loop: read audioFrameLength
samples for each audio channel, process the data to estimate a DOA value and display the result on a bespoke arrow-based polar visualization.
tic for idx = 1:(endTime*fs/audioFrameLength) cycleStart = toc; % Read a multichannel frame from the audio source % The returned array is of size AudioFrameLength x size(micPositions,2) multichannelAudioFrame = audioInput(); % Rearrange the acquired sample in 4-D array of size % bufferLength x numBuffers x 2 x numPairs where 2 is the number of % channels per microphone pair bufferedFrame = preprocessor(multichannelAudioFrame); % First, estimate the DOA for each pair, independently % Initialize arrays used across available pairs numBuffers = size(bufferedFrame, 2); delays = zeros(1,numPairs); anglesInRadians = zeros(1,numPairs); xcDense = zeros((2*bufferLength-1)*interpFactor, numPairs); % Loop through available pairs for kPair = 1:numPairs % Estimate inter-microphone delay for each 2-channel buffer delayVector = zeros(numBuffers, 1); for kBuffer = 1:numBuffers % Cross-correlate pair channels to get a coarse % crosscorrelation xcCoarse = xcorr( ... bufferedFrame(:,kBuffer,1,kPair), ... bufferedFrame(:,kBuffer,2,kPair)); % Interpolate to increase spatial resolution xcDense = interpolator(flipud(xcCoarse)); % Extract position of maximum, equal to delay in sample time % units, including the group delay of the interpolation filter [~,idxloc] = max(xcDense); delayVector(kBuffer) = ... (idxloc - groupDelay)/interpFactor - bufferLength; end % Combine DOA estimation across pairs by selecting the median value delays(kPair) = median(delayVector); % Convert delay into angle using the microsoft pair spatial % separations provided anglesInRadians(kPair) = HelperDelayToAngle(delays(kPair), fs, ... micSeparations(kPair)); end % Combine DOA estimation across pairs by keeping only the median value DOAInRadians = median(anglesInRadians); % Arrow display DOAPointer(DOAInRadians) % Delay cycle execution artificially if using recorded data if(strcmp(sourceChoice,'recorded')) pause(audioFrameLength/fs - toc + cycleStart) end end
release(audioInput)