Live Direction of Arrival Estimation with a Linear Microphone Array

Open Live Script

This example shows how to acquire and process live multichannel audio. It also presents a simple algorithm for estimating the Direction Of Arrival (DOA) of a sound source using multiple microphone pairs within a linear array.

Select and Configure the Source of Audio Samples

If a multichannel input audio interface is available, then modify this script to set sourceChoice to 'live'. In this mode the example uses live audio input signals. The example assumes all inputs (two or more) are driven by microphones arranged on a linear array. If no microphone array or multichannel audio card is available, then set sourceChoice to 'recorded'. In this mode the example uses prerecorded audio samples acquired with a linear array. For sourceChoice = 'live', the following code uses audioDeviceReader to acquire 4 live audio channels through a Microsoft® Kinect™ for Windows®. To use another microphone array setup, ensure the installed audio device driver is one of the conventional types supported by MATLAB® and set the Device property of audioDeviceReader accordingly. You can query valid Device assignments for your computer by calling the getAudioDevices object function of audioDeviceReader. Note that even when using Microsoft Kinect, the device name can vary across machines and may not match the one used in this example. Use tab completion to get the correct name on your machine.

sourceChoice = 'recorded';

Set the duration of live processing. Set how many samples per channel to acquire and process each iteration.

endTime = 20;
audioFrameLength = 3200;

Create the source.

switch sourceChoice
    case 'live'
        fs = 16000;
        audioInput = audioDeviceReader( ...
            'Device','Microphone Array (Microsoft Kinect USB Audio)', ...
            'SampleRate',fs, ...
            'NumChannels',4, ...
            'OutputDataType','double', ...
            'SamplesPerFrame',audioFrameLength);
    case 'recorded'    
        % This audio file holds a 20-second recording of 4 raw audio
        % channels acquired with a Microsoft Kinect for Windows in
        % the presence of a noisy source moving in front of the array
        % roughly from -40 to about +40 degrees and then back to the
        % initial position.
        audioFileName = 'AudioArray-16-16-4channels-20secs.wav';
        audioInput = dsp.AudioFileReader( ...
            'OutputDataType','double', ...
            'Filename',audioFileName, ...
            'PlayCount',inf, ...
            'SamplesPerFrame',audioFrameLength);
        fs = audioInput.SampleRate;
end

Define Array Geometry

The following values identify the approximate linear coordinates of the 4 built-in microphones of the Microsoft Kinect relative to the position of the RGB camera (not used in this example). For 3D coordinates use [[x1;y1;z1], [x2;y2;z2], ..., [xN;yN;zN]].

micPositions = [-0.088, 0.042, 0.078, 0.11];

Form Microphone Pairs

The algorithm used in this example works with pairs of microphones independently. It then combines the individual DOA estimates to provide a single live DOA output. The more pairs available, the more robust (yet computationally expensive) DOA estimation. The maximum number of pairs available can be computed as nchoosek(length(micPositions),2). In this case, the 3 pairs with the largest inter-microphone distances are selected. The larger the inter-microphone distance the more sensitive the DOA estimate. Each column of the following matrix describes a choice of microphone pair within the array. All values must be integers between 1 and length(micPositions).

micPairs = [1 4; 1 3; 1 2];
numPairs = size(micPairs, 1);

Initialize DOA Visualization

Create an instance of the helper plotting object DOADisplay. This displays the estimated DOA live with an arrow on a polar plot.

DOAPointer = DOADisplay();

Create and Configure the Algorithmic Building Blocks

Use a helper object to rearrange the input samples according to how the microphone pairs are selected.

bufferLength = 64;
preprocessor = PairArrayPreprocessor( ...
    'MicPositions',micPositions, ...
    'MicPairs',micPairs, ...
    'BufferLength',bufferLength);
micSeparations = getPairSeparations(preprocessor);

The main algorithmic building block of this example is a cross-correlator. That is used in conjunction with an interpolator to ensure a finer DOA resolution. In this simple case it is sufficient to use the same two objects across the different pairs available. In general, however, different channels may need to independently save their internal states and hence to be handled by separate objects.

interpFactor = 8;
b = interpFactor * fir1((2*interpFactor*8-1),1/interpFactor);
groupDelay = median(grpdelay(b));
interpolator = dsp.FIRInterpolator('InterpolationFactor',interpFactor,'Numerator',b);

Acquire and Process Signals in a Loop

For each iteration of the following while loop: read audioFrameLength samples for each audio channel, process the data to estimate a DOA value and display the result on a bespoke arrow-based polar visualization.

tic
for idx = 1:(endTime*fs/audioFrameLength)
    cycleStart = toc;
    % Read a multichannel frame from the audio source
    % The returned array is of size AudioFrameLength x size(micPositions,2)
    multichannelAudioFrame = audioInput();
    
    % Rearrange the acquired sample in 4-D array of size
    % bufferLength x numBuffers x 2 x numPairs where 2 is the number of
    % channels per microphone pair
    bufferedFrame = preprocessor(multichannelAudioFrame);
    
    % First, estimate the DOA for each pair, independently
    
    % Initialize arrays used across available pairs
    numBuffers = size(bufferedFrame, 2);
    delays = zeros(1,numPairs);
    anglesInRadians = zeros(1,numPairs);
    xcDense = zeros((2*bufferLength-1)*interpFactor, numPairs);
    
    % Loop through available pairs
    for kPair = 1:numPairs
        % Estimate inter-microphone delay for each 2-channel buffer 
        delayVector = zeros(numBuffers, 1);
        for kBuffer = 1:numBuffers
            % Cross-correlate pair channels to get a coarse
            % crosscorrelation
            xcCoarse = xcorr( ...
                bufferedFrame(:,kBuffer,1,kPair), ...
                bufferedFrame(:,kBuffer,2,kPair));

            % Interpolate to increase spatial resolution
            xcDense = interpolator(flipud(xcCoarse));

            % Extract position of maximum, equal to delay in sample time
            % units, including the group delay of the interpolation filter
            [~,idxloc] = max(xcDense);
            delayVector(kBuffer) = ...
                (idxloc - groupDelay)/interpFactor - bufferLength;
        end

        % Combine DOA estimation across pairs by selecting the median value
        delays(kPair) = median(delayVector);

        % Convert delay into angle using the Microsoft pair spatial
        % separations provided
        anglesInRadians(kPair) = HelperDelayToAngle(delays(kPair), fs, ...
            micSeparations(kPair));
    end

    % Combine DOA estimation across pairs by keeping only the median value
    DOAInRadians = median(anglesInRadians);
    
    % Arrow display
    DOAPointer(DOAInRadians)
    
    % Delay cycle execution artificially if using recorded data
    if(strcmp(sourceChoice,'recorded'))
        pause(audioFrameLength/fs - toc + cycleStart)
    end
end

Figure contains an axes object. The hidden axes object contains an object of type line.

release(audioInput)