Audio Processing

Extend deep learning workflows with audio and speech processing applications

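Apply deep learning to audio and speech processing applications by using Deep Learning Toolbox™ together with Audio Toolbox™. For signal processing applications, see Signal Processing. For wireless communications applications, see Wireless Communications.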

Apps

Signal Labeler

Label signal attributes, regions, and points of interest

Functions

Data Management and Augmentation

`audioDatastore`	Datastore for collection of audio files
`audioDataAugmenter`	Augment audio data

Feature Extraction

`audioFeatureExtractor`	Streamline audio feature extraction
`openl3Embeddings`	Extract OpenL3 feature embeddings (Since R2022a)
`pitchnn`	Estimate pitch with deep learning neural network
`vggishEmbeddings`	Extract VGGish feature embeddings (Since R2022a)
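
As a minimal sketch of how the data management and feature extraction functions listed above can fit together, the following MATLAB code reads one audio file through a datastore and extracts MFCC and pitch features. It assumes Audio Toolbox is installed and that the sample file `Counting-16-44p1-mono-15secs.wav` is on the MATLAB path; substitute your own recording if it is not. All variable names are illustrative.

```matlab
% Sketch only: assumes Audio Toolbox is installed and that this sample
% file is on the MATLAB path. Replace it with your own audio file if needed.
sampleFile = which("Counting-16-44p1-mono-15secs.wav");

% Create a datastore over the file and read the first (only) signal.
ads = audioDatastore(sampleFile);
[audioIn, fileInfo] = read(ads);
fs = fileInfo.SampleRate;

% Configure an extractor for MFCC and pitch, then extract the features.
afe = audioFeatureExtractor("SampleRate", fs, "mfcc", true, "pitch", true);
features = extract(afe, audioIn);   % rows are analysis windows, columns are features

% Map feature names to column indices of the output matrix.
featureMap = info(afe);
mfccFeatures = features(:, featureMap.mfcc);
pitchFeatures = features(:, featureMap.pitch);
```

For a larger collection, the same extractor could be applied to every file by pointing `audioDatastore` at a folder and looping while `hasdata(ads)` returns true.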

Pretrained Networks

`audioPretrainedNetwork`	Pretrained audio neural networks (Since R2024a)
`classifySound`	Classify sounds in audio signal
`pitchnn`	Estimate pitch with deep learning neural network
`vggishEmbeddings`	Extract VGGish feature embeddings (Since R2022a)
`openl3Embeddings`	Extract OpenL3 feature embeddings (Since R2022a)
`detectspeechnn`	Detect boundaries of speech in audio signal using AI (Since R2023a)
`separateSpeakers`	Separate signal by speakers (Since R2023b)
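
The following is a short, hedged sketch of calling two of the pretrained-network functions listed above. It assumes Audio Toolbox and Deep Learning Toolbox are installed, that the pretrained YAMNet and VGGish models have already been downloaded, and that the sample file `Counting-16-44p1-mono-15secs.wav` is on the MATLAB path. Variable names are illustrative.

```matlab
% Sketch only: requires Audio Toolbox, Deep Learning Toolbox, and the
% downloaded pretrained YAMNet and VGGish models.
[audioIn, fs] = audioread("Counting-16-44p1-mono-15secs.wav");

% Identify the sound classes detected in the signal.
sounds = classifySound(audioIn, fs);
disp(sounds)

% Extract VGGish embeddings, one row per analysis segment.
embeddings = vggishEmbeddings(audioIn, fs);
disp(size(embeddings))
```

The embeddings can then serve as input features for a smaller classifier trained on your own labels, which is one common way to reuse a pretrained audio network.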

Blocks

VGGish

VGGish	VGGish embeddings extraction network (Since R2022a)
VGGish Embeddings	Extract VGGish embeddings (Since R2022a)

YAMNet

YAMNet	YAMNet sound classification network (Since R2021b)
Sound Classifier	Classify sounds in audio signal (Since R2021b)

OpenL3

OpenL3	OpenL3 embeddings extraction network (Since R2022b)
OpenL3 Embeddings	Extract OpenL3 embeddings (Since R2022b)

CREPE

CREPE	CREPE deep pitch estimation neural network (Since R2023a)
Deep Pitch Estimator	Estimate pitch with CREPE deep learning neural network (Since R2023a)

Topics

Deep Learning for Audio Applications (Audio Toolbox)
Learn common tools and workflows to apply deep learning to audio applications.
Classify Sound Using Deep Learning (Audio Toolbox)
Train, validate, and test a simple long short-term memory (LSTM) network to classify sounds.
Adapt Pretrained Audio Network for New Data Using Deep Network Designer
This example shows how to interactively adapt a pretrained network to classify new audio signals using Deep Network Designer.
Audio Transfer Learning Using Experiment Manager
Configure an experiment that compares the performance of multiple pretrained networks applied to a speech command recognition task using transfer learning.
Compare Speaker Separation Models
Compare the performance, size, and speed of multiple deep learning speaker separation models.
Speaker Identification Using Custom SincNet Layer and Deep Learning
Perform speech recognition using a custom deep learning layer that implements a mel-scale filter bank.
Dereverberate Speech Using Deep Learning Networks
Train a deep learning model that removes reverberation from speech.
Sequential Feature Selection for Audio Features
This example shows a typical workflow for feature selection applied to the task of spoken digit recognition.
Train Spoken Digit Recognition Network Using Out-of-Memory Audio Data
This example trains a spoken digit recognition network on out-of-memory audio data using a transformed datastore.
Train Spoken Digit Recognition Network Using Out-of-Memory Features
This example trains a spoken digit recognition network on out-of-memory auditory spectrograms using a transformed datastore.
Investigate Audio Classifications Using Deep Learning Interpretability Techniques
This example shows how to use interpretability techniques to investigate the predictions of a deep neural network trained to classify audio data.
Accelerate Audio Deep Learning Using GPU-Based Feature Extraction
Leverage GPUs for feature extraction to decrease the time required to train an audio deep learning model.
AI for Speech Command Recognition (Audio Toolbox)
Build, train, compress, and deploy a deep learning model for speech command recognition.
- Step 1: Train Deep Learning Network for Speech Command Recognition (Audio Toolbox)
- Step 2: Prune and Quantize Speech Command Recognition Network (Audio Toolbox)
- Step 3: Apply Speech Command Recognition Network in Simulink (Audio Toolbox)
- Step 4: Apply Speech Command Recognition Network in Smart Speaker Simulink Model (Audio Toolbox)
- Step 5: Deploy Smart Speaker Model on Raspberry Pi (Audio Toolbox)

Featured Examples

Compress Machine Fault Recognition Neural Network Using Projection

Compress a pretrained acoustics-based machine fault recognition neural network using projection and principal component analysis.

Audio-Based Anomaly Detection for Machine Health Monitoring

Design an autoencoder neural network to perform anomaly detection for machine sounds using unsupervised learning.

3-D Speech Enhancement Using Trained Filter and Sum Network

Perform speech enhancement using a pretrained filter and sum network (FaSNet) with ambisonic data.

3-D Sound Event Localization and Detection Using Trained Recurrent Convolutional Neural Network

Perform 3-D sound event localization and detection using a pretrained deep learning model.

Speaker Recognition Using x-vectors

Develop an x-vector system to perform speaker recognition.

Speaker Diarization Using x-vectors

Speaker diarization is the process of partitioning an audio signal into segments according to speaker identity. It answers the question "who spoke when" without prior knowledge of the speakers and, depending on the application, without prior knowledge of the number of speakers.

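Train Speech Command Recognition Model Using Deep Learning

This example shows how to train a deep learning model that detects the presence of speech commands in audio. The example uses the Speech Commands Dataset [1] to train a convolutional neural network to recognize a set of commands.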

Keyword Spotting in Noise Using MFCC and LSTM Networks

Identify a keyword in noisy speech using a deep learning network. In particular, the example uses a Bidirectional Long Short-Term Memory (BiLSTM) network and mel frequency cepstral coefficients (MFCC).

Denoise Speech Using Deep Learning Networks

This example shows how to denoise speech signals using deep learning networks. The example compares two types of networks applied to the same task: fully connected and convolutional.

Train Generative Adversarial Network (GAN) for Sound Synthesis

Train and use a generative adversarial network (GAN) to generate sounds.

Voice Activity Detection in Noise Using Deep Learning

In this example, you perform batch and streaming voice activity detection (VAD) in a low SNR environment using a pretrained deep learning model. For details about the model and how it was trained, see Train Voice Activity Detection in Noise Model Using Deep Learning (Audio Toolbox).

Speech Emotion Recognition

Illustrates a simple speech emotion recognition (SER) system using a BiLSTM network. You begin by downloading the data set and then testing the trained network on individual files. The network was trained on a small German-language database [1].

Acoustic Scene Recognition Using Late Fusion

Create a multi-model late fusion system for acoustic scene recognition. The example trains a convolutional neural network (CNN) using mel spectrograms and an ensemble classifier using wavelet scattering. The example uses the TUT dataset for training and evaluation [1].

Train End-to-End Speaker Separation Model

Use an end-to-end deep learning network for speaker-independent speech separation.

Acoustics-Based Machine Fault Recognition

Develop a deep learning model to detect faults in an air compressor and package the system to operate on streaming data.

Audio Event Classification Using TensorFlow Lite on Raspberry Pi

Perform audio event classification on Raspberry Pi® using the YAMNet pretrained deep neural network from the TensorFlow™ Lite library.

Keyword Spotting in Noise Code Generation on Raspberry Pi

Demonstrates code generation for keyword spotting using a Bidirectional Long Short-Term Memory (BiLSTM) network and mel frequency cepstral coefficient (MFCC) feature extraction on Raspberry Pi®. MATLAB® Coder™ with Deep Learning Support enables the generation of a standalone executable (.elf) file on Raspberry Pi. Communication between MATLAB (.mlx) file and the generated executable file occurs over asynchronous User Datagram Protocol (UDP). The incoming speech signal is displayed using a timescope. A mask is shown as a blue rectangle surrounding spotted instances of the keyword, YES. For more details on MFCC feature extraction and deep learning network training, visit Keyword Spotting in Noise Using MFCC and LSTM Networks (Audio Toolbox).

Speech Command Recognition Code Generation on Desktop

Deploy feature extraction and a convolutional neural network (CNN) for speech command recognition. In this example, the generated code is a MATLAB executable (MEX) function, which is called by a MATLAB script that displays the predicted speech command along with the time-domain signal and auditory spectrogram. For details about audio preprocessing and network training, see Train Deep Learning Network for Speech Command Recognition (Audio Toolbox).