Quantization, Projection, and Pruning

Compress a deep neural network by performing quantization, projection, or pruning

Use Deep Learning Toolbox™ together with the Deep Learning Toolbox Model Quantization Library support package to reduce the memory footprint and computational requirements of a deep neural network by doing the following:

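Prune filters from convolution layers by using a first-order Taylor approximation. You can then generate C/C++ or CUDA® code from the pruned network.
Project layers by performing principal component analysis (PCA) on the layer activations, using a data set representative of the training data, and applying linear projections to the layer learnable parameters. Forward passes of a projected deep neural network are typically faster when you deploy the network to embedded hardware using library-free C/C++ code generation.
Quantize the weights, biases, and activations of layers to reduced-precision scaled integer data types. You can then generate C/C++, CUDA, or HDL code from the quantized network.
For C/C++ and CUDA code generation, the software generates code for a convolutional deep neural network by quantizing the weights, biases, and activations of the convolution layers to 8-bit scaled integer data types. The quantization is performed by passing the calibration result file produced by the calibrate function to the codegen (MATLAB Coder) command.
Code generation does not support quantized deep neural networks produced by the quantize function.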
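
To make the idea of an 8-bit scaled integer data type concrete, the following minimal sketch quantizes a handful of values in base MATLAB. The weights are made up, and the power-of-two scale rule is an assumption chosen for illustration; it is not claimed to match the scaling that the toolbox selects during calibration.

```matlab
% Toy illustration of quantizing values to an 8-bit scaled integer type.
% Assumption: a power-of-two scale derived from the observed range; the
% toolbox's actual scaling rules may differ.
w = [-0.82 0.31 0.05 1.27 -0.44];      % example weights (made up)

% Choose the smallest power-of-two scale for which max(|w|) fits in int8.
maxAbs   = max(abs(w));
exponent = ceil(log2(maxAbs / 127));
scale    = 2^exponent;

% Quantize: store int8 integers; the real value is approximately q * scale.
q    = int8(round(w / scale));         % int8() saturates at [-128, 127]
wHat = double(q) * scale;              % dequantized approximation

% Without saturation, the rounding error is at most half a quantization step.
maxErr = max(abs(w - wHat));
fprintf('scale = 2^%d, max error = %.4g\n', exponent, maxErr);
```

Each value is stored in one byte instead of the four bytes of a single-precision number, at the cost of a rounding error bounded by `scale/2`.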

Functions

Pruning

`taylorPrunableNetwork`	Network that can be pruned by using first-order Taylor approximation (Since R2022a)
`forward`	Compute deep learning network output for training (Since R2019b)
`predict`	Compute deep learning network output for inference (Since R2019b)
`updatePrunables`	Remove filters from prunable layers based on importance scores (Since R2022a)
`updateScore`	Compute and accumulate Taylor-based importance scores for pruning (Since R2022a)
`dlnetwork`	Deep learning network for custom training loops (Since R2019b)
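
The notion of a Taylor-based importance score can be illustrated with a small base MATLAB sketch. It assumes that the score of a filter is the absolute mean of the element-wise product of its activation and the loss gradient with respect to that activation. This is one common formulation and is not claimed to be what the functions above compute internally. All data and variable names are made up.

```matlab
% Toy illustration of first-order Taylor importance scores for filters.
% Assumption: score = |mean(activation .* gradient)| per filter; this is
% not claimed to match the toolbox internals.
rng(0);                                 % reproducible random data
numFilters = 6;
act  = randn(8, 8, numFilters);         % made-up activations (H x W x C)
grad = randn(8, 8, numFilters);         % made-up gradients, same size
act(:, :, 3) = 0;                       % filter 3 contributes nothing

% First-order estimate of the loss change if a filter's output is zeroed.
score = squeeze(abs(mean(act .* grad, [1 2])));   % numFilters x 1

% Prune the lowest-scoring filter and keep the rest.
[~, pruneIdx] = min(score);
keep = setdiff(1:numFilters, pruneIdx);
fprintf('Pruning filter %d, keeping %s\n', pruneIdx, mat2str(keep));
```

Filter 3, whose activations were zeroed, receives a score of zero and is therefore the one selected for removal.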

Projection

`compressNetworkUsingProjection`	Compress neural network using projection (Since R2022b)
`neuronPCA`	Principal component analysis of neuron activations (Since R2022b)
`unpackProjectedLayers`	Unpack projected layers of neural network (Since R2023b)
`ProjectedLayer`	Compressed neural network layer using projection (Since R2023b)
`gruProjectedLayer`	Gated recurrent unit (GRU) projected layer for recurrent neural network (RNN) (Since R2023b)
`lstmProjectedLayer`	Long short-term memory (LSTM) projected layer for recurrent neural network (RNN) (Since R2022b)
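
As a rough illustration of why projection reduces the number of learnable parameters, the sketch below compresses a single linear layer in base MATLAB. It assumes synthetic activations that lie exactly in a low-dimensional subspace and uses an uncentered PCA computed with `svd`. The sizes, names, and data are hypothetical, and the sketch is not the algorithm used by the functions above.

```matlab
% Toy illustration of compressing a linear layer by PCA projection.
% Assumption: input activations have rank-k structure; all names are made up.
rng(0);
numIn = 32; numOut = 16; numObs = 200; k = 4;
X = randn(numIn, k) * randn(k, numObs); % synthetic rank-k activations
W = randn(numOut, numIn);               % original layer weights

% Uncentered PCA of the activations via SVD; keep the top-k directions.
[U, ~, ~] = svd(X, 'econ');
Q = U(:, 1:k);                          % numIn x k orthonormal basis

% Projected layer: W*x is approximated by (W*Q)*(Q'*x).
Wp   = W * Q;                           % numOut x k
Y    = W * X;
Yhat = Wp * (Q' * X);

relErr     = norm(Y - Yhat, 'fro') / norm(Y, 'fro');
paramsOrig = numel(W);                  % numOut*numIn
paramsProj = numel(Wp) + numel(Q);      % numOut*k + numIn*k
fprintf('params: %d -> %d, relative error = %.2g\n', paramsOrig, paramsProj, relErr);
```

With real activations the retained components only approximate the data, so the choice of `k` trades parameter count against output error.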

Quantization

`dlquantizer`	Quantize a deep neural network to 8-bit scaled integer data types (Since R2020a)
`dlquantizationOptions`	Options for quantizing a trained deep neural network (Since R2020a)
`calibrate`	Simulate and collect ranges of a deep neural network (Since R2020a)
`quantize`	Quantize deep neural network (Since R2022a)
`validate`	Quantize and validate a deep neural network (Since R2020a)
`quantizationDetails`	Display quantization details for a neural network (Since R2022a)
`estimateNetworkMetrics`	Estimate network metrics for specific layers of a neural network (Since R2022a)
`equalizeLayers`	Equalize layer parameters of deep neural network (Since R2022b)

Apps

Deep Network Quantizer

Quantize deep neural network to 8-bit scaled integer data types (Since R2020a)

Topics

Pruning

Parameter Pruning and Quantization of Image Classification Network
Use parameter pruning and quantization to reduce network size.
Prune Image Classification Network Using Taylor Scores
This example shows how to reduce the size of a deep neural network using Taylor pruning. By using the taylorPrunableNetwork function to remove convolution layer filters, you can reduce the overall network size and increase the inference speed.
Prune Filters in a Detection Network Using Taylor Scores
This example shows how to reduce network size and increase inference speed by pruning convolutional filters in a you only look once (YOLO) v3 object detection network.
Prune and Quantize Convolutional Neural Network for Speech Recognition
This example shows how to compress a convolutional neural network (CNN) to prepare it for deployment on an embedded system.

Projection and Knowledge Distillation

Compress Neural Network Using Projection
This example shows how to compress a neural network using projection and principal component analysis.
Compress Network for Estimating Battery State of Charge
This example shows how to compress a neural network for predicting the state of charge of a battery using projection and principal component analysis. (Since R2023b)
Train Smaller Neural Network Using Knowledge Distillation
This example shows how to reduce the memory footprint of a deep learning network by using knowledge distillation. (Since R2023b)

Quantization

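Quantization of Deep Neural Networks
Learn about the effects of quantization and how to visualize the dynamic ranges of network convolution layers.
Quantization Workflow Prerequisites
Products required for the quantization of deep learning networks.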
Prepare Data for Quantizing Networks
Supported datastores for quantization workflows.

Quantization for GPU Target

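Generate INT8 Code for Deep Learning Networks (GPU Coder)
Quantize and generate code for a pretrained convolutional neural network.
Quantize Residual Network Trained for Image Classification and Generate CUDA Code
This example shows how to quantize the learnable parameters in the convolution layers of a deep learning neural network that has residual connections and has been trained for image classification with CIFAR-10 data.
Quantize Layers in Object Detectors and Generate CUDA Code
This example shows how to generate CUDA® code for an SSD vehicle detector and a YOLO v2 vehicle detector that perform inference computations in 8-bit integers for the convolutional layers.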
Quantize Semantic Segmentation Network and Generate CUDA Code
Quantize a convolutional neural network trained for semantic segmentation and generate CUDA code.

Quantization for FPGA Target

Quantize Network for FPGA Deployment (Deep Learning HDL Toolbox)
Reduce the memory footprint of a deep neural network by quantizing the weights, biases, and activations of convolution layers to 8-bit scaled integer data types. This example shows how to use Deep Learning Toolbox Model Quantization Library and Deep Learning HDL Toolbox to deploy the int8 network to a target FPGA board.
Classify Images on FPGA Using Quantized Neural Network (Deep Learning HDL Toolbox)
This example shows how to use Deep Learning HDL Toolbox™ to deploy a quantized deep convolutional neural network (CNN) to an FPGA. In the example, you use the pretrained ResNet-18 CNN to perform transfer learning and quantization. You then deploy the quantized network and use MATLAB® to retrieve the prediction results.
Classify Images on FPGA by Using Quantized GoogLeNet Network (Deep Learning HDL Toolbox)
This example shows how to use Deep Learning HDL Toolbox™ to deploy a quantized GoogLeNet network to classify an image. The example uses the pretrained GoogLeNet network to demonstrate transfer learning, quantization, and deployment for the quantized network. Quantization helps reduce the memory requirement of a deep neural network by quantizing the weights, biases, and activations of network layers to 8-bit scaled integer data types. Use MATLAB® to retrieve the prediction results.

Quantization for CPU Target

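Generate int8 Code for Deep Learning Networks (MATLAB Coder)
Quantize and generate code for a pretrained convolutional neural network.
Generate INT8 Code for Deep Learning Network on Raspberry Pi (MATLAB Coder)
Generate code for a deep learning network that performs inference computations in 8-bit integers.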

Featured Examples

Prune Image Classification Network Using Taylor Scores

Reduce the size of a deep neural network using Taylor pruning. By using the taylorPrunableNetwork function to remove convolution layer filters, you can reduce the overall network size and increase the inference speed.

Prune Filters in a Detection Network Using Taylor Scores

Reduce network size and increase inference speed by pruning convolutional filters in a you only look once (YOLO) v3 object detection network.

Compress Neural Network Using Projection

Compress a neural network using projection and principal component analysis.

Quantize Residual Network Trained for Image Classification and Generate CUDA Code

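This example shows how to quantize the learnable parameters in the convolution layers of a deep learning neural network that has residual connections and has been trained for image classification with CIFAR-10 data.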

Prune and Quantize Semantic Segmentation Network

Reduce the memory footprint of a semantic segmentation network and speed up inference by compressing the network using pruning and quantization.

Explore Quantized Semantic Segmentation Network Using Grad-CAM

Compare the predictions of a quantized semantic segmentation network to the original network using the gradient-weighted class activation mapping (Grad-CAM) interpretability method.