Multilayer Shallow Neural Network Architecture

This topic describes part of a typical workflow for a shallow multilayer network. For details and the remaining steps, see Multilayer Shallow Neural Networks and Backpropagation Training.

Neuron Model (logsig, tansig, purelin)

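An elementary neuron with R inputs is shown below. Each input is weighted by an appropriate weight w. The sum of the weighted inputs and the bias forms the input to the transfer function f. A neuron can use any differentiable transfer function f to generate its output.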

Schematic diagram of a general neuron. The neuron multiplies an input vector p by a weight vector w, sums the result, and adds a bias b. A transfer function f is then applied, generating the output a.
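The computation in the diagram can be sketched in a few lines of Python. This is a minimal illustration rather than the toolbox implementation: the function name neuron_output and the example values for p, w, and b are assumptions made here, and only the standard library is used.

```python
import math


def logsig(n):
    """Log-sigmoid transfer function: maps a real net input into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-n))


def neuron_output(p, w, b, f):
    """Return a = f(w . p + b) for one neuron with R = len(p) inputs."""
    if len(p) != len(w):
        raise ValueError("p and w must both have length R")
    # Net input: weighted sum of the inputs plus the bias.
    n = sum(wi * pi for wi, pi in zip(w, p)) + b
    return f(n)


# Hypothetical example values (R = 3); not taken from the documentation.
p = [1.0, -2.0, 0.5]
w = [0.4, 0.1, -0.6]
b = 0.1
a = neuron_output(p, w, b, logsig)
```

With these example values the weighted sum and the bias cancel, so the net input is approximately zero and the output a is close to 0.5.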

Multilayer networks often use the log-sigmoid transfer function logsig.

A plot of the log-sigmoid transfer function. For large positive inputs, the output tends to +1. For large negative inputs, the output tends to 0. An input of 0 gives an output of 0.5.

The function logsig produces outputs between 0 and 1 as the neuron's net input goes from negative infinity to positive infinity.

Alternatively, multilayer networks can use the tan-sigmoid (hyperbolic tangent sigmoid) transfer function tansig.

A plot of the tan-sigmoid transfer function. For large positive inputs, the output tends to +1. For large negative inputs, the output tends to -1. An input of 0 gives an output of 0.

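Sigmoid output neurons are typically used for pattern recognition problems, while linear output neurons are used for function fitting problems. The linear transfer function purelin is shown below.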

A plot of the linear transfer function. The output scales linearly with the input.

The three transfer functions described here are the ones most commonly used for multilayer networks, but you can create and use other differentiable transfer functions if needed.
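For reference, the three transfer functions can be written out as plain Python functions. This is a sketch using the standard mathematical definitions (logsig(n) = 1 / (1 + e^(-n)), tansig(n) = tanh(n), purelin(n) = n). The branch in logsig is an assumption added here to avoid floating-point overflow for large negative inputs, and is not part of the definition.

```python
import math


def logsig(n):
    """Log-sigmoid: 1 / (1 + exp(-n)), output in the range 0 to 1."""
    if n >= 0:
        return 1.0 / (1.0 + math.exp(-n))
    # Equivalent form for n < 0 that keeps exp() from overflowing.
    e = math.exp(n)
    return e / (1.0 + e)


def tansig(n):
    """Tan-sigmoid: hyperbolic tangent, output in the range -1 to +1."""
    return math.tanh(n)


def purelin(n):
    """Linear transfer function: the output equals the net input."""
    return n


# Evaluate each function at a few net-input values.
samples = [-5.0, 0.0, 5.0]
table = [(n, logsig(n), tansig(n), purelin(n)) for n in samples]
```

At a net input of 0, logsig returns 0.5 and tansig returns 0, which matches the plots described above.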

Feedforward Neural Network

A single-layer network of S logsig neurons with R inputs is shown below, in full detail on the left and as a layer diagram on the right.

Schematic diagram showing a layer containing S logsig neurons.
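A layer of S neurons applies the single-neuron computation once per neuron, so its weights form an S-by-R matrix W and its biases a vector b of length S. The sketch below illustrates this with nested lists. The function name layer_output and the numeric values are assumptions chosen for illustration.

```python
import math


def logsig(n):
    """Log-sigmoid transfer function."""
    return 1.0 / (1.0 + math.exp(-n))


def layer_output(W, b, p, f):
    """Return the S outputs a = f(W p + b) of one layer.

    W is a list of S rows, each holding R weights; b holds S biases;
    p holds R inputs.
    """
    outputs = []
    for row, bias in zip(W, b):
        n = sum(w * x for w, x in zip(row, p)) + bias  # net input of one neuron
        outputs.append(f(n))
    return outputs


# Hypothetical layer with S = 2 neurons and R = 3 inputs.
W = [[0.2, -0.4, 0.1],
     [1.0, 0.5, -0.3]]
b = [0.1, -0.2]
p = [1.0, 2.0, 3.0]
a = layer_output(W, b, p, logsig)
```

The result a has one entry per neuron, and every entry lies strictly between 0 and 1 because each neuron uses logsig.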

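Feedforward networks often have one or more hidden layers of sigmoid neurons followed by an output layer of linear neurons. Multiple layers of neurons with nonlinear transfer functions allow the network to learn nonlinear relationships between input and output vectors. The linear output layer is most often used for function fitting (or nonlinear regression) problems.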

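On the other hand, if you want to constrain the outputs of a network (for example, between 0 and 1), the output layer should use a sigmoid transfer function (such as logsig). This is the case when the network is used for pattern recognition problems, in which the network makes a decision.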

For multilayer networks, the layer number determines the superscript on the weight matrix. The appropriate notation is used in the two-layer tansig/purelin network shown below.

A schematic diagram of a network containing two layers. A hidden layer receives an input vector p. The weights of the hidden layer are denoted with a superscript 1. An output layer receives the output of the hidden layer. The weights of the output layer are denoted with a superscript 2.
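The forward pass of such a two-layer tansig/purelin network can be sketched as follows. The names W1, b1, W2, and b2 are shorthand chosen here for the layer-1 and layer-2 weights and biases (the numeral plays the role of the superscript). They and the example values are assumptions, not the toolbox's own identifiers.

```python
import math


def tansig(n):
    """Tan-sigmoid transfer function (hyperbolic tangent)."""
    return math.tanh(n)


def purelin(n):
    """Linear transfer function."""
    return n


def layer(W, b, x, f):
    """Return f(W x + b), with W given as a list of weight rows."""
    return [f(sum(w * xi for w, xi in zip(row, x)) + bias)
            for row, bias in zip(W, b)]


def two_layer_forward(W1, b1, W2, b2, p):
    """Hidden tansig layer followed by a purelin output layer."""
    a1 = layer(W1, b1, p, tansig)    # hidden-layer output
    a2 = layer(W2, b2, a1, purelin)  # network output
    return a1, a2


# Hypothetical sizes: R = 2 inputs, 3 hidden neurons, 1 output neuron.
W1 = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
b1 = [0.0, 0.0, 0.0]
W2 = [[1.0, 1.0, 1.0]]
b2 = [0.5]
a1, a2 = two_layer_forward(W1, b1, W2, b2, [0.5, -0.5])
```

With these example values the three hidden outputs sum to zero, so the network output equals the output bias of 0.5 up to rounding. Because the output layer is linear, the output is not confined to the range of tansig.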

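This network can be used as a general function approximator. Given enough neurons in the hidden layer, it can approximate any function with a finite number of discontinuities arbitrarily well.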

This concludes the definition of the multilayer network architecture. The next section describes the design process.