Training neural networks with local error signals
- Arild Nokland and Lars H Eidnes
- The idea is to use one or more supplementary (auxiliary) neural networks to measure a within-batch matching loss between the transformed hidden-layer output and one-hot label data, producing layer-local learning signals (gradients) that improve the local representation.
- Hence, no end-to-end backprop: error signals are all local, and inter-layer dependencies are not explicitly accounted for (I think!).
- Sim loss: given a mini-batch of hidden-layer activations H and a one-hot encoded label matrix Y, L_sim = || S(NeuralNet_c(H)) - S(Y) ||_F^2.
- (The subscript F denotes the Frobenius norm.)
- NeuralNet_c(·) is a small auxiliary convolutional net (trained with the same local loss gradient), 3×3 kernels, stride 1, output reduced to spatial size 2.
- S(·) computes the cosine similarity matrix, or correlation matrix, of a mini-batch (sketch below).
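- A minimal PyTorch sketch of the sim loss as I read it; the function names, channel counts, and pooling size are my placeholders, not the paper's code:

```python
import torch
import torch.nn.functional as F

def cosine_similarity_matrix(x):
    """S(.): batch x batch cosine-similarity matrix of a mini-batch."""
    x = x.flatten(1)                     # (batch, features)
    x = x - x.mean(dim=1, keepdim=True)  # centering -> correlation-style similarity
    x = F.normalize(x, dim=1)            # unit-length rows
    return x @ x.t()                     # (batch, batch)

def sim_loss(h, y_onehot, aux_conv):
    """L_sim ~ || S(NeuralNet_c(H)) - S(Y) ||_F^2 (mse_loss averages over entries)."""
    s_h = cosine_similarity_matrix(aux_conv(h))       # transformed hidden activations
    s_y = cosine_similarity_matrix(y_onehot.float())  # label similarity matrix
    return F.mse_loss(s_h, s_y)

# Placeholder auxiliary net standing in for NeuralNet_c: 3x3 conv, stride 1,
# then pooled to a small spatial size and flattened. Channel counts are made up.
aux_conv = torch.nn.Sequential(
    torch.nn.Conv2d(64, 32, kernel_size=3, stride=1, padding=1),
    torch.nn.AdaptiveAvgPool2d(2),
    torch.nn.Flatten(),
)
```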
- Pred loss: the local prediction is Y_hat = softmax(H_hat W), where H_hat is the (flattened) mini-batch of hidden activations and W is a weight matrix, dim hidden_size × n_classes.
- The prediction loss is the cross-entropy L_pred = CrossEntropy(Y, Y_hat) (sketch below).
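- A minimal sketch of the pred loss; the shapes and the flatten-before-linear step are my assumptions:

```python
import torch
import torch.nn.functional as F

def pred_loss(h, y_idx, W):
    """L_pred = CrossEntropy(Y, softmax(H_hat W)); W: (hidden_size, n_classes)."""
    h_hat = h.flatten(1)                   # (batch, hidden_size); pool conv maps first
    logits = h_hat @ W                     # (batch, n_classes)
    return F.cross_entropy(logits, y_idx)  # log-softmax + NLL in one call

batch, hidden_size, n_classes = 128, 512, 10
W = torch.randn(hidden_size, n_classes, requires_grad=True)
h = torch.randn(batch, hidden_size)
y_idx = torch.randint(0, n_classes, (batch,))  # integer class labels, not one-hot
pred_loss(h, y_idx, W).backward()              # local gradient for W (and h's layer)
```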
- Sim-bio loss: the auxiliary conv net is replaced with fixed average-pooling and standard-deviation ops, and the one-hot target is replaced with a random transformation of the same target vector (sketch below).
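- A heavily hedged sketch of one possible reading of the sim-bio variant; the exact pooling/std operators and the random target transform R are my guesses, not the paper's definition (reuses cosine_similarity_matrix from the sim-loss sketch):

```python
import torch
import torch.nn.functional as F

def bio_features(h):
    """Fixed (untrained) stand-in for the auxiliary net: avg-pool + per-channel std."""
    pooled = F.adaptive_avg_pool2d(h, 2).flatten(1)  # (batch, C*2*2)
    std = h.flatten(2).std(dim=2)                    # (batch, C) per-channel std
    return torch.cat([pooled, std], dim=1)

n_classes = 10
R = torch.randn(n_classes, n_classes)  # fixed random transformation of the targets

def sim_bio_loss(h, y_onehot):
    s_h = cosine_similarity_matrix(bio_features(h))       # from the sim-loss sketch
    s_y = cosine_similarity_matrix(y_onehot.float() @ R)  # randomly transformed labels
    return F.mse_loss(s_h, s_y)
```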
- Overall loss: 99% sim loss + 1% pred loss, i.e. L = 0.99·L_sim + 0.01·L_pred.
- Despite the unequal weighting, both losses seem to improve test accuracy across the datasets tested (sketch of the combined layer-local update below).
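- Sketch of how the layer-local update could look with the 0.99/0.01 weighting, reusing sim_loss and pred_loss from the sketches above; the detach() is what keeps the error signals local:

```python
import torch

def local_train_step(x, y_onehot, y_idx, layers, aux_heads, opts, beta=0.99):
    """One pass with layer-local updates; reuses sim_loss / pred_loss from above."""
    h = x
    for layer, (aux_conv, W), opt in zip(layers, aux_heads, opts):
        h = layer(h)                                   # forward through this layer only
        loss = beta * sim_loss(h, y_onehot, aux_conv) \
             + (1.0 - beta) * pred_loss(h, y_idx, W)   # 99% sim + 1% pred
        opt.zero_grad()
        loss.backward()   # gradient reaches this layer and its auxiliary heads only
        opt.step()        # opt should hold the layer's params plus aux_conv and W
        h = h.detach()    # cut the graph: no gradient crosses the layer boundary
    return h
```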
-
- VGG-like network, with dropout and cutout (blacking out square regions of the input; sketch below), batch size 128.
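- A minimal cutout sketch (in the spirit of DeVries & Taylor's cutout); the square size and placement here are arbitrary:

```python
import torch

def cutout(images, size=8):
    """Zero out one size x size square per image; images: (batch, C, H, W)."""
    b, _, h, w = images.shape
    out = images.clone()
    ys = torch.randint(0, h - size + 1, (b,))
    xs = torch.randint(0, w - size + 1, (b,))
    for i in range(b):
        out[i, :, ys[i]:ys[i] + size, xs[i]:xs[i] + size] = 0.0
    return out
```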
- Tested on all the relevant datasets: MNIST, Fashion-MNIST, Kuzushiji-MNIST, CIFAR-10, CIFAR-100, STL-10, SVHN.
- Pretty decent review of similarity matching measures at the beginning of the paper; not extensive but puts everything in context.
- See for example non-negative matrix factorization using Hebbian and anti-Hebbian learning in Pehlevan and Chklovskii 2014.
- Emphasis put on biologically realistic learning, including the use of feedback alignment {1423}
- Yet: this was entirely supervised learning, as the labels were made available to every layer's local loss.
- More likely that biology is set up to maximize the use of available labels (not a new concept).