ref: -1988 tags: Linsker infomax linear neural network hebbian learning unsupervised date: 08-03-2021 06:12 gmt revision:2 [1] [0] [head]

Self-organization in a perceptual network

  • Ralph Linsker, 1988.
  • One of the first (verbose, slightly diffuse) investigations of the properties of linear projection neurons (e.g. dot-product; no non-linearity) to express useful tuning functions.
  • 'Useful' here means information-preserving, in the face of noise or dimensional bottlenecks (like PCA).
  • Starts with Hebbian learning functions, and shows that with this + white-noise sensory input + some local topology, you can get simple and complex visual cell responses.
    • Ralph notes that neurons in primate visual cortex are tuned in utero -- prior to real-world visual experience! Wow. (Who did these studies?)
    • This is a very minimalistic starting point; there aren't even structured stimuli (!)
    • Single neuron (and later, multiple neurons) are purely feed-forward; author cautions that a lack of feedback is not biologically realistic.
      • Also note that this was back in the Motorola 680x0 days ... computers were not that powerful (but certainly could handle more than 1-2 neurons!)
  • Linear algebra shows that Hebbian synapses cause a linear layer to learn the covariance function of their inputs, $Q$, with no dependence on the actual layer activity.
  • When looked at in terms of an energy function, this is equivalent to gradient descent to maximize the layer-output variance.
  • He also hits on:
    • Hopfield networks,
    • PCA,
    • Oja's constrained Hebbian rule $\delta w_i \propto \langle L_2 (L_1 - L_2 w_i) \rangle$ (that is, a quadratic constraint on the weight to make $\Sigma w^2 \sim 1$)
    • Optimal linear reconstruction in the presence of noise
    • Mutual information between layer input and output (I found this to be a bit hand-wavey)
      • Yet he notes critically: "but it is not true that maximum information rate and maximum activity variance coincide when the probability distribution of signals is arbitrary".
        • Indeed. The world is characterized by very non-Gaussian structured sensory stimuli.
    • Redundancy and diversity in 2-neuron coding model.
    • Role of infomax in maximizing the determinant of the weight matrix, sorta.
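
The link between Oja's rule and PCA noted above can be checked numerically. The following is a minimal sketch, not Linsker's simulation: it reads L_1 as the input x and L_2 as the output y of a single linear neuron, and the covariance matrix, learning rate, and sample count are all assumed values chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Zero-mean Gaussian inputs with a known covariance Q (hypothetical values).
Q = np.array([[3.0, 1.0],
              [1.0, 1.0]])
X = rng.multivariate_normal(mean=np.zeros(2), cov=Q, size=20000)

w = np.array([1.0, 0.0])   # initial weights (arbitrary)
lr = 0.005                 # learning rate (assumed, not from the paper)

for x in X:
    y = w @ x                      # linear neuron: output is a dot product
    w += lr * y * (x - y * w)      # Oja: Hebbian term y*x minus decay y^2*w

# Compare against the leading eigenvector of the sample covariance.
eigvals, eigvecs = np.linalg.eigh(np.cov(X.T))
pc1 = eigvecs[:, -1]               # eigh sorts eigenvalues in ascending order

weight_norm = np.linalg.norm(w)
alignment = abs(w @ pc1) / weight_norm   # |cosine| between w and the first PC
print("weight norm:", weight_norm)
print("alignment with first PC:", alignment)
```

Under these assumptions the weight norm should settle near 1 and the weight vector should line up with the first principal component, which is the variance-maximizing direction for a unit-norm linear neuron.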

One may critically challenge the infomax idea: we very much need to (and do) throw away spurious or irrelevant information in our sensory streams; what upper layers 'care about' when making decisions is certainly relevant to the lower layers. This credit-assignment is neatly solved by backprop, and there are a number of 'biologically plausible' means of performing it, but both this and infomax are maybe avoiding the problem. What might the upper layers really care about? Likely 'care about' is an emergent property of the interacting local learning rules and network structure. Can you search directly in these domains, within biological limits, and motivated by statistical reality, to find unsupervised-learning networks?

You'll still need a way to rank the networks, hence an objective 'care about' function. Sigh. Either way, I don't per se put a lot of weight in the infomax principle. It could be useful, but is only part of the story. Otherwise Linsker's discussion is accessible, lucid, and prescient.


ref: -2020 tags: feedback alignment local hebbian learning rules stanford date: 04-22-2021 03:26 gmt revision:0 [head]

Two Routes to Scalable Credit Assignment without Weight Symmetry

This paper looks at five different learning rules, three purely local, and two non-local, to see if they can work as well as backprop in training a deep convolutional net on ImageNet. The local learning networks all feature forward weights W and backward weights B; the forward weights (+ nonlinearities) pass the information to lead to a classification; the backward weights pass the error, which is used to locally adjust the forward weights.

Hence, each fake neuron has locally the forward activation, the backward error (or loss gradient), the forward weight, backward weight, and Hebbian terms thereof (e.g. the outer product of the in-out vectors for both forward and backward passes). From these available variables, they construct the local learning rules:

  • Decay (exponentially decay the backward weights)
  • Amp (Hebbian learning)
  • Null (decay based on the product of the weight and local activation. This effects a Euclidean norm on reconstruction.)

Each of these serves as a "regularizer term" on the feedback weights, which governs their learning dynamics. In the case of backprop, the backward weights B are just the instantaneous transpose of the forward weights W. A good local learning rule approximates this transpose progressively. They show that, with proper hyperparameter setting, this does indeed work nearly as well as backprop when training a ResNet-18 network.

But, hyperparameter settings don't translate to other network topologies. To allow this, they add in non-local learning rules:

  • Sparse (penalizes the Euclidean norm of the previous layer; gradient is the outer product of the (current layer activation, transposed) * B)
  • Self (directly measures the forward weights and uses them to update the backward weights)

In "Symmetric Alignment", the Self and Decay rules are employed. This is similar to backprop (the backward weights will track the forward ones) with L2 regularization, which is not new. It performs very similarly to backprop. In "Activation Alignment", Amp and Sparse rules are employed. I assume this is supposed to be more biologically plausible -- the Hebbian term can track the forward weights, while the Sparse rule regularizes and stabilizes the learning, such that overall dynamics allow the gradient to flow even if W and B aren't transposes of each other.
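
A toy version of the Symmetric Alignment idea can be written in a few lines. This is a sketch under stated assumptions, not the paper's training code: it assumes Decay contributes a penalty 0.5*||B||^2 and Self contributes a term -trace(B W), and it holds the forward weights W fixed, whereas in real training W changes as well. The layer sizes, learning rate, and step count are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

n_in, n_out = 5, 3                       # hypothetical layer sizes
W = rng.standard_normal((n_out, n_in))   # forward weights (held fixed here)
B = rng.standard_normal((n_in, n_out))   # backward weights, random start

lr = 0.1                                 # assumed step size
for _ in range(200):
    grad_decay = B          # gradient w.r.t. B of 0.5 * ||B||^2
    grad_self = -W.T        # gradient w.r.t. B of -trace(B @ W)
    B -= lr * (grad_decay + grad_self)

# Distance between the backward weights and the transpose of the forward weights.
mismatch = np.linalg.norm(B - W.T)
print("||B - W^T|| =", mismatch)
```

With these two terms the update is B <- B - lr * (B - W^T), so the mismatch shrinks geometrically and B ends up tracking the transpose of W, which is what backprop uses implicitly.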

Surprisingly, they find Symmetric Alignment to be more robust to the injection of Gaussian noise during training than backprop. Both SA and AA achieve similar accuracies on the ResNet benchmark. The authors then go on to explain the plausibility of non-local but approximate learning rules with regression discontinuity design, a la "Spiking allows neurons to estimate their causal effect".

This is a decent paper, reasonably well written. They thought through what variables are available to affect learning, and parameterized five combinations that work. Could they have done the full matrix of combinations, optimizing just the same as the metaparameters? Perhaps, but that would be even more work ...

Regarding the desire to reconcile backprop and biology, this paper does not bring us much (if at all) closer. Biological neural networks have specific and local uses for error; even invoking 'error' has limited explanatory power on activity. Learning and firing dynamics, of course. Is the brain then just an overbearing mess of details and overlapping rules? Yes, probably, but that doesn't mean that we humans can't find something simpler that works. The algorithms in this paper, for example, are well described by a bit of linear algebra, and yet they are performant.

ref: -0 tags: neuronal assemblies maass hebbian plasticity simulation austria fMRI date: 02-23-2021 18:49 gmt revision:1 [0] [head]

PMID-32381648 A model for structured information representation in neural networks in the brain

  • Using randomly connected E/I networks, suggests that information can be "bound" together using fast Hebbian STDP.
  • That is, 'assemblies' in higher-level areas reference sensory information through patterns of bidirectional connectivity.
  • These patterns emerge spontaneously following disinhibition of the higher-level areas.
  • I find the results underwhelming, but the discussion is more interesting.
    • E.g. there has been a lot of theoretical and computational-experimental work on how concepts are bound together into symbols or grammars.
    • The referenced fMRI studies are interesting, too: they imply that you can observe the results of structural binding in activity of the superior temporal gyrus.
  • I'm more in favor of dendritic potentials or neuronal up/down states to be a fast and flexible way of maintaining 'symbol membership' --
    • But it's not as flexible as synaptic plasticity, which, obviously, populates the outer product between 'region a' and 'region b' with a memory substrate, thereby spanning the range of plausible symbol-bindings.
    • Inhibitory interneurons can then gate the bindings, per morphological evidence.
    • But but, I don't think anyone has shown that you need protein synthesis for perception, as you do for LTP (modulo AMPAR cycling).
      • Hence I'd argue that localized dendritic potentials can serve as the flexible outer-product 'memory tag' for presence in an assembly.
        • Or maybe they are used primarily for learning, who knows!

ref: -0 tags: nonlinear hebbian synaptic learning rules projection pursuit date: 12-12-2019 00:21 gmt revision:4 [3] [2] [1] [0] [head]

PMID-27690349 Nonlinear Hebbian Learning as a Unifying Principle in Receptive Field Formation

  • Here we show that the principle of nonlinear Hebbian learning is sufficient for receptive field development under rather general conditions.
  • The nonlinearity is defined by the neuron’s f-I curve combined with the nonlinearity of the plasticity function. The outcome of such nonlinear learning is equivalent to projection pursuit [18, 19, 20], which focuses on features with non-trivial statistical structure, and therefore links receptive field development to optimality principles.
  • $\Delta w \propto x \, h(g(w^T x))$ where h is the Hebbian plasticity term, g is the neuron's f-I curve (input-output relation), and x is the (sensory) input.
  • The relevant property of natural image statistics is that the distribution of features derived from typical localized oriented patterns has high kurtosis [5,6, 39]
  • Model is a generalized leaky integrate and fire neuron, with triplet STDP
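
The projection-pursuit reading of the rule above can be illustrated with a toy simulation. This is a sketch under assumptions, not the paper's spiking model: the combined nonlinearity h(g(u)) is replaced by a cubic (an arbitrary choice for illustration), the weights are renormalized after each step, and the inputs are two synthetic sources (one heavy-tailed, one Gaussian) mixed by a rotation. All names and constants are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20000

# Two independent unit-variance sources: one heavy-tailed (Laplace), one Gaussian.
sparse = rng.laplace(0.0, 1.0 / np.sqrt(2.0), n)
gauss = rng.standard_normal(n)
S = np.column_stack([sparse, gauss])

# Rotate the sources so the heavy-tailed direction is not axis-aligned.
theta = np.pi / 6
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
X = S @ R.T                      # inputs x; still approximately white
target = R[:, 0]                 # direction of the heavy-tailed source

def f(u):
    """Stand-in for h(g(u)); the cubic is an assumption, not the paper's choice."""
    return u ** 3

w = np.array([0.0, 1.0])         # arbitrary unit-norm starting weights
lr = 0.1                         # assumed step size
for _ in range(200):
    u = X @ w                                     # w^T x for every sample
    w += lr * (X * f(u)[:, None]).mean(axis=0)    # batch-averaged x * f(w^T x)
    w /= np.linalg.norm(w)                        # keep the weight norm fixed

alignment = abs(w @ target)
print("alignment with heavy-tailed direction:", alignment)
```

With white inputs the second-order statistics carry no preferred direction, so the weight vector can only be steered by higher-order structure; here it should rotate toward the high-kurtosis source.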

ref: -0 tags: NMDA spike hebbian learning states pyramidal cell dendrites date: 10-03-2018 01:15 gmt revision:0 [head]

PMID-20544831 The decade of the dendritic NMDA spike.

  • NMDA spikes occur in the finer basal, oblique, and tuft dendrites.
  • Typically 40-50 mV, up to 100's of ms in duration.
  • Look similar to cortical up-down states.
  • Permit / form the substrate for spatially and temporally local computation on the dendrites that can enhance the representational or computational repertoire of individual neurons.

ref: Vasilaki-2009.02 tags: associative learning prefrontal cortex model hebbian date: 02-17-2009 03:37 gmt revision:2 [1] [0] [head]

PMID-19153762 Learning flexible sensori-motor mappings in a complex network.

  • They were looking at a task, presented to monkeys over 10 years ago, where two images were presented to the monkeys, and they had to associate left and rightward saccades with both.
  • The associations between saccade direction and image were periodically reversed. Unlike humans, who probably could very quickly change the association, the monkeys required on the order of 30 trials to learn the new association.
  • Interestingly, whenever the monkeys made a mistake, they effectively forgot previous pairings. That is, after an error, the monkeys were as likely to make another error as they were to choose correctly, independent of the number of correct trials preceding the error. Strange!
  • They implement and test reward-modulated hebbian learning (RAH), where:
    • The synaptic weights are changed based on the presynaptic activity, the postsynaptic activity minus the probability of both presynaptic and postsynaptic activity. This 'minus' effect seems similar to that of TD learning?
    • The synaptic weights are soft-bounded,
    • There is a stop-learning criterion, where the weights are not positively updated if the total neuron activity is strongly positive or strongly negative. This allows the network to ultimately obtain perfection (at some point the weights are no longer changed upon reward), and explains some of the asymmetry of the reward / punishment.
  • Their model perhaps does not scale well for large / very complicated tasks... given the presence of only a single reward signal. And the lack of attention / recall? Still, it fits the experimental data quite well.
  • They also note that for all the problems they study, adding more layers to the network does not significantly affect learning - neither the rate nor the eventual performance.
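
The general flavor of a reward-modulated Hebbian update can be sketched as follows. This is a heavily simplified stand-in, not the model from the paper: it uses a single stochastic output unit, a one-hot image code, an update of the form reward * presynaptic * (postsynaptic - expected postsynaptic), and omits the soft bounds and stop-learning criterion. The image-to-saccade mapping, learning rate, and trial count are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

n_stimuli = 2
target = np.array([1, 0])   # hypothetical mapping: image 0 -> right (1), image 1 -> left (0)
w = np.zeros(n_stimuli)     # one weight per image
lr = 0.5                    # assumed learning rate

def p_right(w, x):
    """Probability that the output unit fires (i.e. chooses 'right')."""
    return 1.0 / (1.0 + np.exp(-(w @ x)))

for trial in range(1000):
    s = trial % n_stimuli                 # alternate the two images
    x = np.eye(n_stimuli)[s]              # presynaptic activity: one-hot image code
    p = p_right(w, x)
    a = float(rng.random() < p)           # postsynaptic activity: stochastic choice
    r = 1.0 if a == target[s] else -1.0   # reward for correct, punishment for error
    w += lr * r * x * (a - p)             # reward * pre * (post - expected post)

probs = np.array([p_right(w, np.eye(n_stimuli)[s]) for s in range(n_stimuli)])
print("P(right | image 0), P(right | image 1):", probs)
```

In this simplified form every trial nudges the weight for the presented image toward the rewarded choice, so the choice probabilities saturate. Relearning after a reversal would then be slow, which hints at why bounds on the weights matter in the full model.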

ref: bookmark-0 tags: STDP hebbian learning dopamine reward robot model ISO date: 0-0-2007 0:0 revision:0 [head]


  • idea: have a gating signal for the hebbian learning.
    • pure hebbian learning is unstable; it will lead to endless amplification.
  • method: use a bunch of resonators near sub-critically damped.
  • application: a simple 2-d robot that learns to seek food. not super interesting, but still good.
  • Uses ISO learning - Isotropic sequence order learning.
  • somewhat related: runbot!
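
The instability of pure Hebbian learning, and the effect of a gate, can be seen in a toy calculation. This is a sketch under assumptions and does not implement ISO learning or the resonator filters: it uses the averaged Hebbian update for a linear neuron, a hypothetical input covariance, and an arbitrary gate that simply switches learning off after a fixed number of steps.

```python
import numpy as np

C = np.array([[2.0, 0.0],
              [0.0, 1.0]])     # input covariance <x x^T> (hypothetical values)
lr = 0.1                       # assumed learning rate

def run(gate_off_after=None, steps=100):
    """Return the weight norm after each step of averaged Hebbian learning."""
    w = np.array([0.1, 0.1])
    norms = []
    for t in range(steps):
        # Gate is 1 while learning is enabled, 0 afterwards (arbitrary schedule).
        gate = 0.0 if (gate_off_after is not None and t >= gate_off_after) else 1.0
        w = w + lr * gate * (C @ w)      # averaged Hebbian update: <y x> = C w
        norms.append(np.linalg.norm(w))
    return np.array(norms)

ungated = run()                     # learning always on
gated = run(gate_off_after=20)      # learning switched off after 20 steps

print("final norm, ungated:", ungated[-1])
print("final norm, gated:  ", gated[-1])
```

Without the gate the weights grow geometrically along the directions of largest input variance; with the gate closed they stay wherever they were when learning stopped.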