m8ta
{1570}
ref: -0 tags: Balduzzi backprop biologically plausible red-tape date: 05-31-2022 20:48 gmt revision:1

Kickback cuts Backprop's red-tape: Biologically plausible credit assignment in neural networks

Bit of a meh -- the idea is, rather than propagating error signals backwards through a hierarchy, you propagate error only one layer back + use a signed, global reward signal. This works by keeping the network ‘coherent’ -- positive neurons have positive input weights, and negative neurons have negative weights, such that the overall effect of a weight change does not change sign as it propagates forward through the network.
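
A minimal numpy sketch of my reading of the rule (function name, shapes, and learning rate are illustrative, not the paper's code): each hidden unit's error signal is just its summed outgoing weight -- feedback from one layer only -- gated by the rectifier and scaled by a single signed, global error scalar.

```python
import numpy as np

def kickback_update(x, a_hidden, W_out, global_err, lr=1e-3):
    """One Kickback-style update for a hidden layer's incoming weights.
    x          -- input vector to the hidden layer
    a_hidden   -- hidden-layer pre-activations (rectifier units)
    W_out      -- hidden->output weights, shape (n_hidden, n_out)
    global_err -- single signed scalar error / reward signal
    """
    gate = (a_hidden > 0).astype(float)  # rectifier derivative
    # Feedback from one layer only: each unit's 'influence' is its summed
    # outgoing weight. In a coherent net this sum never changes sign, so
    # the one-layer signal agrees in sign with the full backprop gradient.
    influence = W_out.sum(axis=1)
    delta = global_err * gate * influence
    return lr * np.outer(x, delta)       # dW for the incoming weights
```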

This is kind of a lame shortcut, imho, as it limits the types of functions the network can model & the computational structure of the network -- which is already quite limited by the common dot-product-rectifier structure (as is used here). Much more interesting, and possibly necessary given today's much deeper architectures, is to allow units to change sign. (Open question as to whether they actually do so frequently!) As such, the model is in the vein of "how do we make backprop biologically plausible by removing features / communication" rather than "what sorts of signals and changes does the brain use to perceive and generate behavior".

This is also related to the literature on what ResNets do: what are the skip connections for? Anthropic has some interesting analyses for Transformer architectures, but checking the literature on other ResNets is for another time.

{1568}
ref: -2021 tags: burst bio plausible gradient learning credit assignment richards apical dendrites date: 05-05-2022 15:44 gmt revision:2

Burst-dependent synaptic plasticity can coordinate learning in hierarchical circuits

  • Roughly, single spike events indicate the normal feature responses of neurons, while multi-spike bursts indicate error signals (a toy version of the resulting plasticity rule is sketched after these notes).
  • Bursts are triggered by depolarizing currents to the apical dendrites; these can be uncoupled from the bottom-up event rate, which arises from perisomatic inputs / basal dendrites.
  • The fact that the two are imperfectly multiplexed is OK, as in backprop the magnitude of the error signal is modulated by the activity of the feature detector.
  • "For credit assignment in hierarchical networks, connections should obey four constraints:
    • Feedback must steer the magnitude and sign of plasticity
    • Feedback signals from higher-order areas must be multiplexed with feedforward signals from lower-order areas so that credit assignment can percolate down the hierarchy with minimal effect on sensory information
    • There should be some form of alignment between feedforward and feedback connections
    • Integration of credit-carrying signals should be nearly linear to avoid saturation"
      • Seems it's easy to saturate the burst probability: within a narrow window of background event rate, you go from all bursts to no bursts.
  • Perisomatic inputs were short-term depressing, whereas apical dendrite synapses were short-term facilitating.
    • This is a form of filtering on burst rates? E.g. bursts propagate better down the hierarchy than up?
  • They experiment with a series of models: one for solving the XOR task, and subsequent ones for MNIST and CIFAR.
  • The latter, larger models are mean-field models rather than biophysical neuron models, and have a few extra features:
    • Interneurons, presumably SOM neurons, are used to keep bursting within a linear regime via a 'simple' (supplementary) learning rule.
    • Feedback alignment occurs by adjusting both the feedforward and feedback weights with the same propagated error signal + weight decay (see the second sketch after these notes).
  • The credit assignment problem -- or, in the case of unsupervised learning, the coordination problem -- is very real: how do you change a middle-layer feature to improve representations in higher (and lower) levels of the hierarchy?
    • They mention that using REINFORCE on the same network was unable to find a solution.
    • Put another way: usually you need to coordinate the weight changes in a network; changing weights individually based on a global error signal (or objective function) does not readily work...
      • Though evolution seems to be quite productive at getting the settings of (very) large sets of interdependent coefficients all to be 'correct' and (sometimes) beautiful.
      • How? Why? Friston's free energy principle? Lol.
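
The toy plasticity rule promised above -- a paraphrase of the paper's burst-dependent rule, not their exact equations (the discrete-time simplification and names are mine): bursts potentiate, single events depress in proportion to the running burst probability, so the expected update is zero at baseline, and feedback-driven deviations in burst rate carry a signed teaching signal.

```python
def burst_plasticity_step(w, e_pre, event, burst, p_bar, lr=1e-3, tau=0.99):
    """One discrete-time step of a burst-dependent plasticity rule (a toy
    paraphrase, not the paper's implementation).
    w     -- synaptic weight
    e_pre -- presynaptic eligibility trace (low-pass filtered event train)
    event -- 1 if the postsynaptic cell emitted an event this step, else 0
    burst -- 1 if that event was a burst, else 0 (a burst implies an event)
    p_bar -- running estimate of the postsynaptic burst probability
    """
    # Bursts potentiate; single events depress in proportion to the baseline
    # burst probability. At p_bar the expected change is zero, so only
    # apical / feedback-driven deviations in burst rate steer the weight.
    dw = lr * (burst - p_bar * event) * e_pre
    if event:  # update the burst-probability estimate on event steps
        p_bar = tau * p_bar + (1 - tau) * burst
    return w + dw, p_bar
```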
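
And the alignment trick from the model-feature list above, assuming it works Kolen-Pollack style (my guess at the mechanism, not the paper's exact equations): apply the same error-driven update to both the feedforward and feedback matrices, plus weight decay on each. The shared update and the decay pull the feedback weights toward the transpose of the feedforward weights without ever copying them, i.e. no weight transport.

```python
import numpy as np

def aligned_update(W_ff, W_fb, pre, err, lr=1e-3, decay=1e-4):
    """Kolen-Pollack-style sketch of feedforward / feedback alignment.
    W_ff -- feedforward weights, shape (n_pre, n_post)
    W_fb -- feedback weights, shape (n_post, n_pre)
    pre  -- presynaptic activity; err -- error at the postsynaptic layer
    """
    dW = np.outer(pre, err)                 # shared error-driven update
    W_ff = W_ff + lr * dW - decay * W_ff
    W_fb = W_fb + lr * dW.T - decay * W_fb  # same update, transposed
    return W_ff, W_fb                       # W_fb -> W_ff.T over time
```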