PMID-18846203[0] A Learning Theory for Reward-Modulated Spike-Timing-Dependent Plasticity with Application to Biofeedback
- (from abstract) The resulting learning theory predicts that even difficult credit-assignment problems, where it is very hard to tell which synaptic weights should be modified in order to increase the global reward for the system, can be solved in a self-organizing manner through reward-modulated STDP.
- This yields an explanation for a fundamental experimental result on biofeedback in monkeys by Fetz and Baker.
- STDP is prevalent in the cortex; however, it requires a second signal:
- Dopamine seems to gate STDP in corticostriatal synapses
- ACh does the same or similar in the cortex. -- see references 8-12
- The simple learning rule they use: reward-modulated STDP, where STDP pairings build a decaying eligibility trace at each synapse, and the weight actually changes only when a global reward signal arrives.
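A minimal sketch of that rule for a single synapse, assuming pair-based STDP with an exponentially decaying eligibility trace multiplied by a scalar reward d(t). The function name, time constants, and amplitudes below are illustrative choices of mine, not the paper's values:

```python
import numpy as np

def run_rm_stdp(pre_spikes, post_spikes, reward, T, dt=1.0,
                tau_plus=20.0, tau_minus=20.0,
                a_plus=0.01, a_minus=0.012, tau_c=1000.0):
    """Simulate one synapse under reward-modulated STDP.

    pre_spikes / post_spikes: sets of spike times (ms, multiples of dt)
    reward: function t -> scalar reward d(t)
    Returns the final synaptic weight.
    """
    x_pre = 0.0   # presynaptic trace (decays with tau_plus)
    x_post = 0.0  # postsynaptic trace (decays with tau_minus)
    c = 0.0       # eligibility trace (decays slowly with tau_c)
    w = 0.5       # initial synaptic weight
    for step in range(int(T / dt)):
        t = step * dt
        x_pre *= np.exp(-dt / tau_plus)
        x_post *= np.exp(-dt / tau_minus)
        c *= np.exp(-dt / tau_c)
        if t in pre_spikes:           # pre spike: depress if post fired recently
            x_pre += 1.0
            c -= a_minus * x_post
        if t in post_spikes:          # post spike: potentiate if pre fired recently
            x_post += 1.0
            c += a_plus * x_pre
        w += dt * reward(t) * c       # weight moves only when d(t) is nonzero
    return w
```

With pre-before-post pairings and a positive reward the weight grows; with post-before-pre pairings it shrinks. The key point of the paper's analysis is that the eligibility trace lets a delayed reward still credit the right spike pairings.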
- Their notes on the Fetz/Baker experiments: "Adjacent neurons tended to change their firing rate in the same direction, but also differential changes of directions of firing rates of pairs of neurons are reported in [17] (when these differential changes were rewarded). For example, it was shown in Figure 9 of [17] (see also Figure 1 in [19]) that pairs of neurons that were separated by no more than a few hundred microns could be independently trained to increase or decrease their firing rates."
- Their result is actually really simple - there is no 'control' or biofeedback - there is no visual or sensory input, and no real computation by the network (at least for this simulation). One neuron is simply reinforced, hence its firing rate increases.
- Fetz & later Schmidt's work involved feedback and precise control of firing rate; this does not.
- This also does not address the problem that their rule may allow other synapses to forget during reinforcement.
- They do show that exact spike times can be rewarded, which is kinda interesting ... kinda.
- Tried a pattern classification task where all of the information was in the relative spike timings.
- Had to run the pattern through the network 1000 times. That's a bit unrealistic (?).
- The problem with all these algorithms is that they require so many presentations for gradient descent (or similar) to work, whereas biological systems can and do learn after one or a few presentations.
- Next tried to train neurons to classify spoken input
- Audio stimuli were processed through a cochlear model
- Maass previously has been able to train a network to perform speaker-independent classification.
- Neuron model does, roughly, seem to discriminate between "one" and "two"... after 2000 trials (each with a presentation of 10 of the same digit utterance). I'm still not all that impressed. Feels like gradient descent / linear regression as per the original LSM.
- A great many derivations in the Methods section... too much to follow.
- Should read refs:
- PMID-16907616[1] Gradient learning in spiking neural networks by dynamic perturbation of conductances.
- PMID-17220510[2] Solving the distal reward problem through linkage of STDP and dopamine signaling.
____References____
[0] Legenstein R, Pecevski D, Maass W, A learning theory for reward-modulated spike-timing-dependent plasticity with application to biofeedback. PLoS Comput Biol 4:10, e1000180 (2008 Oct)
[1] Fiete IR, Seung HS, Gradient learning in spiking neural networks by dynamic perturbation of conductances. Phys Rev Lett 97:4, 048104 (2006 Jul 28)
[2] Izhikevich EM, Solving the distal reward problem through linkage of STDP and dopamine signaling. Cereb Cortex 17:10, 2443-52 (2007 Oct)