m8ta

{1521}
ref: -2005 tags: dimensionality reduction contrastive gradient descent date: 09-13-2020 02:49 gmt revision:2 [1] [0] [head]

Dimensionality reduction by learning an invariant mapping

  • Raia Hadsell, Sumit Chopra, Yann LeCun
  • Central idea: learn an invariant mapping of the input by minimizing the mapped distance (i.e. the distance between the outputs) when two samples are categorized as the same (e.g. the same digit in MNIST), and maximizing the mapped distance when they are categorized as different.
    • Two loss functions, one for 'same' pairs and one for 'different' pairs; see the sketch after this list.
  • This is an attraction-repulsion spring analogy.
  • Use gradient descent to change the weights to satisfy these two competing losses.
  • The resulting convolutional neural nets can extract camera pose information from the NORB dataset.
  • Surprising how simple analogies like this, when iterated across a great many samples, pull out intuitively correct invariances.
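
A minimal numpy sketch of the two-term contrastive loss described above; the function and variable names are mine, and the margin value is illustrative (in the paper the margin appears only in the 'different'-pair term):

```python
import numpy as np

def contrastive_loss(d, y, margin=1.0):
    """Pairwise contrastive loss in the style of Hadsell, Chopra & LeCun (2006).

    d      : Euclidean distance between the two mapped outputs G(x1) and G(x2)
    y      : 0 if the pair is labeled 'same', 1 if labeled 'different'
    margin : dissimilar pairs only contribute while closer than this radius
    """
    attract = 0.5 * d ** 2                            # pulls 'same' pairs together
    repel = 0.5 * np.maximum(0.0, margin - d) ** 2    # pushes 'different' pairs apart
    return (1.0 - y) * attract + y * repel
```

Minimizing this over many pairs with gradient descent gives exactly the attraction-repulsion behavior described above.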

{651}
ref: Peters-2008.05 tags: Schaal reinforcement learning policy gradient motor primitives date: 02-17-2009 18:49 gmt revision:4 [3] [2] [1] [0] [head]

PMID-18482830[0] Reinforcement learning of motor skills with policy gradients

  • They argue that policy gradient methods are the only way to deal with reinforcement (or general-type) learning in a high-dimensional policy space defined by parameterized motor primitives; a basic version is sketched after this list.
  • The article is rather difficult to follow; they do not always provide enough detail (for me) to understand exactly what their equations mean. Perhaps this is related to their criticism that others' papers are 'ad hoc' and not 'statistically motivated'.
  • Nonetheless, it seems interesting.
  • Their previous paper, "Reinforcement Learning for Humanoid Robotics", may be slightly easier to understand.
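
A minimal sketch of the basic likelihood-ratio ('vanilla') policy gradient that this line of work builds on, not their actor-critic refinements; the Gaussian parameter-space exploration and the rollout_fn interface are my own illustrative assumptions:

```python
import numpy as np

def reinforce_gradient(theta, rollout_fn, episodes=100, sigma=0.1):
    """Estimate the policy gradient for motor-primitive parameters theta.

    theta      : current motor-primitive parameters (1-D array)
    rollout_fn : runs one episode with the given parameters, returns total reward
    sigma      : std-dev of the Gaussian exploration policy centered on theta
    """
    grads, rewards = [], []
    for _ in range(episodes):
        eps = np.random.randn(*theta.shape) * sigma   # sample exploratory parameters
        rewards.append(rollout_fn(theta + eps))
        grads.append(eps / sigma ** 2)                # grad of log N(theta+eps | theta, sigma^2)
    rewards = np.array(rewards)
    baseline = rewards.mean()                         # simple variance-reducing baseline
    return np.mean([(r - baseline) * g for r, g in zip(rewards, grads)], axis=0)
```

The estimate is then used for a small ascent step on theta; as I understand it, the paper's contribution lies in reducing the variance of such estimates (baselines, natural gradients), not in the basic estimator.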

____References____

[0] Peters J, Schaal S. Reinforcement learning of motor skills with policy gradients. Neural Netw 21(4):682-97 (2008 May).

{652}
ref: notes-0 tags: policy gradient reinforcement learning aibo walk optimization date: 12-09-2008 17:46 gmt revision:0 [head]

Policy Gradient Reinforcement Learning for Fast Quadrupedal Locomotion

  • Simple, easy-to-understand policy gradient method! Many papers cite this on Google Scholar. A rough sketch of this style of update follows the list.
  • Compare to {651}.
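
A rough sketch of a finite-difference style policy-gradient update of the kind used for gait optimization; the details (perturbation scheme, step size, evaluate interface) are illustrative rather than taken from the paper:

```python
import numpy as np

def fd_policy_gradient_step(theta, evaluate, eps=0.05, step=0.02, n_policies=15):
    """One finite-difference policy-gradient update on walk parameters.

    theta      : current gait parameters (1-D array)
    evaluate   : returns a scalar score (e.g. walking speed) for a parameter vector
    eps        : per-dimension perturbation size
    step       : length of the update along the estimated gradient direction
    n_policies : number of randomly perturbed policies evaluated per update
    """
    n = len(theta)
    # Each trial policy perturbs every dimension by -eps, 0, or +eps at random.
    perturb = np.random.choice([-eps, 0.0, eps], size=(n_policies, n))
    scores = np.array([evaluate(theta + p) for p in perturb])

    grad = np.zeros(n)
    for d in range(n):
        plus = scores[perturb[:, d] > 0].mean() if np.any(perturb[:, d] > 0) else 0.0
        minus = scores[perturb[:, d] < 0].mean() if np.any(perturb[:, d] < 0) else 0.0
        grad[d] = plus - minus                # crude estimate of d(score)/d(theta_d)

    norm = np.linalg.norm(grad)
    return theta if norm == 0 else theta + step * grad / norm
```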

{72}
ref: abstract-0 tags: tlh24 error signals in the cortex and basal ganglia reinforcement_learning gradient_descent motor_learning date: 0-0-2006 0:0 revision:0 [head]

Title: Error signals in the cortex and basal ganglia.

Abstract: Numerous studies have found correlations between measures of neural activity, from single-unit recordings to aggregate measures such as EEG, and motor behavior. Two general themes have emerged from this research: neurons are generally broadly tuned, and they are often arrayed in spatial maps. It is hypothesized that these are two features of a larger hierarchical structure of spatial and temporal transforms that allows mappings to produce complex behaviors from abstract goals or, similarly, simple percepts from complex sensory information. Much theoretical work has demonstrated the suitability of this organization both for generating behavior and for extracting relevant information from the world. It is generally agreed that most transforms enacted by the cortex and basal ganglia are learned rather than genetically encoded. Therefore, it is the characterization of the learning process that describes the computational nature of the brain; the basis functions themselves are more descriptive of the brain's environment. Here we hypothesize that learning in the mammalian brain is a stochastic maximization of reward and transform predictability, and a minimization of transform complexity and latency. It is probable that the optimizations employed in learning include components of both gradient descent and competitive elimination, two large classes of algorithms explored extensively in the field of machine learning. The former requires the existence of a vector-valued error signal, while the latter is less restrictive and requires at least a scalar evaluator. We will look for candidate error or evaluator signals in the cortex and basal ganglia during force-field learning, where the motor error is task-relevant and explicitly provided to the subject. By simultaneously recording large populations of neurons from multiple brain areas, we can probe for such error or evaluator signals by measuring the stochastic relationship between neural activity and the provided error signal, and the ability of the former to predict the latter. From these data we will also be able to track the dependence of neural tuning trajectories on trial-by-trial success; if the cortex operates under minimization principles, then tuning changes will have a temporal relationship to reward. The overarching goal of this research is to look for one aspect of motor learning – the error signal – with the hope of using these data to better understand the normal function of the cortex and basal ganglia, and how this normal function relates to the symptoms caused by disease and lesions of the brain.
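
A toy numpy contrast (my own illustration, not part of the abstract) of the two classes of learning rule mentioned above: gradient descent, which needs the full vector-valued error, versus a perturbation rule that only needs a scalar evaluator of performance:

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(3, 5)) * 0.1   # weights of a single linear layer: 5 inputs -> 3 outputs
x = rng.normal(size=5)              # input activity
target = rng.normal(size=3)         # desired output (only the supervised rule sees this vector)
lr = 0.05

# 1) Gradient descent: the update uses the vector error (target - y) explicitly.
y = W @ x
W_gd = W + lr * np.outer(target - y, x)

# 2) Scalar-evaluator rule (node perturbation): perturb the output, then reinforce
#    the perturbation in proportion to the scalar change in performance.
noise = rng.normal(size=3) * 0.1
y_pert = y + noise
reward_delta = np.sum((target - y) ** 2) - np.sum((target - y_pert) ** 2)  # scalar improvement
W_rl = W + lr * reward_delta * np.outer(noise, x)
```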