You are not authenticated, login.
text: sort by
tags: modified
type: chronology
[0] Bar-Gad I, Morris G, Bergman H, Information processing, dimensionality reduction and reinforcement learning in the basal ganglia.Prog Neurobiol 71:6, 439-73 (2003 Dec)

hide / / print
ref: -2008 tags: t-SNE dimensionality reduction embedding Hinton date: 01-25-2022 20:39 gmt revision:2 [1] [0] [head]

“Visualizing data using t-SNE”

  • Laurens van der Maaten, Geoffrey Hinton.
  • SNE: stochastic neighbor embedding, Hinton 2002.
  • Idea: model the data conditional pairwise distribution as a gaussian, with one variance per data point, p(x i|x j) p(x_i | x_j)
  • in the mapped data, this pairwise distribution is modeled as a fixed-variance gaussian, too, q(y i|y j) q(y_i | y_j)
  • Goal is to minimize the Kullback-Leibler divergence Σ iKL(p i||q i) \Sigma_i KL(p_i || q_i) (summed over all data points)
  • Per-data point variance is found via binary search to match a user-specified perplexity. This amounts to setting a number of nearest neighbors, somewhere between 5 and 50 work ok.
  • Cost function is minimized via gradient descent, starting with a random distribution of points yi, with plenty of momentum to speed up convergence, and noise to effect simulated annealing.
  • Cost function is remarkably simple to reduce, gradient update: δCδy i=2Σ j(p j|iq ji+p i|jq i|j)(y iy j) \frac{\delta C}{\delta y_i} = 2 \Sigma_j(p_{j|i} - q_{j-i} + p_{i|j} - q_{i|j})(y_i - y_j)
  • t-SNE differs from SNE (above) in that it addresses difficulty in optimizing the cost function, and crowding.
    • Uses a simplified symmetric cost function (symmetric conditional probability, rather than joint probability) with simpler gradients
    • Uses the student’s t-distribution in the low-dimensional map q to reduce crowding problem.
  • The crowding problem is roughly resultant from the fact that, in high-dimensional spaces, the volume of the local neighborhood scales as r m r^m , whereas in 2D, it’s just r 2 r^2 . Hence there is cost-incentive to pushing all the points together in the map -- points are volumetrically closer together in high dimensions than they can be in 2D.
    • This can be alleviated by using a one-DOF student distribution, which is the same as a Cauchy distribution, to model the probabilities in map space.
  • Smart -- they plot the topology of the gradients to gain insight into modeling / convergence behavior.
  • Don’t need simulated annealing due to balanced attractive and repulsive effects (see figure).
  • Enhance the algorithm further by keeping it compact at the beginning, so that clusters can move through each other.
  • Look up: d-bits parity task by Bengio 2007

hide / / print
ref: -2005 tags: dimensionality reduction contrastive gradient descent date: 09-13-2020 02:49 gmt revision:2 [1] [0] [head]

Dimensionality reduction by learning and invariant mapping

  • Raia Hadsell, Sumit Chopra, Yann LeCun
  • Central idea: learn and invariant mapping of the input by minimizing mapped distance (e.g. the distance between outputs) when the samples are categorized as the same (same numbers in MNIST eg), and maximizing mapped distance when the samples are categorized as distant.
    • Two loss functions for same vs different.
  • This is an attraction-repulsion spring analogy.
  • Use gradient descent to change the weights to satisfy these two competing losses.
  • Resulting constitutional neural nets can extract camera pose information from the NORB dataset.
  • Surprising how simple analogies like this, when iterated across a great many samples, pull out intuitively correct invariances.

hide / / print
ref: BarGad-2003.12 tags: information dimensionality reduction reinforcement learning basal_ganglia RDDR SNR globus pallidus date: 01-16-2012 19:18 gmt revision:3 [2] [1] [0] [head]

PMID-15013228[] Information processing, dimensionality reduction, and reinforcement learning in the basal ganglia (2003)

  • long paper! looks like they used latex.
  • they focus on a 'new model' for the basal ganglia: reinforcement driven dimensionality reduction (RDDR)
  • in order to make sense of the system - according to them - any model must ingore huge ammounts of information about the studied areas.
  • ventral striatum = nucelus accumbens!
  • striatum is broken into two, rough, parts: ventral and dorsal
    • dorsal striatum: the caudate and putamen are a part of the
    • ventral striatum: the nucelus accumbens, medial and ventral portions of the caudate and putamen, and striatal cells of the olifactory tubercle (!) and anterior perforated substance.
  • ~90 of neurons in the striatum are medium spiny neurons
    • dendrites fill 0.5mm^3
    • cells have up and down states.
      • the states are controlled by intrinsic connections
      • project to GPe GPi & SNr (primarily), using GABA.
  • 1-2% of neurons in the striatum are tonically active neurons (TANs)
    • use acetylcholine (among others)
    • fewer spines
    • more sensitive to input
    • TANs encode information relevant to reinforcement or incentive behavior