“Visualizing data using tSNE”
 Laurens van der Maaten, Geoffrey Hinton.
 SNE: stochastic neighbor embedding, Hinton 2002.
 Idea: model the data conditional pairwise distribution as a gaussian, with one variance per data point, $p(x_i  x_j)$
 in the mapped data, this pairwise distribution is modeled as a fixedvariance gaussian, too, $q(y_i  y_j)$
 Goal is to minimize the KullbackLeibler divergence $\Sigma_i KL(p_i  q_i)$ (summed over all data points)
 Perdata point variance is found via binary search to match a userspecified perplexity. This amounts to setting a number of nearest neighbors, somewhere between 5 and 50 work ok.
 Cost function is minimized via gradient descent, starting with a random distribution of points yi, with plenty of momentum to speed up convergence, and noise to effect simulated annealing.
 Cost function is remarkably simple to reduce, gradient update: $\frac{\delta C}{\delta y_i} = 2 \Sigma_j(p_{ji}  q_{ji} + p_{ij}  q_{ij})(y_i  y_j)$
 tSNE differs from SNE (above) in that it addresses difficulty in optimizing the cost function, and crowding.
 Uses a simplified symmetric cost function (symmetric conditional probability, rather than joint probability) with simpler gradients
 Uses the student’s tdistribution in the lowdimensional map q to reduce crowding problem.
 The crowding problem is roughly resultant from the fact that, in highdimensional spaces, the volume of the local neighborhood scales as $r^m$ , whereas in 2D, it’s just $r^2$ . Hence there is costincentive to pushing all the points together in the map  points are volumetrically closer together in high dimensions than they can be in 2D.
 This can be alleviated by using a oneDOF student distribution, which is the same as a Cauchy distribution, to model the probabilities in map space.
 Smart  they plot the topology of the gradients to gain insight into modeling / convergence behavior.
 Don’t need simulated annealing due to balanced attractive and repulsive effects (see figure).
 Enhance the algorithm further by keeping it compact at the beginning, so that clusters can move through each other.
 Look up: dbits parity task by Bengio 2007

PMID25855189 Mapping Synapses by Conjugate LightElectron Array Tomography
 Use aligned interleaved immunofluorescence imaging follwed by array EM (FESEM). 70nm thick sections.
 Of IHC, tissue must be dehydrated & embedded in a resin.
 However, the dehydration disrupts cell membranes and ultrastructural details viewed via EM ...
 Hence, EM microscopy uses osmium tetroxide to crosslink the lipids.
 ... Yet that also disrupt / refolds the poteins, making IHC fail.
 Solution is to dehydrate & embed at cryo temp, 70C, where the lipids do not dissolve. They used Lowicryl HM20.
 We show that cryoembedding provides markedly improved ultrastructure while still permitting multiplexed immunohistochemistry.

