Building High-level Features Using Large Scale Unsupervised Learning
 Quoc V. Le, Marc'Aurelio Ranzato, Rajat Monga, Matthieu Devin, Kai Chen, Greg S. Corrado, Jeff Dean, Andrew Y. Ng
 Input data: 10M random 200x200 frames from YouTube. Each video contributes only one frame.
 Used local receptive fields to reduce the communication requirements. Trained on 1,000 machines (16 cores each) for 3 days.
 "Strongly influenced by" Olshausen & Field {1448}, but that work is limited to a shallow architecture.
 Lee et al 2008 show that stacked RBMs can model simple functions of the cortex.
 Lee et al 2009 show that a convolutional DBN trained on faces can learn a face detector.

 Their architecture: sparse deep autoencoder with
 Local receptive fields: each feature of the autoencoder connects to only a small region of the lower layer (but non-convolutional: weights are not shared across locations).
 The filtering sublayer is purely linear (no pointwise nonlinearity before pooling).
 More biologically plausible, and allows learning invariances beyond translation invariance (Le et al 2010).
 No weight sharing means the network is very large: about 1 billion weights.
 Still, the human visual cortex is about a million times larger in neurons and synapses.
 L2 pooling (Hyvarinen et al 2009) which allows the learning of invariant features.
 Each pooling unit computes the square root of the sum of the squares of its inputs, i.e. a square-root nonlinearity.
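 A minimal NumPy sketch of L2 pooling as described above (the function name, pool grouping, and epsilon are illustrative assumptions, not from the paper):

```python
import numpy as np

def l2_pool(features, pool_size, eps=1e-8):
    """L2 pooling: each pooling unit outputs the square root of the
    sum of squares of the feature activations in its (disjoint) pool."""
    pooled = features.reshape(-1, pool_size)           # group features into pools
    return np.sqrt(eps + np.sum(pooled ** 2, axis=1))  # square-root nonlinearity

h = l2_pool(np.array([3.0, 4.0, 1.0, 0.0]), pool_size=2)
# first pool: sqrt(3^2 + 4^2) ≈ 5.0; second pool: sqrt(1^2 + 0^2) ≈ 1.0
```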
 Local contrast normalization, both subtractive and divisive (Jarrett et al 2009).
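 A hedged sketch of contrast normalization: the paper's version (Jarrett et al 2009) uses a Gaussian-weighted spatial neighborhood and only divides when the local standard deviation exceeds 1; this simplification normalizes over a whole flattened patch.

```python
import numpy as np

def contrast_normalize(patch, eps=1e-4):
    """Subtractive then divisive normalization over one patch
    (simplified: whole-patch statistics instead of a local Gaussian window)."""
    centered = patch - patch.mean()                # subtractive step: remove local mean
    sigma = np.sqrt(eps + np.mean(centered ** 2))  # local standard deviation
    return centered / max(sigma, 1.0)              # divisive step, clipped at 1

p = contrast_normalize(np.array([10.0, 12.0, 8.0, 10.0]))
# result has zero mean and standard deviation at most 1
```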
 Encoding weights $W_1$ and decoding weights $W_2$ are adjusted to minimize the reconstruction error, penalized by the sparse pooling-layer activation (weighted by $\lambda = 0.1$). The latter term encourages the network to find invariances.
 $\min_{W_1, W_2} \sum_{i=1}^m \left( \left\lVert W_2 W_1^T x^{(i)} - x^{(i)} \right\rVert_2^2 + \lambda \sum_{j=1}^k \sqrt{\epsilon + H_j \left( W_1^T x^{(i)} \right)^2} \right)$
 $H_j$ are the weights to the $j$th pooling element; $\lambda = 0.1$; $m$ examples; $k$ pooling units.
 This is also known as reconstruction Topographic Independent Component Analysis.
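 The objective above can be written out directly in NumPy. This is an illustrative sketch (function name, shapes, and the toy inputs are assumptions), not the paper's distributed implementation:

```python
import numpy as np

def rica_loss(W1, W2, H, X, lam=0.1, eps=1e-8):
    """Reconstruction TICA objective.
    X: (m, n) examples; W1: (n, f) encoding weights; W2: (n, f) decoding
    weights; H: (k, f) fixed pooling weights for k pooling units."""
    Z = X @ W1                              # features W1^T x^(i), per example
    recon = Z @ W2.T                        # reconstructions W2 W1^T x^(i)
    rec_err = np.sum((recon - X) ** 2)      # sum_i ||W2 W1^T x - x||_2^2
    pooled = np.sqrt(eps + (Z ** 2) @ H.T)  # sqrt(eps + H_j (W1^T x)^2)
    return rec_err + lam * np.sum(pooled)   # reconstruction + sparsity penalty

# Tiny check: with identity encoder/decoder the reconstruction term vanishes,
# leaving only the pooled sparsity penalty.
W = np.eye(2)
loss = rica_loss(W, W, np.eye(2), np.array([[3.0, 4.0]]))
# ≈ 0.1 * (3 + 4) = 0.7
```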
 Weights are updated through asynchronous SGD.
 Minibatch size 100.
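 The training loop is standard minibatch SGD; the paper runs it asynchronously across many model replicas that pull and push parameters to a sharded parameter server. A single-worker toy version on a least-squares objective (all data, learning rate, and step counts here are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
w_true = np.arange(5.0)
y = X @ w_true                 # toy regression target: loss ||Xw - y||^2
w = np.zeros(5)

for step in range(200):
    idx = rng.integers(0, len(X), size=100)        # minibatch of 100, as in the paper
    xb, yb = X[idx], y[idx]
    grad = 2 * xb.T @ (xb @ w - yb) / len(xb)      # minibatch gradient
    w -= 0.05 * grad                               # SGD update
# w converges toward w_true
```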

 Note that deeper autoencoders don't fare consistently better.
