Discovering hidden factors of variation in deep networks
 Well, they are not really that deep ...
 Use a VAE to encode both a supervised signal (class labels) as well as unsupervised latents.
 Penalize a combination of the reconstruction MSE, the classification error on the label logits, and a special cross-covariance term to decorrelate the supervised and unsupervised latent vectors.

 Cross-covariance penalty:

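One way to sketch the decorrelating penalty: the squared Frobenius norm of the empirical cross-covariance between the label units and the unsupervised latents (function and variable names are my own; shapes assumed):

```python
import numpy as np

def xcov_penalty(y, z):
    """XCov-style penalty: 0.5 * sum_ij C_ij^2, where C is the empirical
    cross-covariance between label units y (N, C) and latents z (N, D).
    Driving C toward zero decorrelates the supervised and unsupervised codes."""
    yc = y - y.mean(axis=0)          # center each label unit
    zc = z - z.mean(axis=0)          # center each latent unit
    C = yc.T @ zc / y.shape[0]       # (C, D) cross-covariance matrix
    return 0.5 * np.sum(C ** 2)
```

This term is added to the reconstruction and classification losses; it is exactly zero when every label unit is empirically uncorrelated with every latent.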
 Tested on:
 MNIST - discovered the style / rotation of the characters.
 Toronto Faces Database - seven expressions, many individuals; extracted something like eigen-emotions.
 Multi-PIE - many faces, many viewpoints; was able to vary camera pose and illumination with the unsupervised latents.

This was compiled by searching for papers which referenced Olshausen and Field 1996, PMID 8637596, Emergence of simple-cell receptive field properties by learning a sparse code for natural images.
 Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations - unsupervised, convolutional. Honglak Lee & Andrew Ng, 2009.
 Building high-level features using large scale unsupervised learning - 2011, Andrew Ng, Quoc Le, Jeff Dean, Google.
 Unsupervised, convolutional.
 Required a lot of compute: 16k cores for 3 days.
 Unsupervised feature learning for audio classification using convolutional deep belief networks - Andrew Ng et al. 2009.
 Convolutional deep net for audio recognition.
 Robust Object Recognition with Cortex-Like Mechanisms - Serre et al. (Poggio lab, MIT) 2007 - again alternates template matching and maximum pooling. Hypes its applicability to many domains. Not sure if this is supervised or not.
 Just relax: convex programming methods for identifying sparse signals in noise - Joel Tropp 2006 - extraction of a linear combination of elementary signals corrupted with Gaussian noise. Proposes an algorithm / class of algorithms for solving with a convex program in polynomial time.
 Deep learning in neural networks: An overview - Jürgen Schmidhuber 2014. From the abstract: "I review deep supervised learning (also recapitulating the history of backpropagation), unsupervised learning, reinforcement learning & evolutionary computation, and indirect search for short programs encoding deep and large networks."
 Stacked Denoising Autoencoders: Learning Useful Representations in a Deep Network with a Local Denoising Criterion - Pascal Vincent et al. 2010.


 Performs reasonably well.
 Needs supervised fine-tuning, but most features are learned in an unsupervised way.
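The local denoising criterion amounts to: corrupt the input, encode-decode, and score the reconstruction against the clean input. A minimal numpy sketch with masking noise and tied weights (sizes, corruption level, and names are illustrative, not the paper's):

```python
import numpy as np

rng = np.random.default_rng(0)

def corrupt(x, p=0.3):
    """Masking noise: zero out a random fraction p of the inputs."""
    mask = rng.random(x.shape) > p
    return x * mask

# Tiny one-hidden-layer autoencoder with tied weights (arbitrary sizes).
W = rng.standard_normal((784, 128)) * 0.01
b, c = np.zeros(128), np.zeros(784)

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def denoising_loss(x):
    """Encode the CORRUPTED input, decode, compare to the CLEAN input --
    the local denoising criterion."""
    h = sigmoid(corrupt(x) @ W + b)       # encode corrupted x
    x_hat = sigmoid(h @ W.T + c)          # decode with tied weights
    return np.mean((x_hat - x) ** 2)      # reconstruct the clean x
```

Stacking means training one such layer at a time, then feeding its (clean-input) hidden codes to the next layer as data.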
 Learning fast approximations of sparse coding - Karol Gregor, Yann LeCun 2010.
 Sparse = penalize the L1 norm of the code (on top of the reconstruction error).
 The main idea is to train a nonlinear, feedforward predictor with a specific architecture and a fixed depth to produce the best possible approximation of the sparse code.
 ~10x better than the previous methods.
 Can be used to initialize an exact algorithm.
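For reference, the exact iteration being approximated is ISTA: a gradient step on the reconstruction error followed by soft-thresholding. LISTA replaces the fixed matrices with learned ones and truncates to a few unrolled steps. A sketch (variable names are my own):

```python
import numpy as np

def ista(x, W, lam=0.1, n_iter=100):
    """ISTA for sparse coding: min_z 0.5*||x - W z||^2 + lam*||z||_1.
    LISTA (Gregor & LeCun 2010) swaps the fixed matrices below for learned
    ones and runs only a fixed, small number of iterations."""
    L = np.linalg.norm(W, 2) ** 2              # Lipschitz constant of the gradient
    z = np.zeros(W.shape[1])
    for _ in range(n_iter):
        g = z - W.T @ (W @ z - x) / L          # gradient step on reconstruction
        z = np.sign(g) * np.maximum(np.abs(g) - lam / L, 0.0)  # soft-threshold
    return z
```

With an orthonormal dictionary this reduces to a single soft-threshold of the filter outputs, which is why one unrolled step already captures the right nonlinearity.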
 Emergence of Phase and Shift-Invariant Features by Decomposition of Natural Images into Independent Feature Subspaces - Aapo Hyvärinen and Patrik Hoyer 2000.
 Olshausen and Field 1996 produce filters that are localized in both space and frequency - Gabor-like filters.
 The same principles of independence maximization can explain the emergence of phase- and shift-invariant features, similar to those found in complex cells.
 This new kind of emergence is obtained by maximizing the independence between norms of projections on linear subspaces (instead of the independence of simple linear filter outputs).
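The invariant feature itself is just the norm of a group of linear filter outputs, so the pooled response ignores which filter within the subspace (i.e. which phase or position) was active. A toy sketch, assuming filters are grouped in consecutive rows of W:

```python
import numpy as np

def subspace_energies(x, W, subspace_size=4):
    """Independent-subspace-style feature: group the linear filter outputs
    into subspaces and take the norm of each group. The pooled 'energy' is
    the phase/shift-invariant, complex-cell-like response."""
    s = W @ x                          # linear (simple-cell-like) filter outputs
    s = s.reshape(-1, subspace_size)   # group filters into subspaces
    return np.linalg.norm(s, axis=1)   # one invariant response per subspace
```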
 Dictionaries for Sparse Representation Modeling - Ron Rubinstein, Alfred M. Bruckstein, Michael Elad 2010.
 Review of the various dictionary approaches for describing signals as combinations of dictionary entries, including MOD, K-SVD, generalized PCA, etc.
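Of these, MOD has the simplest dictionary update: with the sparse codes held fixed, the least-squares optimal dictionary has a closed form. A sketch (assuming Z Z^T is well-conditioned; the column renormalization keeps atoms unit-norm):

```python
import numpy as np

def mod_update(X, Z):
    """One MOD dictionary update. Given signals X (d, n) and current sparse
    codes Z (k, n), the least-squares optimal dictionary is
    D = X Z^T (Z Z^T)^{-1}; columns are then renormalized to unit norm."""
    D = X @ Z.T @ np.linalg.pinv(Z @ Z.T)
    return D / np.linalg.norm(D, axis=0, keepdims=True)
```

K-SVD differs by updating one atom at a time (via an SVD of the residual) together with the codes that use it.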
 How Does the Brain Solve Visual Object Recognition? - James DiCarlo, Davide Zoccolan, Nicole C. Rust 2012.
 Serial chain models? And/or alternations of features? Interesting.
 Inhibitory Plasticity Balances Excitation and Inhibition in Sensory Pathways and Memory Networks - Vogels TP, Sprekeler H, Zenke F, Clopath C, Gerstner W 2011.
 Balanced excitation and inhibition leads to sparse firing patterns, and these firing patterns can be elicited by remembered external stimuli.
 Hebbian plus homeostatic plus STDP plasticity.
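The inhibitory STDP rule can be sketched with spike traces: near-coincident pre/post spikes potentiate inhibition onto overly active cells, while lone presynaptic spikes depress it, with the balance point set by a target rate. A trace-based sketch (parameter values are illustrative, not the paper's):

```python
# Trace-based sketch of a Vogels-2011-style inhibitory STDP rule.
eta = 1e-3     # learning rate
tau = 20.0     # spike-trace time constant (ms)
rho0 = 5e-3    # target postsynaptic rate (spikes/ms)
alpha = 2 * rho0 * tau   # depression bias that sets the target rate

def step(w, x_pre, x_post, pre_spike, post_spike, dt=1.0):
    """One timestep: decay the spike traces, add new spikes, update the
    inhibitory weight. Coincident pre/post spiking potentiates; a lone
    presynaptic spike depresses (the -alpha term)."""
    x_pre += -x_pre / tau * dt + pre_spike
    x_post += -x_post / tau * dt + post_spike
    w += eta * (pre_spike * (x_post - alpha) + post_spike * x_pre)
    return max(w, 0.0), x_pre, x_post
```

At equilibrium, inhibition grows until the postsynaptic rate settles near rho0, which is the sparse, balanced regime the paper describes.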
 Connectivity reflects coding: a model of voltage-based STDP with homeostasis - Claudia Clopath, Lars Büsing, Eleni Vasilaki & Wulfram Gerstner 2010.
 Electrophysiological connectivity patterns in cortex often have a few strong connections, which are sometimes bidirectional, among a lot of weak connections.
 STDP simulated in a recurrent neural network.
 The plasticity rule led not only to the development of localized receptive fields but also to connectivity patterns that reflect the neural code.
 This plasticity should be fast.
 Neural correlations, population coding and computation - Bruno Averbeck, Peter Latham and Alexandre Pouget 2006.
 Neuronal firing is highly variable, but this variance is typically correlated across cells - why?
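The quantity in question is the noise correlation: the correlation of trial-to-trial fluctuations around each neuron's mean response to the same stimulus (as opposed to signal correlation across stimuli). A numpy sketch (array layout assumed):

```python
import numpy as np

def noise_correlation(r1, r2):
    """Noise correlation between two neurons: correlate the trial-to-trial
    fluctuations around each neuron's mean response per stimulus.
    r1, r2: (n_stimuli, n_trials) spike-count arrays, trials aligned."""
    f1 = r1 - r1.mean(axis=1, keepdims=True)   # residuals per stimulus
    f2 = r2 - r2.mean(axis=1, keepdims=True)
    return np.corrcoef(f1.ravel(), f2.ravel())[0, 1]
```

Whether such correlations help or hurt the population code depends on their sign and structure relative to the tuning curves, which is the review's central question.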

PMID 15321069 - Sparse coding of sensory inputs
 Classic review, Olshausen and Field. 15 years old now!
 Note the sparsity here is in neuronal activation, not synaptic activity (though one should follow the other).
 References Lewicki's auditory studies (Efficient coding of natural sounds, 2002): properties of early auditory neurons are well suited for producing a sparse independent code.
 Studies have found near-binary encoding of stimuli in rat auditory cortex - e.g. one spike per stimulus.
 Suggests that overcomplete representations (e.g. where there are more 'second layer' neurons than inputs or pixels) are useful for flattening manifolds in the input space, making feature extraction easier.
 But then you have an underdetermined problem, where presumably sparsity metrics step in to restrict the actual coding space. Authors mention that this could lead to degeneracy.
 Example is the early visual cortex, where axons to higher layers exceed those from the LGN by a factor of 25. Which, they say, may be a compromise between overrepresentation and degeneracy.
 Sparse coding is a necessity from an energy standpoint  only one in 50 neurons can be active at any given time.
 Sparsity increases when classical receptive field stimuli in V1 are expanded with a surround having real-world statistics (Vinje & Gallant 2002).
