This list was compiled by searching for papers that cite Olshausen and Field 1996 (PMID 8637596), "Emergence of simple-cell receptive field properties by learning a sparse code for natural images."
 Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations. Andrew Ng, 2009. Unsupervised, convolutional.
 Building high-level features using large scale unsupervised learning. Andrew Ng, Quoc Le, Jeff Dean, Google, 2011.
 Unsupervised, locally connected (no weight sharing, so not strictly convolutional) encoder-decoder architecture, but still trained with SGD.
 Required a lot of compute: 16k cores for 3 days. Sophisticated asynchronous & distributed SGD system.
 10M images from random YouTube videos.
 SGD optimizes the weights to minimize the sum of three per-layer objectives (there are three layers). Each objective is an L2 loss for reconstructing the layer's input (encoded, then decoded) plus a sparsity term on the encoded features, weighted by the pooling layer.
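A minimal NumPy sketch of one layer's objective, assuming the form ‖W2 W1ᵀx − x‖² + λ Σⱼ sqrt(ε + Hⱼ (W1ᵀx)²): `W1` encodes, `W2` decodes, and the fixed pooling matrix `H` weights the sparsity penalty. The names `W1`, `W2`, `H`, `lam`, `eps` and the exact functional form are our assumptions, not quoted from the paper. The full objective would sum this over the three layers and over training examples.

```python
import numpy as np

def layer_objective(W1, W2, H, X, lam=0.1, eps=1e-2):
    """Reconstruction + pooled sparsity cost for one layer (assumed form).

    X  : (d, m) batch of m input vectors, one per column
    W1 : (d, k) encoding weights, features are W1.T @ x
    W2 : (d, k) decoding weights, reconstruction is W2 @ (W1.T @ x)
    H  : (p, k) fixed, non-negative pooling weights
    """
    F = W1.T @ X                          # (k, m) encoded features
    recon = W2 @ F                        # (d, m) decoded reconstruction
    recon_cost = np.sum((recon - X) ** 2)
    pooled = np.sqrt(eps + H @ (F ** 2))  # (p, m) L2-pooled feature energies
    sparsity_cost = lam * np.sum(pooled)
    return recon_cost + sparsity_cost

# Usage: a perfect autoencoder (W2 @ W1.T = I) pays only the sparsity term.
I4 = np.eye(4)
H = np.ones((1, 4))        # one pooling unit over all four features
X = np.ones((4, 1))
cost = layer_objective(I4, I4, H, X, lam=0.1, eps=0.0)
```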
 Unsupervised feature learning for audio classification using convolutional deep belief networks. Andrew Ng et al., 2009.
 A convolutional deep net for audio recognition.
 Robust Object Recognition with Cortex-Like Mechanisms. Poggio, MIT, 2007. Again alternates template matching and max pooling. Hypes its applicability to many domains. Not sure whether this is supervised or not.
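A toy 1-D sketch of the alternation: an "S" stage scores every position by Gaussian similarity to a stored template, and a "C" stage takes the maximum over neighbouring positions, so a small shift of the input moves the S peak but not the C peak. The 1-D signals, the tuning width `sigma`, the pool size and the helper `make_signal` are illustrative assumptions; the actual model works on 2-D images with Gabor filters at several scales.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def s_layer(signal, template, sigma=1.0):
    """Template matching: Gaussian tuning to `template` at every position."""
    windows = sliding_window_view(signal, len(template))  # (positions, len(template))
    dist2 = np.sum((windows - template) ** 2, axis=1)
    return np.exp(-dist2 / (2.0 * sigma ** 2))

def c_layer(responses, pool=4):
    """Max pooling over non-overlapping groups of `pool` neighbouring positions."""
    usable = (len(responses) // pool) * pool
    return responses[:usable].reshape(-1, pool).max(axis=1)

def make_signal(offset, template, length=32):
    """Hypothetical test input: the template embedded in zeros at `offset`."""
    signal = np.zeros(length)
    signal[offset:offset + len(template)] = template
    return signal

template = np.array([0.0, 1.0, 2.0, 1.0, 0.0])
s_a = s_layer(make_signal(8, template), template)  # pattern at position 8
s_b = s_layer(make_signal(9, template), template)  # same pattern shifted by one
c_a, c_b = c_layer(s_a), c_layer(s_b)
```

The S peaks sit at different positions (8 and 9), while both land in the same C pooling unit.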
 Just relax: convex programming methods for identifying sparse signals in noise. Joel Tropp, 2006. Extracting a linear combination of a few elementary signals corrupted by Gaussian noise. Proposes a class of algorithms that solve this via a convex program in polynomial time.
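One penalized form of the convex relaxation is: minimize ½‖s − Φb‖² + γ‖b‖₁ (our notation). The paper is about when this relaxation identifies the right sparse signal, not about a particular solver. The sketch below uses iterative soft-thresholding (ISTA), one standard first-order method, purely as an illustration. The dictionary, noise level and γ are made-up values.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t*||.||_1: shrink every entry towards zero by t."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def objective(Phi, s, b, gamma):
    return 0.5 * np.sum((s - Phi @ b) ** 2) + gamma * np.sum(np.abs(b))

def ista(Phi, s, gamma, n_iter=500):
    """Minimise 0.5*||s - Phi b||^2 + gamma*||b||_1 by iterative soft-thresholding."""
    L = np.linalg.norm(Phi, 2) ** 2   # Lipschitz constant of the smooth part's gradient
    b = np.zeros(Phi.shape[1])
    for _ in range(n_iter):
        grad = Phi.T @ (Phi @ b - s)
        b = soft_threshold(b - grad / L, gamma / L)
    return b

# Made-up example: a 3-sparse coefficient vector seen through a random dictionary plus noise.
rng = np.random.default_rng(0)
Phi = rng.standard_normal((40, 100)) / np.sqrt(40)
b_true = np.zeros(100)
b_true[[5, 37, 80]] = [1.5, -2.0, 1.0]
s = Phi @ b_true + 0.01 * rng.standard_normal(40)
b_hat = ista(Phi, s, gamma=0.05)
```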
 Deep learning in neural networks: An overview. Jürgen Schmidhuber, 2014. From the abstract: "I review deep supervised learning (also recapitulating the history of backpropagation), unsupervised learning, reinforcement learning & evolutionary computation, and indirect search for short programs encoding deep and large networks."
 Stacked Denoising Autoencoders: Learning Useful Representations in a Deep Network with a Local Denoising Criterion. Pascal Vincent et al., 2010.
 Performs reasonably well.
 Needs supervised fine-tuning, but most features are learned in an unsupervised way.
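A minimal sketch of the denoising criterion for one layer: zero a random fraction of the input entries ("masking noise"), encode the corrupted copy, decode, and penalize the squared error against the clean input. Tied weights, a sigmoid encoder, a linear decoder, full-batch gradient descent and all sizes are our assumptions. Stacking would repeat this on the learned hidden codes before supervised fine-tuning.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def dae_loss_and_grads(X, mask, W, b, c):
    """Squared error between the clean X and the reconstruction of the corrupted X."""
    m = X.shape[0]
    X_tilde = X * mask                   # masking noise: zero some inputs
    H = sigmoid(X_tilde @ W + b)         # encoder (tied weights)
    R = H @ W.T + c                      # linear decoder
    diff = R - X                         # compare against the CLEAN input
    loss = 0.5 * np.sum(diff ** 2) / m
    dR = diff / m
    dA = (dR @ W) * H * (1.0 - H)        # backprop through the sigmoid
    dW = dR.T @ H + X_tilde.T @ dA       # decoder path + encoder path of the tied weights
    return loss, dW, dA.sum(axis=0), dR.sum(axis=0)

def train_dae(X, n_hidden=4, corruption=0.3, lr=0.1, steps=500, seed=0):
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    W = 0.1 * rng.standard_normal((d, n_hidden))
    b = np.zeros(n_hidden)
    c = np.zeros(d)
    for _ in range(steps):
        mask = rng.random(X.shape) > corruption   # fresh corruption every step
        _, dW, db, dc = dae_loss_and_grads(X, mask, W, b, c)
        W -= lr * dW
        b -= lr * db
        c -= lr * dc
    return W, b, c
```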
 Learning fast approximations of sparse coding. Karol Gregor, Yann LeCun, 2010.
 Sparse coding = minimize the L2 reconstruction error plus an L1 penalty on the code.
 The main idea is to train a nonlinear, feedforward predictor with a specific architecture and a fixed depth to produce the best possible approximation of the sparse code.
 10x better than the previous method.
 Can be used to initialize an exact algorithm.
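A sketch of the fixed-depth predictor (learned ISTA) with sparse coding energy ½‖x − W_d z‖² + α‖z‖₁. Each layer computes z ← h_θ(W_e x + S z), where h_θ is soft-thresholding. Setting W_e = W_dᵀ/L, S = I − W_dᵀW_d/L and θ = α/L makes a depth-T network reproduce exactly T ISTA iterations. Training, which then adjusts W_e, S and θ to get a better code in the same T steps, is omitted. The notation loosely follows the paper; the dictionary, α and depth are made-up values.

```python
import numpy as np

def soft_threshold(v, theta):
    return np.sign(v) * np.maximum(np.abs(v) - theta, 0.0)

def ista_init(Wd, alpha):
    """Parameters that make the learned network start out equal to plain ISTA."""
    L = np.linalg.norm(Wd, 2) ** 2
    We = Wd.T / L
    S = np.eye(Wd.shape[1]) - Wd.T @ Wd / L
    theta = alpha / L
    return We, S, theta

def lista_forward(x, We, S, theta, depth):
    """Fixed-depth feed-forward encoder: z <- h_theta(We x + S z), `depth` layers."""
    B = We @ x
    z = soft_threshold(B, theta)         # first layer (z starts at zero)
    for _ in range(depth - 1):
        z = soft_threshold(B + S @ z, theta)
    return z

def ista(x, Wd, alpha, n_iter):
    """Reference: plain ISTA on 0.5*||x - Wd z||^2 + alpha*||z||_1."""
    L = np.linalg.norm(Wd, 2) ** 2
    z = np.zeros(Wd.shape[1])
    for _ in range(n_iter):
        z = soft_threshold(z - Wd.T @ (Wd @ z - x) / L, alpha / L)
    return z
```

Because the network output is a fixed-cost approximation of the code, it can also serve as a warm start for an exact solver.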
 Emergence of Phase and Shift-Invariant Features by Decomposition of Natural Images into Independent Feature Subspaces. Aapo Hyvärinen and Patrik Hoyer, 2000.
 Olshausen and Field 1996 produce filters that are localized in both space and frequency, i.e. Gabor-like filters.
 The same principles of independence maximization can explain the emergence of phase- and shift-invariant features, similar to those found in complex cells.
 This new kind of emergence is obtained by maximizing the independence between norms of projections onto linear subspaces (instead of the independence of simple linear filter outputs).
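A small numerical illustration of why a subspace norm can be phase-invariant. We take a 2-D subspace spanned by a cosine-phase and a sine-phase Gabor filter, present gratings of the same frequency at different phases, and compare one linear filter output with the norm of the projection onto the subspace. The two filters are orthogonal and, for this wide envelope, nearly equal in norm, so the coefficient norm tracks the projection norm. The filters are hand-built and 1-D, and all parameters are illustrative. In the paper such subspaces are learned from natural images by maximizing independence, which this sketch does not do.

```python
import numpy as np

def quadrature_responses(phases, sigma=5.0, omega=2.0):
    """Linear and subspace-norm responses of a quadrature Gabor pair to gratings."""
    x = np.linspace(-20.0, 20.0, 801)
    envelope = np.exp(-x ** 2 / (2.0 * sigma ** 2))
    even = envelope * np.cos(omega * x)    # first basis vector of the 2-D subspace
    odd = envelope * np.sin(omega * x)     # second basis vector, 90 degrees out of phase
    linear, energy = [], []
    for phi in phases:
        grating = np.cos(omega * x + phi)  # same frequency, shifted phase
        a, b = even @ grating, odd @ grating
        linear.append(a)                   # single linear filter (simple-cell-like)
        energy.append(np.hypot(a, b))      # norm of the projection (complex-cell-like)
    return np.array(linear), np.array(energy)

phases = np.linspace(0.0, 2.0 * np.pi, 16, endpoint=False)
linear, energy = quadrature_responses(phases)
```

The linear output swings between positive and negative values as the phase changes, while the norm stays essentially constant.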
 Dictionaries for Sparse Representation Modeling. Ron Rubinstein, Alfred M. Bruckstein, Michael Elad, 2010.
 Review of the various dictionary approaches for describing signals as combinations of dictionary entries (atoms), including MOD, K-SVD, generalized PCA, etc.
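As one concrete example, consider the MOD (Method of Optimal Directions) dictionary update. With the sparse codes A held fixed, the least-squares dictionary is D = XAᵀ(AAᵀ)⁻¹. The sketch shows only this update plus atom renormalization. A full MOD or K-SVD loop would alternate it with a sparse-coding step (e.g. OMP), which is omitted. Shapes and data are invented.

```python
import numpy as np

def mod_update(X, A):
    """MOD dictionary update: argmin_D ||X - D A||_F^2 with the codes A fixed.

    X : (n, m) signals as columns; A : (K, m) sparse codes.
    Solves A.T @ D.T = X.T with lstsq instead of forming (A A^T)^-1 explicitly.
    """
    Dt, *_ = np.linalg.lstsq(A.T, X.T, rcond=None)
    return Dt.T

def normalize_atoms(D, A):
    """Rescale atoms to unit norm and compensate in the codes so D @ A is unchanged."""
    norms = np.linalg.norm(D, axis=0)
    norms[norms == 0] = 1.0
    return D / norms, A * norms[:, None]

# Invented data: exact sparse combinations of a random dictionary.
rng = np.random.default_rng(0)
D_true = rng.standard_normal((16, 8))
A = rng.standard_normal((8, 100)) * (rng.random((8, 100)) < 0.3)  # roughly 30% non-zeros
X = D_true @ A
D = mod_update(X, A)
```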
 How Does the Brain Solve Visual Object Recognition? James DiCarlo, Davide Zoccolan, Nicole C. Rust, 2012.
 Serial chain models? AND/OR alternations of features? Interesting.
 Inhibitory Plasticity Balances Excitation and Inhibition in Sensory Pathways and Memory Networks. Vogels TP, Sprekeler H, Zenke F, Clopath C, Gerstner W, 2011.
 Balanced excitation and inhibition leads to sparse firing patterns, and these firing patterns can be elicited by remembered external stimuli.
 Hebbian plus homeostatic plus STDP plasticity.
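A rate-based caricature of the homeostatic side of inhibitory plasticity. In a rate approximation the spike-based rule behaves roughly like dw/dt ∝ r_pre (r_post − ρ₀): a Hebbian term minus an offset set by a target rate ρ₀. Inhibition therefore grows until it balances excitation and pins the output rate at ρ₀, whatever the excitatory drive. The single rectified-linear neuron, the fixed input rates, and every name and number below are our illustrative assumptions.

```python
def simulate_inhibitory_plasticity(exc_drive=20.0, inh_rate=5.0, target_rate=5.0,
                                   eta=0.01, steps=200):
    """Rate-based sketch (assumed): dw = eta * r_inh * (r_post - target_rate)."""
    w = 0.0                       # inhibitory weight, starts with no inhibition
    rates = []
    for _ in range(steps):
        r_post = max(exc_drive - w * inh_rate, 0.0)    # rectified-linear neuron
        w += eta * inh_rate * (r_post - target_rate)   # Hebbian term minus homeostatic offset
        w = max(w, 0.0)                                # inhibitory weights stay non-negative
        rates.append(r_post)
    return w, rates

w, rates = simulate_inhibitory_plasticity()
w_strong, rates_strong = simulate_inhibitory_plasticity(exc_drive=40.0)
```

Doubling the excitatory drive changes the learned inhibitory weight but not the final output rate.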
 Connectivity reflects coding: a model of voltage-based STDP with homeostasis. Claudia Clopath, Lars Büsing, Eleni Vasilaki & Wulfram Gerstner, 2010.
 Electrophysiological connectivity patterns in cortex often have a few strong connections, sometimes bidirectional, among many weak connections.
 Simulated a recurrent neural network with STDP.
 The plasticity rule led not only to the development of localized receptive fields but also to connectivity patterns that reflect the neural code.
 This plasticity should be fast.
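A discretized single-synapse sketch of a voltage-based STDP rule. A presynaptic spike triggers depression that scales with how far a low-pass-filtered membrane potential ū₋ exceeds a threshold θ₋. Potentiation needs three things at once: a presynaptic trace x̄, an instantaneous potential u above a higher threshold θ₊, and a second filtered potential ū₊ above θ₋. The structure follows our reading of the paper's rule. The homeostatic scaling of the depression amplitude is omitted, and every parameter value and voltage trace is an illustrative assumption.

```python
PARAMS = {
    "A_ltd": 1e-4, "A_ltp": 1e-4,                        # amplitudes (illustrative)
    "theta_minus": -70.0, "theta_plus": -45.0,           # thresholds in mV (illustrative)
    "tau_minus": 10.0, "tau_plus": 7.0, "tau_x": 15.0,   # time constants in ms (illustrative)
    "dt": 1.0,                                           # time step in ms
}

def relu(v):
    return max(v, 0.0)

def run_voltage_stdp(u_trace, pre_spikes, w0=0.5, p=PARAMS):
    """Integrate one synaptic weight over a membrane-potential trace (mV).

    u_trace    : membrane potential at each time step
    pre_spikes : booleans, True where the presynaptic neuron fires
    """
    dt = p["dt"]
    u_bar_minus = u_bar_plus = u_trace[0]   # low-pass filtered voltages
    x_bar = 0.0                             # low-pass filtered presynaptic spike train
    w = w0
    for u, spike in zip(u_trace, pre_spikes):
        if spike:
            # LTD: on a presynaptic spike, if the slow voltage trace is depolarized
            w -= p["A_ltd"] * relu(u_bar_minus - p["theta_minus"])
            x_bar += 1.0
        # LTP: presynaptic trace AND strong depolarization now AND recent depolarization
        w += dt * p["A_ltp"] * x_bar * relu(u - p["theta_plus"]) * relu(u_bar_plus - p["theta_minus"])
        # forward-Euler update of the low-pass filters
        u_bar_minus += dt / p["tau_minus"] * (u - u_bar_minus)
        u_bar_plus += dt / p["tau_plus"] * (u - u_bar_plus)
        x_bar -= dt / p["tau_x"] * x_bar
    return w

# Hypothetical traces: presynaptic spikes without strong depolarization depress the synapse,
# while a presynaptic spike followed by strong depolarization potentiates it.
w_ltd = run_voltage_stdp([-60.0] * 100, [i in (10, 30, 50) for i in range(100)])
u_pair = [-30.0 if 12 <= i <= 16 else -65.0 for i in range(40)]
w_ltp = run_voltage_stdp(u_pair, [i == 10 for i in range(40)])
```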
 Neural correlations, population coding and computation. Bruno Averbeck, Peter Latham and Alexandre Pouget, 2006.
 Neuronal firing is highly variable, but this variability is typically correlated across cells. Why?
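One standard reason these correlations matter for population coding: if N neurons have variance σ² and pairwise noise correlation ρ, the variance of their average is σ²(1/N + ρ(N−1)/N). That levels off at ρσ² instead of shrinking like σ²/N, so pooling more neurons stops helping. The simulation checks the formula with a shared-noise model. This is our choice of model; the review also covers cases where correlations help rather than hurt decoding.

```python
import numpy as np

def pooled_variance_theory(n, rho, sigma2=1.0):
    """Variance of the mean of n equally correlated responses."""
    return sigma2 * (1.0 / n + rho * (n - 1) / n)

def pooled_variance_sim(n, rho, trials=20000, seed=0):
    """Each neuron = sqrt(rho)*shared + sqrt(1-rho)*private, so corr = rho and var = 1."""
    rng = np.random.default_rng(seed)
    shared = rng.standard_normal((trials, 1))     # noise common to all neurons
    private = rng.standard_normal((trials, n))    # independent noise per neuron
    responses = np.sqrt(rho) * shared + np.sqrt(1.0 - rho) * private
    return responses.mean(axis=1).var()
```

With ρ = 0.2, averaging 1000 neurons still leaves a variance just above 0.2. With ρ = 0 the same pool brings it down to 0.001.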
