 m8ta
{1517} ref: -2015 tags: spiking neural networks causality inference demixing date: 07-22-2020 18:13 gmt revision:1

Rubén Moreno-Bote & Jan Drugowitsch

Use linear non-negative mixing plus noise to generate a series of sensory stimuli; pass these through a one-layer spiking or non-spiking neural network with adaptive global inhibition and adaptive reset voltage to solve the resulting quadratic programming problem with non-negativity constraints.

N causes, one observation: $\mu = \sum_{i=1}^{N} u_i r_i + \epsilon$, $r_i \geq 0$ -- causes can be present or not present, but not negative. Cause coefficients are drawn from a truncated (positive-only) Gaussian.

Linear spiking network with symmetric weight matrix $J = -U^T U - \beta I$. That is, J looks like a (negated) correlation matrix! $U$ is M x N; its columns are the mixing vectors. U is known beforehand and not learned. That said, as a quasi-correlation matrix, it might not be so hard to learn. See ref.

Can solve this problem by minimizing the negative log-posterior function:

$$L(\mu, r) = \frac{1}{2}(\mu - Ur)^T(\mu - Ur) + \alpha \mathbf{1}^T r + \frac{\beta}{2} r^T r$$

That is, we want to maximize the joint probability of the causes and observations under the probabilistic model $p(\mu, r) \propto \exp(-L(\mu, r)) \prod_{i=1}^{N} H(r_i)$, with H the Heaviside step enforcing non-negativity. The first term quadratically penalizes the difference between prediction and measurement; the second term, with $\alpha$, is an L1 regularizer; the third, with $\beta$, is an L2 regularizer.

The negative log-posterior is then converted to an energy function: with $W = -U^T U$ and $h = U^T \mu$,

$$E(r) = -\frac{1}{2} r^T W r - r^T h + \alpha \mathbf{1}^T r + \frac{\beta}{2} r^T r$$

(Note the sign: with $W = -U^T U$, the quadratic term $-\frac{1}{2} r^T W r = +\frac{1}{2} r^T U^T U r$, so E matches L up to a constant.) This is where they get the weight matrix J (or W). If the columns of U are linearly independent, W is negative definite, so E is convex. The dynamics of individual neurons with global inhibition and variable reset voltage serve to minimize this energy -- hence, solve the problem. (They gloss over this derivation in the main text.)
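As a concreteness check (this is not the paper's neural dynamics), the same non-negative quadratic program can be solved by projected gradient descent on $L(\mu, r)$. The dimensions, noise level, and regularization strengths below are arbitrary choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions (not from the paper): M observations, N causes.
M, N = 8, 4
U = np.abs(rng.normal(size=(M, N)))               # known non-negative mixing vectors
r_true = np.maximum(rng.normal(1.0, 0.5, N), 0.0)  # truncated-Gaussian cause coefficients
mu = U @ r_true + 0.01 * rng.normal(size=M)        # noisy observation

alpha, beta = 0.01, 0.01   # L1 and L2 regularization strengths (assumed values)

def energy(r):
    """Negative log-posterior L(mu, r), up to the non-negativity constraint."""
    resid = mu - U @ r
    return 0.5 * resid @ resid + alpha * r.sum() + 0.5 * beta * r @ r

# Projected gradient descent: step down the gradient, then clip to r >= 0.
r = np.zeros(N)
lr = 0.01
for _ in range(5000):
    grad = U.T @ (U @ r - mu) + alpha + beta * r
    r = np.maximum(r - lr * grad, 0.0)   # non-negativity projection

print(energy(r), np.round(r, 3), np.round(r_true, 3))
```

With small $\alpha$ and $\beta$ the recovered $r$ sits close to the generating coefficients; the clip-to-zero projection here plays, roughly, the role the firing threshold plays in the network version.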
Next, they show that a spike-based network can similarly 'relax' or descend the objective gradient to arrive at the quadratic-programming solution. The network is N leaky integrate-and-fire neurons with variable synaptic integration kernels; $\alpha$ translates to global inhibition, and $\beta$ to a lowered reset voltage. Yes, it can solve the problem -- and do so in the presence of firing noise, in a finite period of time -- but it's a little bit meh, because the problem is not that hard, and there is no learning in the network.

{1446} ref: -2017 tags: vicarious dileep george captcha message passing inference heuristic network date: 03-06-2019 04:31 gmt revision:2

Vicarious supplementary materials on their RCN (recursive cortical network).

Factors the scene into shape and appearance, which CNNs and DCNNs do not do -- they conflate the two (ish? what about the style networks?). They call this the coloring-book approach: extract shape, then attach appearance.

Hierarchy of feature layers $F_{frc}$ (binary) and pooling layers $H_{frc}$ (multinomial), where f is feature, r is row, c is column (i.e. over image space). Each layer is conditional exclusively on the layer above it, and all features within a layer are conditionally independent given the layer above.

Each pool variable $H_{frc}$ is multinomial, with one value per associated feature plus an 'off' state. These features form a 'pool', which can/does have translation invariance. If any of the pool variables selects a feature $F$, that feature is set (an OR operation). Many pools can contain a given feature. One can think of the members of a pool as different alternatives for similar features.

Pools can be connected laterally, so each is dependent on the activity of its neighbors. This can be used to enforce edge continuity.

Each bottom-level feature corresponds to an edge, which defines 'in' and 'out' regions that define the shape, $Y$.
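The feature/pool relationship described above -- multinomial pools with an 'off' state, OR-ed into binary features -- can be sketched generatively. The pool layout and probabilities here are invented for illustration, not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy RCN-style pooling. A pool is a multinomial variable: it selects one of
# several translated copies of a lower-level feature, or an extra 'off' state.
# A lower-level feature is ON if any pool selects it (an OR operation).
n_pools = 3
pool_members = [
    [0, 1, 2],    # pool 0 can place its feature at positions 0, 1, or 2
    [2, 3, 4],    # pools may share members: position 2 is in pools 0 and 1
    [5, 6],
]
n_features = 7

def sample_features(pool_members, n_features, p_off=0.5):
    """Sample each pool's multinomial state, then OR into binary features."""
    f = np.zeros(n_features, dtype=bool)
    for members in pool_members:
        if rng.random() < p_off:
            continue                      # 'off' state: pool contributes nothing
        choice = rng.choice(members)      # translation invariance: any member works
        f[choice] = True                  # OR: several pools can light one feature
    return f

print(sample_features(pool_members, n_features))
```

Each sample activates at most one feature per pool, which is what makes the pool a set of mutually exclusive translated alternatives.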
These variables $Y$ are also interconnected, forming a conditional random field -- a 'Potts model'. $Y$ is generated by Gibbs sampling given the F-H hierarchy above it. Below Y, the per-pixel model X specifies texture, with some conditional radial dependence.

The result is a probabilistic model for which exact inference is impossible -- hence approximate inference: a bottom-up pass estimates the category (with lateral connections turned off), and a top-down pass estimates the object mask. Multiple passes can be done for multiple objects.

The model has a hard time going from RGB pixels to edge 'in' and 'out'; they use an edge-detection pre-processing stage, e.g. Gabor filters.

Training follows a very intuitive, hierarchical feature-building heuristic: if some object or collection of lower-level features is not present, it's added to the feature-pool tree. This includes a winner-take-all heuristic for sparsification. They also greedily learn a feature 'dictionary' from individual unlabeled images. Lateral connections are learned similarly, with a quasi-Hebbian heuristic.

Neuroscience inspiration: see refs 9, 98 for message-passing-based Bayesian inference.

Overall, a very heuristic, detail-centric, iteratively generated model and set of algorithms. You get the sense that this was really the work of Dileep George or only a few people; that it was generated by successively patching and improving the model/algorithm to make up for observed failures and problems. As such, it offers little long-term vision for what is possible, or for how perception and cognition occur. Instead, it is proof that, well, engineering works, and that the space of possible solutions -- including relatively simple elements like dictionaries and WTA -- is large and fecund. It's unclear how this will scale to even more complex real-world problems, where one would want a solution that does not have to have each level carefully engineered.
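The lateral 'Potts model' over $Y$ can be illustrated with a tiny single-site Gibbs sampler. Grid size, state count, and coupling strength are arbitrary choices, and the real model conditions $Y$ on the F-H hierarchy rather than sampling freely:

```python
import numpy as np

rng = np.random.default_rng(2)

# Minimal Gibbs sampler for a small Potts model on a grid: each site prefers
# to agree with its 4-neighbors, as in the lateral edge-continuity field.
Q = 3          # number of Potts states (e.g. in / out / edge) -- illustrative
L = 8          # grid side length
J = 1.5        # lateral coupling: higher = stronger neighbor agreement

y = rng.integers(0, Q, size=(L, L))

def gibbs_sweep(y):
    """One sequential sweep: resample each site from its local conditional."""
    for i in range(L):
        for j in range(L):
            e = np.zeros(Q)   # coupling energy for each candidate state
            for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                ni, nj = i + di, j + dj
                if 0 <= ni < L and 0 <= nj < L:
                    e[y[ni, nj]] += J
            p = np.exp(e - e.max())
            y[i, j] = rng.choice(Q, p=p / p.sum())
    return y

for _ in range(50):
    gibbs_sweep(y)
# After burn-in, neighboring sites agree far more often than chance (1/Q).
```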
Modern DCNNs, at least, do not seem to have this property -- their structure is learned from the (alas, labeled) data. This extends to the fact that, yes, their purpose-built system achieves state-of-the-art performance on the designated CAPTCHA tasks.

Check: B. M. Lake, R. Salakhutdinov, J. B. Tenenbaum, Human-level concept learning through probabilistic program induction. Science 350, 1332–1338 (2015). doi:10.1126/science.aab3050

{1415} ref: -0 tags: variational free energy inference learning bayes curiosity insight Karl Friston date: 02-15-2019 02:09 gmt revision:1

PMID-28777724 Active inference, curiosity and insight. Karl J. Friston, Marco Lin, Christopher D. Frith, Giovanni Pezzulo, et al.

This has been my intuition for a while: that you can learn abstract rules via active probing of the environment. This paper supports such intuitions with extensive scholarship.

"The basic theme of this article is that one can cast learning, inference, and decision making as processes that resolve uncertainty about the world." References Schmidhuber 1991.

"A learner should choose a policy that also maximizes the learner's predictive power. This makes the world both interesting and exploitable." (Still and Precup 2012)

"Our approach rests on the free energy principle, which asserts that any sentient creature must minimize the entropy of its sensory exchanges with the world." OK, that might be generalizing things too far...

Levels of uncertainty: perceptual inference (the causes of sensory outcomes under a particular policy), and uncertainty about policies, or about future states of the world, outcomes, and the probabilistic contingencies that bind them. For the last element (probabilistic contingencies between the world and outcomes), they employ Bayesian model selection / Bayesian model reduction, which can operate not only on the data but exclusively on the initial model itself.
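The free-energy quantity all of this rests on can be evaluated exactly in a small discrete example (the prior and likelihood numbers below are arbitrary): for discrete hidden states, $F = D[Q(x)||P(x)] - E_Q[\ln P(o|x)]$ is minimized by the exact posterior, where it equals the surprise $-\ln P(o)$.

```python
import numpy as np

# Discrete hidden state x with three values; one observation o.
prior = np.array([0.5, 0.3, 0.2])   # P(x) -- illustrative numbers
lik = np.array([0.9, 0.1, 0.2])     # P(o | x) for the observed o

def free_energy(q):
    """F = KL(Q || P) - E_Q[ln P(o|x)], for a discrete belief q over x."""
    return np.sum(q * np.log(q / prior)) - np.sum(q * np.log(lik))

posterior = prior * lik / np.sum(prior * lik)   # exact Bayes
evidence = np.sum(prior * lik)                  # P(o)

# F is bounded below by the surprise, and the bound is tight at the posterior:
print(free_energy(posterior), -np.log(evidence))
```

Any other belief $Q$ (e.g. uniform) gives a strictly larger $F$, since $F = \mathrm{KL}(Q\,\|\,\text{posterior}) - \ln P(o)$.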
"We use simulations of abstract rule learning to show that context-sensitive contingencies, which are manifest in a high-dimensional space of latent or hidden states, can be learned with straightforward variational principles (i.e. minimization of free energy)."

Assume that initial states and state transitions are known. Perception, or inference about hidden states (i.e. state estimation), corresponds to inverting a generative model given a sequence of outcomes, while learning involves updating the parameters of the model.

The actual task is quite simple: central fixation leads to a color cue; the cue plus a peripheral color determines which way to saccade.

Gestalt: good intuitions, but I'm left with the impression that the authors overexplain and/or make the description more complicated than it needs to be. The actual number of parameters to be inferred is rather small -- 3 states in 4 (?) dimensions -- and these parameters are not hard to learn by minimizing the variational free energy:

$$F = D[Q(x)||P(x)] - E_Q[\ln P(o_t|x)]$$

where D is the Kullback-Leibler divergence. Mean-field approximation: $Q(x)$ is fully factored (not here).

many more notes

{896} ref: Friston-2002.1 tags: neuroscience philosophy feedback top-down sensory integration inference date: 10-25-2011 23:24 gmt revision:0

PMID-12450490 Functional integration and inference in the brain

Extra-classical tuning: tuning is dependent on behavioral context (motor) or stimulus context (sensory). The author proposes that neuroimaging can be used to investigate this in humans.

"Information theory can, in principle, proceed using only forward connections. However, it turns out that this is only possible when processes generating sensory inputs are invertible and independent. Invertibility is precluded when the cause of a percept and the context in which it is engendered interact." -- proof? citations? Makes sense, though.
Argues for the rather simplistic proof of backward connections via neuroimaging...

{40} ref: bookmark-0 tags: Bayes Bayesian_networks probability probabilistic_networks Kalman ICA PCA HMM Dynamic_programming inference learning date: 0-0-2006 0:0 revision:0

http://www.cs.ubc.ca/~murphyk/Bayes/bnintro.html -- very, very good! Many references, well explained too.