m8ta

{1545}
hide / / print
ref: -1988 tags: Linsker infomax linear neural network hebbian learning unsupervised date: 08-03-2021 06:12 gmt revision:2 [1] [0] [head]

Self-organization in a perceptual network

  • Ralph Linsker, 1988.
  • One of the first (verbose, slightly diffuse) investigations of the properties of linear projection neurons (e.g. dot-product; no non-linearity) to express useful tuning functions.
  • 'Useful' here means information-preserving, in the face of noise or dimensional bottlenecks (like PCA).
  • Starts with Hebbian learning rules, and shows that with these + white-noise sensory input + some local topology, you can get simple and complex visual cell responses.
    • Ralph notes that neurons in primate visual cortex are tuned in utero -- prior to real-world visual experience! Wow. (Who did these studies?)
    • This is a very minimalistic starting point; there aren't even structured stimuli (!)
    • Single neuron (and later, multiple neurons) are purely feed-forward; author cautions that a lack of feedback is not biologically realistic.
      • Also note that this was back in the Motorola 680x0 days ... computers were not that powerful (but certainly could handle more than 1-2 neurons!)
  • Linear algebra shows that Hebbian synapses cause a linear layer to learn the covariance function of their inputs, $Q$, with no dependence on the actual layer activity.
  • When looked at in terms of an energy function, this is equivalent to gradient descent to maximize the layer-output variance.
  • He also hits on:
    • Hopfield networks,
    • PCA,
    • Oja's constrained Hebbian rule $\delta w_i \propto \langle L_2(L_1 - L_2 w_i) \rangle$ (that is, a quadratic constraint on the weights to keep $\Sigma w^2 \sim 1$; a sketch of this rule follows the list)
    • Optimal linear reconstruction in the presence of noise
    • Mutual information between layer input and output (I found this to be a bit hand-wavey)
      • Yet he notes critically: "but it is not true that maximum information rate and maximum activity variance coincide when the probability distribution of signals is arbitrary".
        • Indeed. The world is characterized by very non-Gaussian structured sensory stimuli.
    • Redundancy and diversity in 2-neuron coding model.
    • Role of infomax in maximizing the determinant of the weight matrix, sorta.
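A minimal numpy sketch of the Oja-style constrained Hebbian rule above, for a single linear unit ($L_2 = w \cdot L_1$) driven by correlated Gaussian noise; the covariance matrix, learning rate, and sample count are arbitrary illustration choices, not Linsker's parameters. The weight vector should drift toward the principal eigenvector of the input covariance with roughly unit norm, i.e. it picks up the covariance structure without supervision.

  import numpy as np

  rng = np.random.default_rng(0)
  C = np.array([[3.0, 1.5],
                [1.5, 1.0]])                        # input covariance (arbitrary)
  X = rng.multivariate_normal(np.zeros(2), C, size=5000)

  w = 0.1 * rng.normal(size=2)
  eta = 0.01
  for x in X:
      y = w @ x                                     # linear unit: L2 = w . L1
      w += eta * y * (x - y * w)                    # Oja: Hebbian term + quadratic decay
  evals, evecs = np.linalg.eigh(C)
  print(w, evecs[:, -1], np.linalg.norm(w))         # w ~ +/- principal eigenvector, |w| ~ 1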

One may critically challenge the infomax idea: we very much need to (and do) throw away spurious or irrelevant information in our sensory streams; what upper layers 'care about' when making decisions is certainly relevant to the lower layers. This credit-assignment is neatly solved by backprop, and there are a number of 'biologically plausible' means of performing it, but both this and infomax are maybe avoiding the problem. What might the upper layers really care about? Likely 'care about' is an emergent property of the interacting local learning rules and network structure. Can you search directly in these domains, within biological limits, and motivated by statistical reality, to find unsupervised-learning networks?

You'll still need a way to rank the networks, hence an objective 'care about' function. Sigh. Either way, I don't per se put a lot of weight in the infomax principle. It could be useful, but is only part of the story. Otherwise Linsker's discussion is accessible, lucid, and prescient.

Lol.

{1543}
hide / / print
ref: -2019 tags: backprop neural networks deep learning coordinate descent alternating minimization date: 07-21-2021 03:07 gmt revision:1 [0] [head]

Beyond Backprop: Online Alternating Minimization with Auxiliary Variables

  • This paper is sort-of interesting: rather than back-propagating the errors, you optimize auxiliary variables, pre-nonlinearity 'codes', in a last-to-first layer order. The optimization is done to minimize a multinomial logistic loss function; the math is not worked out for other loss functions, but presumably this is not a limitation. The loss function also includes a quadratic term on the weights.
  • After the 'codes' are set, optimization can proceed in parallel on the weights, with either straight SGD or adaptive Adam. (A minimal sketch of the alternating scheme follows this list.)
  • Weight L2 penalty is scheduled over time.
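A minimal sketch of the alternating-minimization idea for a toy two-layer network, assuming a squared-error loss instead of the paper's logistic loss and a single auxiliary 'code' matrix C (the pre-nonlinearity activations); layer sizes, penalties, and step counts are arbitrary. Codes are first updated by gradient steps with the weights frozen; the two weight matrices are then each fit by an independent ridge-regression problem, which is where the parallelism comes from.

  import numpy as np

  def relu(z):
      return np.maximum(z, 0.0)

  rng = np.random.default_rng(0)
  X = rng.normal(size=(200, 10))                    # toy inputs
  Y = np.tanh(X @ rng.normal(size=(10, 3)))         # toy targets

  W1 = 0.1 * rng.normal(size=(10, 16))
  W2 = 0.1 * rng.normal(size=(16, 3))
  C = X @ W1                                        # auxiliary pre-activation 'codes'
  gamma, lam = 1.0, 0.1                             # code-consistency penalty, weight ridge

  for it in range(200):
      # (1) code update with weights frozen:
      #     descend 0.5||Y - relu(C) W2||^2 + 0.5*gamma*||C - X W1||^2 w.r.t. C
      L_c = np.linalg.norm(W2, 2) ** 2 + gamma      # curvature bound -> stable step size
      for _ in range(5):
          H = relu(C)
          dC = ((H @ W2 - Y) @ W2.T) * (C > 0) + gamma * (C - X @ W1)
          C -= dC / L_c
      # (2) weight updates: two independent ridge fits (could run in parallel)
      H = relu(C)
      W1 = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ C)   # input -> code
      W2 = np.linalg.solve(H.T @ H + lam * np.eye(H.shape[1]), H.T @ Y)   # relu(code) -> target

  print(np.mean((relu(X @ W1) @ W2 - Y) ** 2))      # training MSE after alternation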

This is interesting in that the weight updates can be done in parallel - perhaps more efficient - but you are still propagating errors backward, albeit via optimizing 'codes'. Given the vast infrastructure devoted to auto-diff + backprop, I can't see this being adopted broadly.

That said, the idea of alternating minimization (which is used eg for EM clustering) is powerful, and this paper does describe (though I didn't read it) how there are guarantees on the convexity of the alternating minimization. Likewise, the authors show how to improve the performance of the online / minibatch algorithm by keeping around memory variables, in the form of covariance matrices.

{1522}
hide / / print
ref: -2017 tags: schema networks reinforcement learning atari breakout vicarious date: 09-29-2020 02:32 gmt revision:2 [1] [0] [head]

Schema networks: zero-shot transfer with a generative causal model of intuitive physics

  • Like a lot of papers, the title has more flash than the actual results.
  • Results would be state of the art (as of 2017) in playing Atari Breakout, then transferring performance to modifications of the game (paddle moved up a bit, wall added in the middle of the bricks, brick respawning, juggling).
  • Schema network is based on 'entities' (objects) which have binary 'attributes'. These attributes can include continuous-valued signals, in which case each binary variable is like a place field (I think).
    • This is clever and interesting -- rather than just low-level features pointing to high-level features, this means that high-level entities can have records of low-level features -- an arrow pointing in the opposite direction, one which can (also) be learned.
    • The same idea is present in other Vicarious work, including the CAPTCHA paper and more-recent (and less good) Bio-RNN paper.
  • Entities and attributes are propagated forward in time based on 'ungrounded schemas' -- basically free-floating transition matrices. The grounded schemas are entities and action groups that have evidence in observation.
    • There doesn't seem to be much math describing exactly how this works; only exposition. Or maybe it's all hand-waving over the actual, much simpler math.
      • Get the impression that the authors are reaching for a level of formalism when in fact they just made something that works for the breakout task... I infer Dileep prefers the empirical to the formal, so this is likely primarily the first author.
  • There are no perceptual modules here -- game state is fed to the network directly as entities and attributes (and, to be fair, to the A3C model).
  • Entity-attribute vectors are concatenated into a column vector of length $NT$, where $N$ is the number of entities and $T$ the number of time slices.
    • For each of the $N$ entities over time $T$, a row vector is made of length $MR$, where $M$ is the number of attributes (fixed per task) and $R-1$ is the number of neighbors within a fixed radius. That is, each entity is related to its neighbors' attributes over time.
    • This is a (large, sparse) binary matrix, $X$.
  • $y$ is the vector of actions; the task is to predict actions from $X$.
    • How is X learned?? Very unclear in the paper vs. figure 2.
  • The solution is approximated as $y = X W \bar{1}$, where $W$ is a binary weight matrix (a toy sketch of this OR-of-conjunctions prediction follows this list).
    • Minimize the solution based on an objective function on the error and the complexity of $w$.
    • This is found via linear programming relaxation. "This procedure monotonically decreases the prediction error of the overall schema network, while increasing its complexity".
      • As it's an issue of binary conjunctions, this seems like a SAT problem!
    • Note that it's not probabilistic: "For this algorithm to work, no contradictions can exist in the input data" -- they instead remove them!
  • Actual behavior includes maximum-product belief propagation, to look for series of transitions that set the reward variable without setting the fail variable.
    • Because the network is loopy, this has to occur several times to set the entity variables, e.g. it includes backtracking.
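Taking the note's description at face value (binary conjunctions combined by an OR), a toy sketch of how a learned binary W could be used for prediction; the feature vector, the two schemas, and the exact semantics are illustrative guesses rather than the paper's actual formulation.

  import numpy as np

  def schema_predict(x, W):
      # x: binary entity-attribute feature vector; each column of W is a schema,
      # i.e. a conjunction of required features. A schema fires only if every
      # feature it requires is present; the prediction is the OR over schemas.
      fires = [np.all(x[W[:, j] == 1] == 1) for j in range(W.shape[1])]
      return int(np.any(fires))

  W = np.array([[1, 0],
                [1, 1],
                [0, 1]])                            # two hypothetical schemas over three features
  print(schema_predict(np.array([1, 1, 0]), W))     # schema 0 satisfied -> 1
  print(schema_predict(np.array([0, 0, 1]), W))     # neither satisfied -> 0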

  • Have there been any further papers exploring schema networks? What happened to this?
  • The later paper from Vicarious on zero-shot task transfer is rather less interesting (to me) than this.

{1517}
hide / / print
ref: -2015 tags: spiking neural networks causality inference demixing date: 07-22-2020 18:13 gmt revision:1 [0] [head]

PMID-26621426 Causal Inference and Explaining Away in a Spiking Network

  • Rubén Moreno-Bote & Jan Drugowitsch
  • Use linear non-negative mixing plus noise to generate a series of sensory stimuli.
  • Pass these through a one-layer spiking or non-spiking neural network with adaptive global inhibition and adaptive reset voltage to solve this quadratic programming problem with non-negative constraints.
  • N causes, one observation: $\mu = \Sigma_{i=1}^{N} u_i r_i + \epsilon$,
    • $r_i \geq 0$ -- causes can be present or not present, but not negative.
    • cause coefficients drawn from a truncated (positive only) Gaussian.
  • linear spiking network with symmetric weight matrix $J = -U^T U - \beta I$ (see figure above)
    • That is ... J looks like a correlation matrix!
    • $U$ is M x N; columns are the mixing vectors.
    • U is known beforehand and not learned
      • That said, as a quasi-correlation matrix, it might not be so hard to learn. See ref [44].
  • Can solve this problem by minimizing the negative log-posterior function: $$ L(\mu, r) = \frac{1}{2}(\mu - Ur)^T(\mu - Ur) + \alpha 1^T r + \frac{\beta}{2} r^T r $$
    • That is, want to maximize the joint probability of the data and observations given the probabilistic model $p(\mu, r) \propto \exp(-L(\mu, r)) \Pi_{i=1}^{N} H(r_i)$
    • First term quadratically penalizes difference between prediction and measurement.
    • the second term with $\alpha$ is an L1 regularization, and the third term with $\beta$ is an L2 regularization.
  • The negative log-likelihood is then converted to an energy function (linear algebra): $W = -U^T U$, $h = U^T \mu$, then $E(r) = 0.5 r^T W r - r^T h + \alpha 1^T r + 0.5 \beta r^T r$
    • This is where they get the weight matrix J or W. If the vectors U are linearly independent, then it is negative semidefinite.
  • The dynamics of individual neurons w/ global inhibition and variable reset voltage serves to minimize this energy -- hence, solve the problem. (They gloss over this derivation in the main text).
  • Next, show that a spike-based network can similarly 'relax' or descend the objective gradient to arrive at the quadratic programming solution. (A projected-gradient sketch of the same objective follows this list.)
    • Network is N leaky integrate and fire neurons, with variable synaptic integration kernels.
    • $\alpha$ then translates to global inhibition, and $\beta$ to a lowered reset voltage.
  • Yes, it can solve the problem .. and do so in the presence of firing noise in a finite period of time .. but a little bit meh, because the problem is not that hard, and there is no learning in the network.
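The spiking dynamics are beyond a short sketch, but the objective itself is easy to check: below is a projected-gradient solver for the same non-negative quadratic program, with made-up mixing vectors, noise level, and regularization weights. The network's relaxation should arrive at essentially the same $r$.

  import numpy as np

  def nonneg_qp(U, mu, alpha=0.1, beta=0.1, lr=0.01, steps=2000):
      # minimize L(r) = 0.5||mu - U r||^2 + alpha*1'r + 0.5*beta*r'r  subject to r >= 0
      r = np.zeros(U.shape[1])
      for _ in range(steps):
          grad = -U.T @ (mu - U @ r) + alpha + beta * r
          r = np.maximum(r - lr * grad, 0.0)        # project onto the non-negative orthant
      return r

  rng = np.random.default_rng(0)
  U = np.abs(rng.normal(size=(20, 5)))              # non-negative mixing vectors (columns)
  r_true = np.maximum(rng.normal(size=5), 0.0)      # sparse non-negative causes
  mu = U @ r_true + 0.05 * rng.normal(size=20)      # noisy observation
  print(np.round(nonneg_qp(U, mu), 2))
  print(np.round(r_true, 2))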

{1516}
hide / / print
ref: -2017 tags: GraphSAGE graph neural network GNN date: 07-16-2020 15:49 gmt revision:2 [1] [0] [head]

Inductive representation learning on large graphs

  • William L. Hamilton, Rex Ying, Jure Leskovec
  • Problem: given a graph where each node has a set of (possibly varied) attributes, create an 'embedding' vector at each node that describes both the node and the network that surrounds it.
  • To this point (2017) there were two ways of doing this -- through matrix factorization methods, and through graph convolutional networks.
    • The matrix factorization methods or spectral methods (similar to multi-dimensional scaling, where points are projected onto a plane to preserve a distance metric) are transductive : they work entirely within-data, and don't directly generalize to new data.
      • This is parsimonious in some sense, but doesn't work well in the real world, where datasets are constantly changing and frequently growing.
  • Their approach is similar to graph convolutional networks, where (I think) the convolution is indexed by node distances.
  • General idea: each node starts out with an embedding vector = its attribute or feature vector.
  • Then, all neighboring nodes are aggregated by sampling a fixed number of the nearest neighbors (fixed for computational reasons).
    • Aggregation can be mean aggregation, LSTM aggregation (on random permutations of the neighbor nodes), or MLP -> nonlinearity -> max-pooling. Pooling has the most wins, though all seem to work...
  • The aggregated vector is concatenated with the current node feature vector, and this is fed through a learned weighting matrix and nonlinearity to output the feature vector for the current pass. (A sketch of the mean-aggregator variant follows this list.)
  • Passes proceed from out-in... I think.
  • Algorithm is inspired by the Weisfeiler-Lehman Isomorphism Test, which updates neighbor counts per node to estimate if graphs are isomorphic. They do a similar thing here, only with vectors not scalars, and similarly take into account the local graph structure.
    • All the aggregator functions, and of course the nonlinearities and weighting matrices, are differentiable -- so the structure is trained in a supervised way with SGD.
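A minimal numpy sketch of one GraphSAGE pass with the mean aggregator, assuming untrained random weights just to show the data flow (sample neighbors, aggregate, concatenate with the node's own embedding, project, nonlinearity, normalize); the sizes and sampling budget are arbitrary.

  import numpy as np

  def relu(z):
      return np.maximum(z, 0.0)

  def graphsage_mean_layer(h, neighbors, W, sample_size=5, rng=np.random.default_rng(0)):
      # h: (n_nodes, d) current embeddings; neighbors: list of neighbor-index lists;
      # W: (2*d, d_out) weight matrix applied to [self || aggregated-neighborhood]
      n, d = h.shape
      out = np.zeros((n, W.shape[1]))
      for v in range(n):
          nbrs = list(neighbors[v])
          if len(nbrs) > sample_size:               # fixed-size neighbor sample, for cost
              nbrs = list(rng.choice(nbrs, size=sample_size, replace=False))
          agg = h[nbrs].mean(axis=0) if nbrs else np.zeros(d)
          out[v] = relu(np.concatenate([h[v], agg]) @ W)
      return out / (np.linalg.norm(out, axis=1, keepdims=True) + 1e-8)

  rng = np.random.default_rng(1)
  h0 = rng.normal(size=(4, 3))                      # 4 nodes, 3 attributes each
  neighbors = [[1, 2], [0, 2, 3], [0, 1], [1]]
  W = rng.normal(size=(6, 8))
  print(graphsage_mean_layer(h0, neighbors, W).shape)   # (4, 8)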

This is a well-put together paper, with some proofs of convergence etc -- but it still feels only lightly tested. As with many of these papers, could benefit from a positive control, where the generating function is known & you can see how well the algorithm discovers it.

Otherwise, the structure / algorithm feels rather intuitive; surprising to me that it was not developed before the matrix factorization methods.

Worth comparing this to word2vec embeddings, where local words are used to predict the current word & the resulting vector in the neck-down of the NN is the representation.

{1511}
hide / / print
ref: -2020 tags: evolution neutral drift networks random walk entropy population date: 04-08-2020 00:48 gmt revision:0 [head]

Localization of neutral evolution: selection for mutational robustness and the maximal entropy random walk

  • The take-away of the paper is that, with larger populations, random mutation and recombination make areas of the graph that take several steps to reach (in the figure, this is Maynard Smith's four-letter mutation word game) less likely to be visited.
  • This is because the recombination serves to make the population adhere more closely to the 'giant' component. In Maynard's game, this is 2268 words of the 2405 meaningful words that can be reached by successive letter changes.
  • The author extends it to van Nimwegen's 1999 paper / RNA genotype-secondary structure. It's not as bad as Maynard's game, but still has much lower graph-theoretic entropy than the actual population.
    • He suggests that if the entropic size of the giant component is much smaller than its dictionary size, then populations are likely to be trapped there.

  • Interesting, but I'd prefer to have an expert peer-review it first :)

{1507}
hide / / print
ref: -2015 tags: winner take all sparsity artificial neural networks date: 03-28-2020 01:15 gmt revision:0 [head]

Winner-take-all Autoencoders

  • During training of fully connected layers, they enforce a winner-take all lifetime sparsity constraint.
    • That is: when training using mini-batches, they keep the k percent largest activations of a given hidden unit across all samples presented in the mini-batch; the remainder of the activations are set to zero. The units are not competing with each other; they are competing with themselves. (A sketch of this lifetime-sparsity mask follows this list.)
    • The rest of the network is a stack of ReLU layers (upon which the sparsity constraint is applied) followed by a linear decoding layer (which makes interpretation simple).
    • They stack them via sequential training: train one layer from the output of another & not backprop the errors.
  • Works, with lower sparsity targets, also for RBMs.
  • Extended the result to WTA convnets -- here enforce both spatial and temporal (mini-batch) sparsity.
    • Spatial sparsity involves selecting the single largest hidden unit activity within each feature map. The other activities and derivatives are set to zero.
    • At test time, this sparsity constraint is released, and instead they use a 4 x 4 max-pooling layer & use that for classification or deconvolution.
  • To apply both spatial and temporal sparsity, select the highest spatial response (e.g. one unit in a 2d plane of convolutions; all have the same weights) for each feature map. Do this for every image in a mini-batch, and then apply the temporal sparsity: each feature map gets to be active exactly once, and in that time only one hidden unit (or really, one location of the input and common weights (depending on stride)) undergoes SGD.
    • Seems like it might train very slowly. Authors didn't note how many epochs were required.
  • This, too can be stacked.
  • To train on larger image sets, they first extract 48 x 48 patches & again stack...
  • Test on MNIST, SVHN, CIFAR-10 -- works ok, and well even with few labeled examples (which is consistent with their goals)
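A small sketch of the lifetime-sparsity step described above, assuming activations arranged as (mini-batch, hidden units); k_percent is the only knob. Each unit keeps only its own top-k% activations across the batch and zeros the rest, so units compete with themselves rather than with each other.

  import numpy as np

  def lifetime_sparsity(acts, k_percent=5.0):
      # acts: (n_batch, n_units). For each column (unit), keep the k% largest
      # entries across the batch; zero everything else (and its gradient, in training).
      n_batch, n_units = acts.shape
      k = max(1, int(np.ceil(n_batch * k_percent / 100.0)))
      out = np.zeros_like(acts)
      top_rows = np.argsort(acts, axis=0)[-k:, :]   # row indices of the k largest per column
      cols = np.arange(n_units)
      out[top_rows, cols] = acts[top_rows, cols]
      return out

  a = np.random.default_rng(0).random((100, 4))
  print((lifetime_sparsity(a, 5.0) != 0).sum(axis=0))   # 5 surviving activations per unit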

{1492}
hide / / print
ref: -2016 tags: spiking neural network self supervised learning date: 12-10-2019 03:41 gmt revision:2 [1] [0] [head]

PMID: Spiking neurons can discover predictive features by aggregate-label learning

  • This is a meandering, somewhat long-winded, and complicated paper, even for the journal Science. It's not been cited a great many times, but none-the-less is of interest.
  • The goal of the derived network is to detect fixed-pattern presynaptic sequences, and fire a prespecified number of spikes to each occurrence.
  • One key innovation is the use of a spike-threshold-surface for a 'tempotron' [12], the derivative of which is used to update the weights of synapses after trials. As the author says, spikes are hard to differentiate; the STS makes this more possible. This is hence standard gradient descent: if the neuron missed a spike then the weight is increased based on aggregate STS (for the whole trial -- hence the neuron / SGD has to perform temporal and spatial credit assignment).
    • As common, the SGD is appended with a momentum term.
  • Since STS differentiation is biologically implausible -- where would the memory lie? -- he also implements a correlational synaptic eligibility trace. The correlation is between the postsynaptic voltage and the EPSC, which seems kinda circular.
    • Unsurprisingly, it does not work as well as the SGD approximation. But does work...
  • Second innovation is the incorporation of self-supervised learning: a 'supervisory' neuron integrates the activity of a number (50) of feature detector neurons, and reinforces them to basically all fire at the same event, WTA style. This effects an unsupervised feature detection.
  • This system can be used with sort-of lateral inhibition to reinforce multiple features. Not so dramatic -- continuous feature maps.

Editorializing a bit: I said this was interesting, but why? The first part of the paper is another form of SGD, albeit in a spiking neural network, where the gradient is harder to compute and hence is done numerically.

It's the aggregate part that is new -- pulling in repeated patterns through synaptic learning rules. Of course, to do this, the full trace of pre and post synaptic activity must be recorded (??) for estimating the STS (i think). An eligibility trace moves in the right direction as a biologically plausible approximation, but as always nothing matches the precision of SGD. Can the eligibility trace be amended with e.g. neuromodulators to push the performance near that of SGD?

The next step of adding self supervised singular and multiple features is perhaps toward the way the brain organizes itself -- small local feedback loops. These features annotate repeated occurrences of stimuli, or tile a continuous feature space.

Still, the fact that I haven't seen any follow-up work is suggestive...


Editorializing further, there is a limited quantity of work that a single human can do. In this paper, it's a great deal of work, no doubt, and the author offers some good intuitions for the design decisions. Yet still, the total complexity that even a very determined individual can amass is limited, and likely far below the structural complexity of a mammalian brain.

This implies that inference either must be distributed and compositional (the normal path of science), or the process of evaluating & constraining models must be significantly accelerated. This latter option is appealing, as current progress in neuroscience seems highly technology limited -- old results become less meaningful when the next wave of measurement tools comes around, irrespective of how much work went into them. (Though: the impetus for measuring a particular thing in biology is only discovered through these 'less meaningful' studies...).

A third option, perhaps one which many theoretical neuroscientists believe in, is that there are some broader, physics-level organizing principles to the brain. Karl Friston's free energy principle is a good example of this. Perhaps at a meta level some organizing theory can be found, or more likely a set of theories; but IMHO, you'll need at least one theory per brain area, just as each area is morphologically, cytoarchitecturally, and topologically distinct. (There may be only a few theories of the cortex, despite all the areas, which is why so many are eager to investigate it!)

So what constitutes a theory? Well, you have to meaningfully describe what a brain region does. (Why is almost as important; how more important to the path there.) From a sensory standpoint: what information is stored? What processing gain is enacted? How does the stored information impress itself on behavior? From a motor standpoint: how are goals selected? How are the behavioral segments to attain them sequenced? Is the goal / behavior even a reasonable way of factoring the problem?

Our dual problem, building the bridge from the other direction, is perhaps easier. Or it could be that a lot more money has gone into it. Either way, much progress has been made in AI. One arm is deep function approximation / database compression for fast and organized indexing, aka deep learning. Many people are thinking about that; no need to add to the pile; anyway, as OpenAI has proven, the common solution to many problems is to simply throw more compute at it. A second is deep reinforcement learning, which is hideously sample and path inefficient, hence ripe for improvement. One side is motor: rather than indexing raw motor variables (LRUD in a video game, or joint torques with a robot..) you can index motor primitives, perhaps hierarchically built; likewise, for the sensory input, the model needs to infer structure about the world. This inference should decompose overwhelming sensory experience into navigable causes ...

But how can we do this decomposition? The cortex is more than adept at it, but now we're at the original problem, one that the paper above purports to make a stab at.

{1418}
hide / / print
ref: -0 tags: nanophotonics interferometry neural network mach zehnder interferometer optics date: 06-13-2019 21:55 gmt revision:3 [2] [1] [0] [head]

Deep Learning with Coherent Nanophotonic Circuits

  • Used a series of Mach-Zehnder interferometers with thermoelectric phase-shift elements to realize the unitary component of individual layer weight-matrix computation.
    • The weight matrix was decomposed via SVD into $U \Sigma V^*$: $U$ and $V^*$ form the unitary part (4x4, special unitary 4 group, SU(4)), and the diagonal $\Sigma$ is realized via amplitude modulators. See figure above / original paper. (A short SVD sketch follows this list.)
    • Note that interferometric matrix multiplication can (theoretically) be zero energy with an optical system (modulo loss).
      • In practice, you need to run the phase-modulator heaters.
  • Nonlinearity was implemented electronically after the photodetector (e.g. they had only one photonic circuit; to get multiple layers, fed activations repeatedly through it. This was a demonstration!)
  • Fed FFT'd / banded recordings of consonants through the network to get vowel-recognition performance close to that of the simulated network.
    • Claim that noise was from imperfect phase setting in the MZI + lower resolution photodiode read-out.
  • They note that the network can more easily (??) be trained via the finite difference algorithm (e.g. test out an incremental change per weight / parameter) since running the network forward is so (relatively) low-energy and fast.
    • Well, that's not totally true -- you need to update multiple weights at once in a large / deep network to descend any high-dimensional valleys.
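A quick numpy check of the decomposition they implement in hardware: any weight matrix factors as $U \Sigma V^*$, with the two unitary factors realizable (in principle, losslessly) by MZI meshes and the diagonal $\Sigma$ by amplitude modulators. The 4x4 random matrix is just for illustration.

  import numpy as np

  rng = np.random.default_rng(0)
  W = rng.normal(size=(4, 4))                       # one layer's 4x4 weight matrix
  U, s, Vh = np.linalg.svd(W)                       # W = U diag(s) Vh
  print(np.allclose(U @ np.diag(s) @ Vh, W))        # True: exact reconstruction
  print(np.allclose(U @ U.T, np.eye(4)),            # True: U, Vh are orthogonal/unitary,
        np.allclose(Vh @ Vh.T, np.eye(4)))          #   hence realizable by an MZI mesh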

{1463}
hide / / print
ref: -2019 tags: optical neural networks spiking phase change material learning date: 06-01-2019 19:00 gmt revision:4 [3] [2] [1] [0] [head]

All-optical spiking neurosynaptic networks with self-learning capabilities

  • J. Feldmann, N. Youngblood, C. D. Wright, H. Bhaskaran & W. H. P. Pernice
  • Idea: use phase-change material to either block or pass the light in waveguides.
    • In this case, they used GST -- germanium-antimony-tellurium. This material is less reflective in the amorphous phase, which can be reached by heating to ~150C and rapidly quenching. It is more reflective in the crystalline phase, which occurs on annealing.
  • This is used for both plastic synapses (phase change driven by the intensity of the light) and the nonlinear output of optical neurons (via a ring resonator).
  • Uses optical resonators with very high Q factors to couple different wavelengths of light into the 'dendrite'.
  • Ring resonator on the output: to match the polarity of the phase-change material. Is this for reset? Storing light until trigger?
  • Were able to get correlative-like or hebbian learning (which I suppose is not dissimilar from really slow photographic film, just re-branded, and most importantly with nonlinear feedback.)
  • Issue: every weight needs a different source wavelength! Hence they have not demonstrated a multi-layer network.
  • Previous paper: All-optical nonlinear activation function for photonic neural networks
    • Only 3db and 7db extinction ratios for induced transparency and inverse saturation

{1446}
hide / / print
ref: -2017 tags: vicarious dileep george captcha message passing inference heuristic network date: 03-06-2019 04:31 gmt revision:2 [1] [0] [head]

PMID-29074582 A generative vision model that trains with high data efficiency and breaks text-based CAPTCHAs

  • Vicarious supplementary materials on their RCN (recursive cortical network).
  • Factor scene into shape and appearance, which CNN or DCNN do not do -- they conflate (ish? what about the style networks?)
    • They call this the coloring book approach -- extract shape then attach appearance.
  • Hierarchy of feature layers $F_{frc}$ (binary) and pooling layers $H_{frc}$ (multinomial), where f is feature, r is row, c is column (e.g. over image space).
  • Each layer is exclusively conditional on the layer above it, and all features in a layer are conditionally independent given the layer above.
  • Pool variables $H_{frc}$ are multinomial, with each value associated with a feature, plus one 'off' state.
    • These features form a ‘pool’, which can/does have translation invariance.
  • If any of the pool variables are set to enable $F$, then that feature is set (or-operation). Many pools can contain a given feature.
  • One can think of members of a pool as different alternatives of similar features.
  • Pools can be connected laterally, so each is dependent on the activity of its neighbors. This can be used to enforce edge continuity.
  • Each bottom-level feature corresponds to an edge, which defines 'in' and 'out' to define shape, $Y$.
  • These variables $Y$ are also interconnected, and form a conditional random field, a 'Potts model'. $Y$ is generated by Gibbs sampling given the F-H hierarchy above it.
  • Below Y, the per-pixel model X specifies texture with some conditional radial dependence.
  • The model amounts to a probabilistic model for which exact inference is impossible -- hence you must do approximate inference, where a bottom-up pass estimates the category (with lateral connections turned off), and a top-down pass estimates the object mask. Multiple passes can be done for multiple objects.
  • The model has a hard time moving from rgb pixels to edge 'in' and 'out'; they use an edge-detection pre-processing stage, e.g. a Gabor filter.
  • Training follows a very intuitive, hierarchical feature building heuristic, where if some object or collection of lower level features is not present, it’s added to the feature-pool tree.
    • This includes some winner-take-all heuristic for sparsification.
    • Also greedily learn some sort of feature 'dictionary' from individual unlabeled images.
  • Lateral connections are learned similarly, with a quasi-hebbian heuristic.
  • Neuroscience inspiration: see refs 9, 98 for message-passing based Bayesian inference.

  • Overall, a very heuristic, detail-centric, iteratively generated model and set of algorithms. You get the sense that this was really the work of Dileep George or only a few people; that it was generated by successively patching and improving the model/algo to make up for observed failures and problems.
    • As such, it offers little long-term vision for what is possible, or how perception and cognition occurs.
    • Instead, proof is shown that, well, engineering works, and the space of possible solutions -- including relatively simple elements like dictionaries and WTA -- is large and fecund.
      • Unclear how this will scale to even more complex real-world problems, where one would desire a solution that does not have to have each level carefully engineered.
      • Modern DCNN, at least, do not seem to have this property -- the structure is learned from the (alas, labeled) data.
  • This extends to the fact that yes, their purpose-built system achieves state of the art performance on the designated CAPTCHA tasks.
  • Check: B. M. Lake, R. Salakhutdinov, J. B. Tenenbaum, Human-level concept learning through probabilistic program induction. Science 350, 1332–1338 (2015). doi:10.1126/science.aab3050 Medline

{1434}
hide / / print
ref: -0 tags: convolutional neural networks audio feature extraction vocals keras tensor flow fourier date: 02-18-2019 21:40 gmt revision:3 [2] [1] [0] [head]

Audio AI: isolating vocals from stereo music using Convolutional Neural Networks

  • Ale Koretzky
  • Fairly standard CNN, but use a binary STFT mask to isolate vocals from instruments.
    • Get Fourier-type time-domain artifacts as a result; but it sounds reasonable.
    • Didn't realize it until this paper / blog post: stacked conv layers combine channels.
    • E.g. an input of size $513 \times 25 \times 1$ (512 freq channels + DC, 25 time slices, mono) into a 3x3 Conv2D with 16 filters -> $3*3*1*16 + 16 = 160$ total parameters (filter weights and biases).
    • If this is followed by a second Conv2D layer with the same kernel size and filter count, the layer acts as a 'normal' fully connected network in the channel dimension.
    • This means there are $(3*3*16)*16 + 16 = 2320$ parameters. (See the parameter-counting sketch after this list.)
      • Each input channel from the previous conv layer has independent weights -- they are not shared -- whereas the spatial weights are shared.
      • Hence, same number of input channels and output channels (in this case; doesn't have to be).
      • This, naturally, falls out of spatial weight sharing, which might be obvious in retrospect; of course it doesn't make sense to share non-spatial weights.
      • See also: https://datascience.stackexchange.com/questions/17064/number-of-parameters-for-convolution-layers
  • Synthesized a large training set via a cappella YouTube videos plus instrument tabs .. that looked like a lot of work!
    • Need a karaoke database here.
  • Authors wrapped this into a realtime extraction toolkit.
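The parameter counting above in a few lines (plain arithmetic rather than Keras, to keep it self-contained); the kernel and channel sizes are the ones assumed in the bullet points.

  def conv2d_params(k_h, k_w, c_in, c_out):
      # one k_h x k_w kernel per (input channel, output filter) pair, plus one bias per filter
      return k_h * k_w * c_in * c_out + c_out

  print(conv2d_params(3, 3, 1, 16))     # 160: first layer, mono spectrogram input
  print(conv2d_params(3, 3, 16, 16))    # 2320: second layer; channels are now 'fully connected'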

{1426}
hide / / print
ref: -2019 tags: Arild Nokland local error signals backprop neural networks mnist cifar VGG date: 02-15-2019 03:15 gmt revision:6 [5] [4] [3] [2] [1] [0] [head]

Training neural networks with local error signals

  • Arild Nokland and Lars H Eidnes
  • Idea is to use one+ supplementary neural networks to measure within-batch matching loss between transformed hidden-layer output and one-hot label data to produce layer-local learning signals (gradients) for improving local representation.
  • Hence, no backprop. Error signals are all local, and inter-layer dependencies are not explicitly accounted for (! I think).
  • $L_{sim}$: given a mini-batch of hidden layer activations $H = (h_1, ..., h_n)$ and a one-hot encoded label matrix $Y = (y_1, ..., y_n)$,
    • $L_{sim} = || S(NeuralNet(H)) - S(Y)||^2_F$ (F presumably denotes the Frobenius norm).
    • $NeuralNet()$ is a convolutional neural net (trained how?), 3x3, stride 1, reduces output to 2.
    • $S()$ is the cosine similarity matrix, or correlation matrix, of a mini-batch. (A sketch of this similarity-matching loss follows this list.)
  • $L_{pred} = CrossEntropy(Y, W^T H)$ where W is a weight matrix, dim hidden_size * n_classes.
    • Cross-entropy is $H(Y, W^T H) = \Sigma_{i,j} Y_{i,j} log((W^T H)_{i,j}) + (1-Y_{i,j}) log(1-(W^T H)_{i,j})$
  • Sim-bio loss: replace $NeuralNet()$ with an average-pooling and standard-deviation op. Plus the one-hot target is replaced with a random transformation of the same target vector.
  • Overall loss is 99% $L_{sim}$, 1% $L_{pred}$
    • Despite the unequal weighting, both seem to improve test prediction on all examples.
  • VGG like network, with dropout and cutout (blacking out square regions of input space), batch size 128.
  • Tested on all the relevant datasets: MNIST, Fashion-MNIST, Kuzushiji-MNIST, CIFAR-10, CIFAR-100, STL-10, SVHN.
  • Pretty decent review of similarity matching measures at the beginning of the paper; not extensive but puts everything in context.
    • See for example non-negative matrix factorization using Hebbian and anti-Hebbian learning in and Chklovskii 2014.
  • Emphasis put on biologically realistic learning, including the use of feedback alignment {1423}
    • Yet: this was entirely supervised learning, as the labels were propagated back to each layer.
    • More likely that biology is setup to maximize available labels (not a new concept).
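A minimal numpy sketch of the similarity-matching loss $L_{sim}$, with an identity function standing in for the small trained conv net and made-up batch/label sizes: build the cosine-similarity matrix of the (transformed) hidden activations, the similarity matrix of the one-hot labels, and take the squared Frobenius distance between them.

  import numpy as np

  def cosine_similarity_matrix(X):
      Xn = X / (np.linalg.norm(X, axis=1, keepdims=True) + 1e-8)
      return Xn @ Xn.T                              # (n, n) pairwise cosine similarities

  def L_sim(H, Y):
      # ||S(f(H)) - S(Y)||_F^2 with f = identity here (the paper uses a small conv net)
      D = cosine_similarity_matrix(H) - cosine_similarity_matrix(Y)
      return np.sum(D ** 2)

  rng = np.random.default_rng(0)
  H = rng.normal(size=(8, 32))                      # mini-batch of 8 hidden activations
  Y = np.eye(4)[rng.integers(0, 4, size=8)]         # one-hot labels, 4 classes
  print(L_sim(H, Y))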

{1419}
hide / / print
ref: -0 tags: diffraction terahertz 3d print ucla deep learning optical neural networks date: 02-13-2019 23:16 gmt revision:1 [0] [head]

All-optical machine learning using diffractive deep neural networks

  • Pretty clever: use 3D printed plastic as diffractive media in a 0.4 THz all-optical all-interference (some attenuation) linear convolutional multi-layer 'neural network'.
  • In the arXiv publication there are few details on how they calculated or optimized the given diffractive layers.
  • Absence of nonlinearity will limit things greatly.
  • Actual observed performance (where they had to print out the handwritten digits) rather poor, ~ 60%.

{1174}
hide / / print
ref: -0 tags: Hinton google tech talk dropout deep neural networks Boltzmann date: 02-12-2019 08:03 gmt revision:2 [1] [0] [head]

Brains, sex, and machine learning -- Hinton google tech talk.

  • Hinton believes in the power of crowds -- he thinks that the brain fits many, many different models to the data, then selects afterward.
    • Random forests, as used in Predator, are an example of this: they average many simple-to-fit and simple-to-run decision trees. (This is apparently what Kinect does.)
  • Talk focuses on dropout, a clever new form of model averaging where only half of the units in the hidden layers are trained for a given example. (A minimal dropout sketch follows this list.)
    • He is inspired by biological evolution, where sexual reproduction often spontaneously adds or removes genes, hence individual genes or small linked genes must be self-sufficient. This equates to a 'rugged individualism' of units.
    • Likewise, dropout forces neurons to be robust to the loss of co-workers.
    • This is also great for parallelization: each unit or sub-network can be trained independently, on its own core, with little need for communication! Later, the units can be combined via genetic algorithms then re-trained.
  • Hinton then observes that sending a real value p (output of logistic function) with probability 0.5 is the same as sending 0.5 with probability p. Hence, it makes sense to try pure binary neurons, like biological neurons in the brain.
    • Indeed, if you replace the backpropagation with single bit propagation, the resulting neural network is trained more slowly and needs to be bigger, but it generalizes better.
    • Neurons (allegedly) do something very similar to this via Poisson spiking. Hinton claims this is the right thing to do (rather than sending real numbers via precise spike timing) if you want to robustly fit models to data.
      • Sending stochastic spikes is a very good way to average over the large number of models fit to incoming data.
      • Yes but this really explains little in neuroscience...
  • Paper referred to in intro: Livnat, Papadimitriou and Feldman, PMID-19073912 and later by the same authors PMID-20080594
    • A mixability theory for the role of sex in evolution. -- "We define a measure that represents the ability of alleles to perform well across different combinations and, using numerical iterations within a classical population-genetic framework, show that selection in the presence of sex favors this ability in a highly robust manner"
    • Plus David MacKay's concise illustration of why you need sex, pg 269, __Information theory, inference, and learning algorithms__
      • With rather simple assumptions, asexual reproduction yields 1 bit per generation,
      • Whereas sexual reproduction yields $\sqrt{G}$, where G is the genome size.
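For reference, a minimal 'inverted' dropout mask as it is usually written today (rescaling survivors at training time, rather than halving the weights at test time as Hinton describes in the talk); the drop probability is the usual 0.5.

  import numpy as np

  def dropout(a, p_drop=0.5, train=True, rng=np.random.default_rng(0)):
      # zero each hidden unit with probability p_drop; rescale survivors so the
      # expected activation is unchanged. At test time the layer is left alone.
      if not train:
          return a
      mask = (rng.random(a.shape) >= p_drop) / (1.0 - p_drop)
      return a * mask

  h = np.ones((2, 6))
  print(dropout(h))                                 # roughly half the entries zeroed, rest = 2.0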

{1391}
hide / / print
ref: -0 tags: computational biology evolution metabolic networks andreas wagner genotype phenotype network date: 06-12-2017 19:35 gmt revision:1 [0] [head]

Evolutionary Plasticity and Innovations in Complex Metabolic Reaction Networks

  • João F. Matias Rodrigues, Andreas Wagner
  • Our observations suggest that the robustness of the Escherichia coli metabolic network to mutations is typical of networks with the same phenotype.
  • We demonstrate that networks with the same phenotype form large sets that can be traversed through single mutations, and that single mutations of different genotypes with the same phenotype can yield very different novel phenotypes
  • Entirely computational study.
    • Examines what is possible given known metabolic building-blocks.
  • Methodology: collated a list of all metabolic reactions in E. Coli (726 reactions, excluding 205 transport reactions) out of 5870 possible reactions.
    • Then ran random-walk mutation experiments to see where the genotype + phenotype could move. Each point in the genotype had to be viable on either a rich (many carbon source) or minimal (glucose) growth medium.
    • Viability was determined by Flux-balance analysis (FBA).
      • "In our work we use a set of biochemical precursors from E. coli 47-49 as the set of required compounds a network needs to synthesize; by using linear programming to optimize the flux through a specific objective function, in this case the reaction representing the production of biomass precursors, we are able to know if a specific metabolic network is able to synthesize the precursors or not." (A toy flux-balance sketch follows this list.)
      • Used Coin-OR and Ilog to optimize the metabolic concentrations (I think?) per given network.
    • This included the ability to synthesize all required precursor biomolecules; see supplementary information.
    • "Viable" is highly permissive -- non-zero biomolecule concentration using FBA and linear programming.
    • Genomic distance = Hamming distance between binary vectors (1 = enzyme / reaction present, 0 = mutated off), normalized so that 0 = identical genotype and 1 = completely different genotype.
  • Between pairs of viable genetic-metabolic networks, only a minority (30 - 40%) of reactions are essential,
    • Which naturally increases with increasing carbon-source diversity.
    • When they go back and examine networks that can sustain life on any of (up to) 60 carbon sources, and again measure the distance from the original E. Coli genome, they find this added robustness does not significantly constrain network architecture.
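A toy flux-balance-analysis sketch with scipy's linear-programming solver, assuming a three-metabolite chain rather than the paper's genome-scale E. coli model: maximize the flux through a 'biomass' reaction subject to steady state (S v = 0) and flux bounds; 'viable' then corresponds to a non-zero optimal biomass flux.

  import numpy as np
  from scipy.optimize import linprog

  # stoichiometric matrix: rows = metabolites A, B, C; columns = reactions
  #            uptake  r1   r2  biomass
  S = np.array([[ 1,  -1,   0,   0],     # A
                [ 0,   1,  -1,   0],     # B
                [ 0,   0,   1,  -1]])    # C (a biomass precursor)
  bounds = [(0, 10)] * 4                 # allowed flux range per reaction
  c = np.array([0.0, 0.0, 0.0, -1.0])    # linprog minimizes, so negate the biomass flux

  res = linprog(c, A_eq=S, b_eq=np.zeros(3), bounds=bounds, method="highs")
  print(res.x, -res.fun)                 # optimal fluxes, maximal biomass production (10)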

Summary thoughts: This is a highly interesting study, insofar that the authors show substantial support for their hypotheses that phenotypes can be explored through random-walk non-lethal mutations of the genotype, and this is somewhat invariant to the source of carbon for known biochemical reactions. What gives me pause is the use of linear programming / optimization when setting the relative concentrations of biomolecules, and the permissive criteria for accepting these networks; real life (I would imagine) is far more constrained. Relative and absolute concentrations matter.

Still, the study does reflect some robustness. I suggest that a good control would be to ‘fuzz’ the list of available reactions based on statistical criteria, and see if the results still hold. Then, go back and make the reactions un-biological or less networked, and see if this destroys the measured degrees of robustness.

{1348}
hide / / print
ref: -0 tags: David Kleinfeld cortical vasculature laser surgery network occlusion flow date: 09-23-2016 06:35 gmt revision:1 [0] [head]

Heller Lecture - Prof. David Kleinfeld

  • Also mentions the use of LIBS + a q-switched laser for precisely drilling holes in the skull. Seems to work!
    • Use 20ns delay .. seems like there is still spectral broadening.
    • "Turn neuroscience into an industrial process, not an art form" After doing many surgeries, agreed!
  • Vasodilation & vasoconstriction are very highly regulated; there is not enough blood to go around.
    • Vessels distant from an energetic / stimulated site will (net) constrict.
  • The vascular network is almost entirely closed-loop, and not tree-like at all -- you can occlude one artery, or one capillary, and the network will route around the occlusion.
    • The density of the angio-architecture in the brain is unique in this.
  • Tested micro-occlusions by injecting rose bengal, which releases free radicals on light exposure (532nm, 0.5mw), causing coagulation.
  • "Blood flow on the surface arteriole network is insensitive to single occlusions"
  • Penetrating arterioles and venules are largely stubs -- single unbranching vessels, which again renders some immunity to blockage.
  • However! Occlusion of a penetrating arteriole retards flow within a 400 - 600um cylinder (larger than a cortical column!)
  • Occlusion of many penetrating vessels, unsurprisingly, leads to large swaths of dead cortex, "UBOs" in MRI parlance (unidentified bright objects).
  • Death and depolarizing depression can be effectively prevented by excitotoxicity inhibitors -- MK801 in the slides (NMDA blocker, systemically)

{1269}
hide / / print
ref: -0 tags: hinton convolutional deep networks image recognition 2012 date: 01-11-2014 20:14 gmt revision:0 [head]

ImageNet Classification with Deep Convolutional Networks

{913}
hide / / print
ref: Ganguly-2011.05 tags: Carmena 2011 reversible cortical networks learning indirect BMI date: 01-23-2013 18:54 gmt revision:6 [5] [4] [3] [2] [1] [0] [head]

PMID-21499255[0] Reversible large-scale modification of cortical networks during neuroprosthetic control.

  • Split the group of recorded motor neurons into direct (decoded and controls the BMI) and indirect (passive) neurons.
  • Both groups showed changes in neuronal tuning / PD.
    • More PD. Is there no better metric?
  • Monkeys performed manual control before (MC1) and after (MC2) BMI training.
    • The majority of neurons reverted back to original tuning after BC; c.f. [1]
  • Monkeys were trained to rapidly switch between manual and brain control; still showed substantial changes in PD.
  • 'Near' (on same electrode as direct neurons) and 'far' neurons (different electrode) showed similar changes in PD.
    • Modulation Depth in indirect neurons was less in BC than manual control.
  • Prove (pretty well) that motor cortex neuronal spiking can be dissociated from movement.
  • Indirect neurons showed decreased modulation depth (MD) -> perhaps this is to decrease interference with direct neurons.
  • Quote: "Studies of operant conditioning of single neurons found that unconditioned adjacent neurons were largely correlated with the conditioned neurons".
    • Well, also: Fetz and Baker showed that you can condition neurons recorded on the same electrode to covary or inversely vary.
  • Contrast with studies of motor learning in different force fields, where there is a dramatic memory trace.
    • Possibly this is from proprioception activating the cerebellum?

Other notes:

  • Scale bars on the waveforms are incorrect for figure 1.
  • Same monkeys as [2]

____References____

[0] Ganguly K, Dimitrov DF, Wallis JD, Carmena JM, Reversible large-scale modification of cortical networks during neuroprosthetic control.Nat Neurosci 14:5, 662-7 (2011 May)
[1] Gandolfo F, Li C, Benda BJ, Schioppa CP, Bizzi E, Cortical correlates of learning in monkeys adapting to a new dynamical environment.Proc Natl Acad Sci U S A 97:5, 2259-63 (2000 Feb 29)
[2] Ganguly K, Carmena JM, Emergence of a stable cortical map for neuroprosthetic control.PLoS Biol 7:7, e1000153 (2009 Jul)

{1007}
hide / / print
ref: Dethier-2011.28 tags: BMI decoder spiking neural network Kalman date: 01-06-2012 00:20 gmt revision:1 [0] [head]

IEEE-5910570 (pdf) Spiking neural network decoder for brain-machine interfaces

  • Gold standard: Kalman filter. (A minimal Kalman-decoder sketch follows this list.)
  • The spiking neural network got within 1% of this standard.
  • The 'neuromorphic' approach.
  • Used Nengo, a freely available neural simulator.
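For reference, the 'gold standard' they compare against, as a bare-bones Kalman-filter decoder; in a real BMI the state would be cursor kinematics, and the matrices A, W, C, Q would be fit from training data (e.g. by least squares), which is omitted here.

  import numpy as np

  def kalman_decode(Y, A, W, C, Q, x0, P0):
      # state model:       x_t = A x_{t-1} + w,  w ~ N(0, W)
      # observation model: y_t = C x_t + q,      q ~ N(0, Q)  (y = binned firing rates)
      x, P = x0.copy(), P0.copy()
      xs = []
      for y in Y:
          x = A @ x                                  # predict
          P = A @ P @ A.T + W
          S = C @ P @ C.T + Q                        # update
          K = P @ C.T @ np.linalg.inv(S)
          x = x + K @ (y - C @ x)
          P = (np.eye(len(x)) - K @ C) @ P
          xs.append(x.copy())
      return np.array(xs)

Per the note above, the spiking network built in Nengo gets within 1% of this filter's performance.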

____References____

Dethier, J. and Gilja, V. and Nuyujukian, P. and Elassaad, S.A. and Shenoy, K.V. and Boahen, K. Neural Engineering (NER), 2011 5th International IEEE/EMBS Conference on 396 -399 (2011)

{993}
hide / / print
ref: Sanchez-2005.06 tags: BMI Sanchez Nicolelis Wessberg recurrent neural network date: 01-01-2012 18:28 gmt revision:2 [1] [0] [head]

IEEE-1439548 (pdf) Interpreting spatial and temporal neural activity through a recurrent neural network brain-machine interface

  • Putting it here for the record.
  • Note they did a sensitivity analysis (via chain rule) of the recurrent neural network used for BMI predictions.
  • Used data (X,Y,Z) from 2 monkeys feeding.
  • Figure 6 is strange, data could be represented better.
  • Also see: IEEE-1300786 (pdf) Ascertaining the importance of neurons to develop better brain-machine interfaces Also by Justin Sanchez.

____References____

Sanchez, J.C. and Erdogmus, D. and Nicolelis, M.A.L. and Wessberg, J. and Principe, J.C. Interpreting spatial and temporal neural activity through a recurrent neural network brain-machine interface Neural Systems and Rehabilitation Engineering, IEEE Transactions on 13 2 213 -219 (2005)

{968}
hide / / print
ref: Bassett-2009.07 tags: Weinberger cognitive efficiency beta band neuroimaging EEG task performance optimization network size effort date: 12-28-2011 20:39 gmt revision:1 [0] [head]

PMID-19564605[0] Cognitive fitness of cost-efficient brain functional networks.

  • Idea: smaller, tighter networks are correlated with better task performance
    • working memory task in normal subjects and schizophrenics.
  • Larger networks operate with higher beta frequencies (more effort?) and show less efficient task performance.
  • Not sure about the noisy data, but v. interesting theory!

____References____

[0] Bassett DS, Bullmore ET, Meyer-Lindenberg A, Apud JA, Weinberger DR, Coppola R, Cognitive fitness of cost-efficient brain functional networks.Proc Natl Acad Sci U S A 106:28, 11747-52 (2009 Jul 14)

{323}
hide / / print
ref: Loewenstein-2006.1 tags: reinforcement learning operant conditioning neural networks theory date: 12-07-2011 03:36 gmt revision:4 [3] [2] [1] [0] [head]

PMID-17008410[0] Operant matching is a generic outcome of synaptic plasticity based on the covariance between reward and neural activity

  • The probability of choosing an alternative in a long sequence of repeated choices is proportional to the total reward derived from that alternative, a phenomenon known as Herrnstein's matching law.
  • We hypothesize that there are forms of synaptic plasticity driven by the covariance between reward and neural activity, and prove mathematically that matching (of choice allocation to reward) is a generic outcome of such plasticity. (A toy simulation of a covariance-driven rule follows this list.)
    • Models of learning that are based on the covariance between reward and choice are common in economics and are used phenomenologically to explain human behavior.
  • This model can be tested experimentally by making reward contingent not on the choices, but rather directly on neural activity.
  • Maximization is shown to be a generic outcome of synaptic plasticity driven by the sum of the covariances between reward and all past neural activities.
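A toy simulation sketch of a covariance-driven update in a (much simplified) two-armed bandit, rather than the concurrent variable-interval schedules where matching is non-trivial; the reward probabilities, learning rates, and choice rule are all invented for illustration. Per the paper's claim, the steady-state fraction of choices allocated to each alternative should track the fraction of total income earned from it.

  import numpy as np

  rng = np.random.default_rng(0)
  p_reward = np.array([0.4, 0.1])        # hypothetical reward probability per alternative
  w = np.array([0.5, 0.5])               # 'synaptic weights' -> choice propensities
  eta, r_bar = 0.02, 0.0
  counts, income = np.zeros(2), np.zeros(2)

  for t in range(50000):
      p_choice = w / w.sum()
      a = rng.choice(2, p=p_choice)
      R = float(rng.random() < p_reward[a])
      r_bar += 0.01 * (R - r_bar)                   # running estimate of mean reward
      n = np.zeros(2)
      n[a] = 1.0                                    # 'neural activity' of the chosen alternative
      w += eta * (R - r_bar) * (n - p_choice)       # covariance-driven plasticity
      w = np.clip(w, 1e-3, None)
      counts[a] += 1
      income[a] += R

  print(counts / counts.sum())           # fractional choice
  print(income / income.sum())           # fractional income; matching: these should coincide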

____References____

[0] Loewenstein Y, Seung HS, Operant matching is a generic outcome of synaptic plasticity based on the covariance between reward and neural activity.Proc Natl Acad Sci U S A 103:41, 15224-9 (2006 Oct 10)

{862}
hide / / print
ref: -0 tags: backpropagation cascade correlation neural networks date: 12-20-2010 06:28 gmt revision:1 [0] [head]

The Cascade-Correlation Learning Architecture

  • Much better - much more sensible, computationally cheaper, than backprop.
  • Units are added one by one; each is trained to be maximally correlated with the error of the existing, frozen network. (The candidate correlation score is sketched below.)
  • Uses quickprop to speed up gradient ascent learning.
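A sketch of the candidate-unit score being maximized (Fahlman-style: the summed absolute covariance between the candidate's activation and the frozen network's residual errors, over training patterns and output units); the data shapes are arbitrary.

  import numpy as np

  def candidate_score(v, E):
      # v: (patterns,) candidate unit activations; E: (patterns, outputs) residual errors
      # of the existing frozen network.  S = sum_o | sum_p (v_p - v_bar)(E_po - E_bar_o) |
      v_c = v - v.mean()
      E_c = E - E.mean(axis=0)
      return np.sum(np.abs(v_c @ E_c))

  rng = np.random.default_rng(0)
  v = rng.normal(size=100)
  E = rng.normal(size=(100, 3))
  print(candidate_score(v, E))    # the candidate's weights are trained to maximize this, then frozen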

{789}
hide / / print
ref: work-0 tags: emergent leabra QT neural networks GUI interface date: 10-21-2009 19:02 gmt revision:4 [3] [2] [1] [0] [head]

I've been reading Computational Explorations in Cognitive Neuroscience, and decided to try the code that comes with / is associated with the book. This used to be called "PDP+", but was re-written, and is now called Emergent. It's a rather large program - links to Qt, GSL, Coin3D, Quarter, Open Dynamics Library, and others. The GUI itself seems obtuse and too heavy; it's not clear why they need to make this so customized / paneled / tabbed. Also, it depends on relatively recent versions of each of these libraries - which made the install on my Debian Lenny system a bit of a chore (kinda like Windows).

A really strange thing is that programs are stored in tree lists - woah - a natural folding editor built in! I've never seen a programming language that doesn't rely on simple text files. Not a bad idea, but still foreign to me. (But I guess programs are inherently hierarchical anyway.)

Below, a screenshot of the whole program - note they use a Coin3D window to graph things / interact with the model. The colored boxes in each network layer indicate local activations, and they update as the network is trained. I don't mind this interface, but again it seems a bit too 'heavy' for things that are inherently 2D (like 2D network activations and the output plot). It's good for seeing hierarchies, though, like the network model.

All in all looks like something that could be more easily accomplished with some python (or ocaml), where the language itself is used for customization, and not a GUI. With this approach, you spend more time learning about how networks work, and less time programming GUIs. On the other hand, if you use this program for teaching, the gui is essential for debugging your neural networks, or other people use it a lot, maybe then it is worth it ...

In any case, the book is very good. I've learned about GeneRec, which uses different activation phases to compute local errors for the purposes of error-minimization, as well as the virtues of using both Hebbian and error-based learning (like GeneRec). Specifically, the authors show that error-based learning can be rather 'lazy', purely moving down the error gradient, whereas Hebbian learning can internalize some of the correlational structure of the input space. You can look at this internalization as 'weight constraint' which limits the space that error-based learning has to search. Cool idea! Inhibition also is a constraint - one which constrains the network to be sparse.
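A minimal sketch of the GeneRec-style weight update the book describes, as I understand it: the network settles once with only inputs clamped (the 'minus' phase) and once with the targets also clamped (the 'plus' phase), and the local error signal is just the phase difference in postsynaptic activity.

  import numpy as np

  def generec_update(W, a_pre_minus, a_post_minus, a_post_plus, lr=0.1):
      # dW_ij = lr * a_i^- * (a_j^+ - a_j^-): presynaptic activity times the
      # postsynaptic difference between plus (target-clamped) and minus phases
      return W + lr * np.outer(a_pre_minus, a_post_plus - a_post_minus)

  rng = np.random.default_rng(0)
  W = 0.1 * rng.normal(size=(5, 3))
  a_pre = rng.random(5)                  # presynaptic (hidden) activities, minus phase
  a_minus = rng.random(3)                # postsynaptic activities, minus phase
  a_plus = rng.random(3)                 # postsynaptic activities, plus phase
  W = generec_update(W, a_pre, a_minus, a_plus)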

To use his/their own words:

... given the explanation above about the network's poor generalization, it should be clear why both Hebbian learning and kWTA (k winner take all) inhibitory competition can improve generalization performance. At the most general level, they constitute additional biases that place important constraints on the learning and the development of representations. More specifically, Hebbian learning constrains the weights to represent the correlational structure of the inputs to a given unit, producing systematic weight patterns (e.g. cleanly separated clusters of strong correlations).

Inhibitory competition helps in two ways. First, it encourages individual units to specialize in representing a subset of items, thus parcelling up the task in a much cleaner and more systematic way than would occur in an otherwise unconstrained network. Second, inhibition greatly restricts the settling dynamics of the network, greatly constraining the number of states the network can settle into, and thus eliminating a large proportion of the attractors that can hijack generalization."

{776}
hide / / print
ref: work-0 tags: neural networks course date: 09-01-2009 04:24 gmt revision:0 [head]

http://www.willamette.edu/~gorr/classes/cs449/intro.html -- decent resource, a good explanation of the equations associated with artificial neural networks.

{724}
hide / / print
ref: Oskoei-2008.08 tags: EMG pattern analysis classification neural network date: 04-07-2009 21:10 gmt revision:2 [1] [0] [head]

  • EMG pattern analysis and classification by Neural Network
    • 1989!
    • Short, simple paper; showed that 20 patterns can accurately be decoded with a backprop-trained neural network.
  • PMID-18632358 Support vector machine-based classification scheme for myoelectric control applied to upper limb.
    • myoelectric discrimination with SVM running on features in both the time and frequency domain.
    • a surface MES (myoelectric signal) is formed via the superposition of individual action potentials generated by irregular discharges of active motor units in a muscle fiber. Its amplitude, variance, energy, and frequency vary depending on contraction level.
    • Time domain features (computed in the sketch after this list):
      • Mean absolute value (MAV)
      • root mean square (RMS)
      • waveform length (WL)
      • variance
      • zero crossings (ZC)
      • slope sign changes (SSC)
      • Willison amplitude.
    • Frequency domain features:
      • power spectrum
      • autoregressive coefficients order 2 and 6
      • mean signal frequency
      • median signal frequency
      • good performance with just RMS + AR2 for 50 or 100 ms segments. Used an SVM with an RBF kernel.
      • looks like you can just get away with time-domain metrics!!
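A small numpy sketch computing the listed time-domain features for one analysis window; the thresholds for zero crossings and the Willison amplitude are arbitrary and would normally be tuned to the sensor's noise floor.

  import numpy as np

  def emg_time_features(x, zc_thresh=0.01, wamp_thresh=0.05):
      # x: one window of the myoelectric signal (e.g. 50-100 ms of samples)
      dx = np.diff(x)
      return {
          "MAV":  np.mean(np.abs(x)),                                 # mean absolute value
          "RMS":  np.sqrt(np.mean(x ** 2)),                           # root mean square
          "WL":   np.sum(np.abs(dx)),                                 # waveform length
          "VAR":  np.var(x),                                          # variance
          "ZC":   int(np.sum((x[:-1] * x[1:] < 0) & (np.abs(dx) > zc_thresh))),  # zero crossings
          "SSC":  int(np.sum(dx[:-1] * dx[1:] < 0)),                  # slope sign changes
          "WAMP": int(np.sum(np.abs(dx) > wamp_thresh)),              # Willison amplitude
      }

  x = np.random.default_rng(0).normal(scale=0.1, size=200)
  print(emg_time_features(x))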

{695}
hide / / print
ref: -0 tags: alopex machine learning artificial neural networks date: 03-09-2009 22:12 gmt revision:0 [head]

Alopex: A Correlation-Based Learning Algorithm for Feed-Forward and Recurrent Neural Networks (1994)

  • Read the abstract! Rather than using the gradient error estimate as in backpropagation, it uses the correlation between changes in network weights and changes in the error, plus Gaussian noise. (A minimal sketch of the update follows this list.)
    • backpropagation requires calculation of the derivatives of the transfer function from one neuron to the output. This is very non-local information.
    • one alternative is somewhat empirical: compute the derivatives wrt the weights through perturbations.
    • all these algorithms are solutions to the optimization problem: minimize an error measure, E, wrt the network weights.
  • all network weights are updated synchronously.
  • can be used to train both feedforward and recurrent networks.
  • algorithm apparently has a long history, especially in visual research.
  • the algorithm is quite simple! easy to understand.
    • use stochastic weight changes with a annealing schedule.
  • this is pre-pub: tables and figures at the end.
  • looks like it has comparable or faster convergence then backpropagation.
  • not sure how it will scale to problems with hundreds of neurons; though, they looked at an encoding task with 32 outputs.
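A minimal Alopex-flavored sketch (my reading of the scheme, not the authors' exact algorithm): every weight moves by a fixed +/-delta each iteration, the probability of each signed step depends on the correlation between the weight's previous move and the change in error, and the temperature is annealed. The toy loss, step size, and schedule are arbitrary.

  import numpy as np

  def alopex(loss_fn, w, delta=0.05, T=0.1, anneal=0.999, steps=5000,
             rng=np.random.default_rng(0)):
      dw = rng.choice([-delta, delta], size=w.shape)
      E_prev = loss_fn(w)
      for _ in range(steps):
          w = w + dw
          E = loss_fn(w)
          C = dw * (E - E_prev)                     # correlation of the last move with the error change
          # probability of a -delta step; moves that correlated with an error decrease are repeated
          p_neg = 1.0 / (1.0 + np.exp(-np.clip(C / T, -50, 50)))
          dw = np.where(rng.random(w.shape) < p_neg, -delta, delta)
          E_prev = E
          T *= anneal                               # annealing schedule
      return w

  # toy usage: all weights should wander toward the minimum at 3
  print(np.round(alopex(lambda w: np.sum((w - 3.0) ** 2), np.zeros(4)), 1))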

{669}
hide / / print
ref: Pearlmutter-2009.06 tags: sleep network stability learning memory date: 02-05-2009 19:21 gmt revision:1 [0] [head]

PMID-19191602 A New Hypothesis for Sleep: Tuning for Criticality.

  • Their hypothesis: in the course of learning, the brain's networks move closer to instability, as the process of learning and information storage requires that the network move closer to instability.
    • That is, a perfectly stable network stores no information: output is the same independent of input; a highly unstable network can potentially store a lot of information, or be a very selective or critical system: output is highly sensitive to input.
  • Sleep serves to restore the stability of the network by exposing it to a variety of inputs, checking for runaway activity, and adjusting accordingly. (inhibition / glia? how?)
  • Say that when sleep is not possible, an emergency mechanism must come into play, namely tiredness, to prevent runaway behavior.
  • (From wikipedia:) a potentially serious side-effect of many antipsychotics is that they tend to lower an individual's seizure threshold. Recall that removal of all dopamine can inhibit REM sleep; it's all somehow consistent, but unclear how maintaining network stability and being able to move are related.

{497}
hide / / print
ref: bookmark-0 tags: open source cellphone public network date: 11-13-2007 21:28 gmt revision:2 [1] [0] [head]

http://dotpublic.istumbler.net/

  • kinda high-level, rather amorphous, but generally in the right direction. The drive is there, the time is coming, but we are not quite there yet..
  • have some designs for wireless repeaters, based on 802.11g mini-pci cards in a SBC, 3 repeaters. total cost about $1000
  • also interesting: http://www.opencellphone.org/index.php?title=Main_Page

{7}
hide / / print
ref: bookmark-0 tags: book information_theory machine_learning bayes probability neural_networks mackay date: 0-0-2007 0:0 revision:0 [head]

http://www.inference.phy.cam.ac.uk/mackay/itila/book.html -- free! (but i liked the book, so I bought it :)

{20}
hide / / print
ref: bookmark-0 tags: neural_networks machine_learning matlab toolbox supervised_learning PCA perceptron SOM EM date: 0-0-2006 0:0 revision:0 [head]

http://www.ncrg.aston.ac.uk/netlab/index.php n.b. kinda old. (or does that just mean well established?)

{39}
hide / / print
ref: bookmark-0 tags: Numenta Bayesian_networks date: 0-0-2006 0:0 revision:0 [head]

http://www.numenta.com/Numenta_HTM_Concepts.pdf

  • shared, hierarchical representation reduces memory requirements, training time, and mirrors the structure of the world.
  • belief propagation techniques force the network into a set of mutually consistent beliefs.
  • a belief is a form of spatio-temporal quantization: ignore the unusual.
  • a cause is a persistent or recurring structure in the world - the root of a spatiotemporal pattern. This is a simple but important concept.
    • HTM marginalize along space and time - they assume time patterns and space patterns, not both at the same time. Temporal parameterization follows spatial parameterization.

{40}
hide / / print
ref: bookmark-0 tags: Bayes Baysian_networks probability probabalistic_networks Kalman ICA PCA HMM Dynamic_programming inference learning date: 0-0-2006 0:0 revision:0 [head]

http://www.cs.ubc.ca/~murphyk/Bayes/bnintro.html very, very good! many references, well explained too.

{92}
hide / / print
ref: bookmark-0 tags: training neural_networks with kalman filters date: 0-0-2006 0:0 revision:0 [head]

with the extended kalman filter, from '92: http://ftp.ccs.neu.edu/pub/people/rjw/kalman-ijcnn-92.ps

with the unscented kalman filter : http://hardm.ath.cx/pdf/NNTrainingwithUnscentedKalmanFilter.pdf