m8ta
[0] Bar-Gad I, Morris G, Bergman H, Information processing, dimensionality reduction and reinforcement learning in the basal ganglia. Prog Neurobiol 71:6, 439-73 (2003 Dec)

[0] Narayanan NS, Kimchi EY, Laubach M, Redundancy and synergy of neuronal ensembles in motor cortex. J Neurosci 25:17, 4207-16 (2005 Apr 27)

[0] Wood F, Fellows M, Donoghue J, Black M, Automatic spike sorting for neural decoding. Conf Proc IEEE Eng Med Biol Soc 6, 4009-12 (2004)

{1570}
ref: -0 tags: Balduzzi backprop biologically plausible red-tape date: 05-31-2022 20:48 gmt revision:1 [0] [head]

Kickback cuts Backprop's red-tape: Biologically plausible credit assignment in neural networks

Bit of a meh -- idea is, rather than propagating error signals backwards through a hierarchy, you propagate only one layer + use a signed global reward signal. This works by keeping the network ‘coherent’ -- positive neurons have positive input weights, and negative neurons have negative weights, such that the overall effect of a weight change does not change sign when propagated forward through the network.
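A minimal sketch of that one-layer-plus-global-error idea (my own toy construction for a single hidden layer and a scalar output; none of the names or constants come from the paper): each hidden unit's update uses only its local input, its local derivative, its sign-preserved forward influence, and the signed global error.

```python
import numpy as np

# Toy 'kickback'-style update (sketch; assumes ReLU hidden units and a scalar output).
rng = np.random.default_rng(0)
n_in, n_hid = 5, 8
W1 = np.abs(rng.normal(size=(n_hid, n_in)))   # 'coherent' start: positive units, positive weights
w2 = np.abs(rng.normal(size=n_hid))

def kickback_step(x, target, lr=1e-2):
    h = np.maximum(0.0, W1 @ x)               # hidden activations
    y = w2 @ h                                # scalar output
    err = target - y                          # signed global error, broadcast to every unit
    influence = w2 * (h > 0)                  # one layer of forward influence, no deeper chain
    dW1 = lr * err * np.outer(influence, x)
    dw2 = lr * err * h
    return dW1, dw2

dW1, dw2 = kickback_step(rng.normal(size=n_in), 1.0)
W1 += dW1
w2 += dw2
```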

This is kind of a lame shortcut, imho, as it limits the types of functions that the network can model & the computational structure of the network. This is already quite limited by the common dot-product-rectifier structure (as is used here). Much more interesting, and possibly necessary (given much deeper architectures now), is to allow units to change sign. (Open question as to whether they actually frequently do!) As such, the model is in the vein of "how do we make backprop biologically plausible by removing features / communication" rather than "what sorts of signals and changes does the brain use to perceive and generate behavior".

This is also related to the literature on what ResNets do; what are the skip connections for? Anthropic has some interesting analyses for Transformer architectures, but checking the literature on other ResNets is for another time.

{1568}
ref: -2021 tags: burst bio plausible gradient learning credit assignment richards apical dendrites date: 05-05-2022 15:44 gmt revision:2 [1] [0] [head]

Burst-dependent synaptic plasticity can coordinate learning in hierarchical circuits

  • Roughly, single spike events indicate the normal feature responses of neurons, while multiple-spike bursts indicate error signals (a toy version of the resulting learning rule is sketched after this list).
  • Bursts are triggered by depolarizing currents to the apical dendrites, which can be uncoupled from bottom-up event rate, which arises from perisomatic inputs / basal dendrites.
  • The fact that the two are imperfectly multiplexed is OK, as in backprop the magnitude of the error signal is modulated by the activity of the feature detector.
  • "For credit assignment in hierarchical networks, connections should obey four constraints:
    • Feedback must steer the magnitude and sign of plasticity
    • Feedback signals from higher-order areas must be multiplexed with feedforward signals from lower-order areas so that credit assignment can percolate down the hierarchy with minimal effect on sensory information
    • There should be some form of alignment between feedforward and feedback connections
    • Integration of credit-carrying signals should be nearly linear to avoid saturation
      • Seems it's easy to saturate the burst probability within a window of background event rate, e.g. the window is all bursts to no bursts.
  • Perisomatic inputs were short-term depressing, whereas apical dendrite synapses were short-term facilitating.
    • This is a form of filtering on burst rates? E.g. they propagate better down than up?
  • They experiment with a series of models: one for solving the XOR task, and subsequent ones for MNIST and CIFAR.
  • The latter, larger models are mean-field models, rather than biophysical neuron models, and have a few extra features:
    • Interneurons, presumably SOM neurons, are used to keep bursting within a linear regime via a 'simple' (supplementary) learning rule.
    • Feedback alignment occurs by adjusting both the feedforward and feedback weights with the same propagated error signal + weight decay.
  • The credit assignment problem, or in the case of unsupervised learning, the coordination problem, is very real: how do you change a middle-feature to improve representations in higher (and lower) levels of the hierarchy?
    • They mention that using REINFORCE on the same network was unable to find a solution.
    • Put another way: usually you need to coordinate the weight changes in a network; changing weights individually based on a global error signal (or objective function) does not readily work...
      • Though evolution seems to be quite productive at getting the settings of (very) large sets of interdependent coefficients all to be 'correct' and (sometimes) beautiful.
      • How? Why? Friston's free energy principle? Lol.
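A rate-based toy of the burst-dependent rule as I read it (my paraphrase; the exact form, symbols, and the moving-average estimate of burst probability are assumptions, not copied from the paper):

```python
import numpy as np

# Sketch: dw ~ E_pre * (B_post - Pbar * E_post), where E = event rate, B = burst rate,
# and Pbar is a slowly updated estimate of the postsynaptic burst probability.
def burst_plasticity_step(w, e_pre, e_post, b_post, pbar, lr=1e-3, tau=100.0):
    dw = lr * e_pre * (b_post - pbar * e_post)         # bursts above expectation -> potentiate
    pbar += (b_post / max(e_post, 1e-9) - pbar) / tau  # running burst-probability estimate
    return w + dw, pbar

w, pbar = 0.1, 0.2
for _ in range(1000):
    e_pre, e_post = 5.0, 4.0   # events/s: bottom-up feature signals
    b_post = 1.2               # bursts/s: top-down, apical 'error' signal
    w, pbar = burst_plasticity_step(w, e_pre, e_post, b_post, pbar)
```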

{1564}
ref: -2008 tags: t-SNE dimensionality reduction embedding Hinton date: 01-25-2022 20:39 gmt revision:2 [1] [0] [head]

“Visualizing data using t-SNE”

  • Laurens van der Maaten, Geoffrey Hinton.
  • SNE: stochastic neighbor embedding, Hinton 2002.
  • Idea: model the data's conditional pairwise distribution as a Gaussian, with one variance per data point, $p(x_i | x_j)$.
  • In the mapped data, this pairwise distribution is modeled as a fixed-variance Gaussian, too, $q(y_i | y_j)$.
  • Goal is to minimize the Kullback-Leibler divergence $\sum_i KL(p_i || q_i)$ (summed over all data points).
  • Per-data-point variance is found via binary search to match a user-specified perplexity. This amounts to setting a number of nearest neighbors; somewhere between 5 and 50 works OK.
  • Cost function is minimized via gradient descent, starting with a random distribution of points $y_i$, with plenty of momentum to speed up convergence, and noise to effect simulated annealing.
  • Cost function is remarkably simple to reduce; gradient update: $\frac{\delta C}{\delta y_i} = 2 \sum_j (p_{j|i} - q_{j|i} + p_{i|j} - q_{i|j})(y_i - y_j)$ (a numpy sketch of this gradient is given after this list).
  • t-SNE differs from SNE (above) in that it addresses difficulty in optimizing the cost function, and crowding.
    • Uses a simplified symmetric cost function (symmetric conditional probability, rather than joint probability) with simpler gradients
    • Uses the student’s t-distribution in the low-dimensional map q to reduce crowding problem.
  • The crowding problem is roughly resultant from the fact that, in high-dimensional spaces, the volume of the local neighborhood scales as $r^m$, whereas in 2D it's just $r^2$. Hence there is cost-incentive to pushing all the points together in the map -- points are volumetrically closer together in high dimensions than they can be in 2D.
    • This can be alleviated by using a one-DOF student distribution, which is the same as a Cauchy distribution, to model the probabilities in map space.
  • Smart -- they plot the topology of the gradients to gain insight into modeling / convergence behavior.
  • Don’t need simulated annealing due to balanced attractive and repulsive effects (see figure).
  • Enhance the algorithm further by keeping it compact at the beginning, so that clusters can move through each other.
  • Look up: d-bits parity task by Bengio 2007
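A small numpy sketch of the SNE-style gradient written out above (toy code: fixed per-point variances instead of the perplexity search, plain gradient descent without the momentum, annealing, or t-distributed map the paper uses):

```python
import numpy as np

def conditional_probs(X, sigma2):
    """p_{j|i}: Gaussian neighbor probabilities with one variance per point."""
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    logits = -d2 / (2.0 * sigma2[:, None])
    np.fill_diagonal(logits, -np.inf)           # p_{i|i} = 0
    P = np.exp(logits)
    return P / P.sum(axis=1, keepdims=True)

def sne_gradient(P, Y):
    """dC/dy_i = 2 * sum_j (p_{j|i} - q_{j|i} + p_{i|j} - q_{i|j}) (y_i - y_j)."""
    Q = conditional_probs(Y, np.ones(len(Y)))   # fixed-variance Gaussian in map space
    M = (P - Q) + (P - Q).T                     # the p_{j|i}-q_{j|i} terms plus their transposes
    diff = Y[:, None, :] - Y[None, :, :]
    return 2.0 * np.einsum('ij,ijd->id', M, diff)

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 10))
Y = 1e-2 * rng.normal(size=(50, 2))
P = conditional_probs(X, np.full(50, 1.0))
for _ in range(200):                            # plain gradient descent, no momentum/annealing
    Y -= 0.1 * sne_gradient(P, Y)
```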

{1537}
ref: -0 tags: cortical computation learning predictive coding reviews date: 02-23-2021 20:15 gmt revision:2 [1] [0] [head]

PMID-30359606 Predictive Processing: A Canonical Cortical Computation

  • Georg B Keller, Thomas D Mrsic-Flogel
  • Their model includes two error signals, positive and negative, for reconciling the sensory experience with the top-down predictions. I haven't read the full article, and disagree that such errors are explicitly represented in neurons, but the model is plausible. Hence worth recording the paper here.

PMID-23177956 Canonical microcircuits for predictive coding

  • Andre M Bastos, W Martin Usrey, Rick A Adams, George R Mangun, Pascal Fries, Karl J Friston
  • We revisit the established idea that message passing among hierarchical cortical areas implements a form of Bayesian inference, paying careful attention to the implications for intrinsic connections among neuronal populations.
  • Have these algorithms been put to practical use? I don't know...

Control of synaptic plasticity in deep cortical networks

  • Pieter R. Roelfsema & Anthony Holtmaat
  • Basically argue for a many-factor learning rule at the feedforward and feedback synapses, taking into account pre, post, attention, and reinforcement signals.
  • See comment by Tim Lillicrap and Blake Richards.

{1528}
ref: -2015 tags: olshausen redwood autoencoder VAE MNIST faces variation date: 11-27-2020 03:04 gmt revision:0 [head]

Discovering hidden factors of variation in deep networks

  • Well, they are not really that deep ...
  • Use a VAE to encode both a supervised signal (class labels) as well as unsupervised latents.
  • Penalize a combination of the MSE of reconstruction, logits of the classification error, and a special cross-covariance term to decorrelate the supervised and unsupervised latent vectors.
  • Cross-covariance penalty: (the exact term wasn't captured in these notes; a hedged sketch is given after this list)
  • Tested on
    • MNIST -- discovered style / rotation of the characters
    • Toronto faces database -- seven expressions, many individuals; extracted eigen-emotions sorta.
    • Multi-PIE -- many faces, many viewpoints; was able to vary camera pose and illumination with the unsupervised latents.
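A sketch of what a cross-covariance decorrelation penalty between the supervised code y and the unsupervised latents z can look like (my reconstruction, assuming the term penalizes squared cross-covariance over a batch; the paper's exact expression isn't captured in these notes):

```python
import numpy as np

def xcov_penalty(y, z):
    """Squared cross-covariance between two codes over a batch (assumed form).

    y: (N, K) supervised code (e.g. class logits), z: (N, M) unsupervised latents.
    """
    yc = y - y.mean(axis=0, keepdims=True)
    zc = z - z.mean(axis=0, keepdims=True)
    C = yc.T @ zc / y.shape[0]          # (K, M) cross-covariance matrix
    return 0.5 * np.sum(C ** 2)         # penalize any linear dependence between the codes

rng = np.random.default_rng(0)
y = rng.normal(size=(128, 10))
z = 0.3 * y[:, :5] + rng.normal(size=(128, 5))   # correlated latents -> large penalty
print(xcov_penalty(y, z), xcov_penalty(y, rng.normal(size=(128, 5))))
```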

{1521}
ref: -2005 tags: dimensionality reduction contrastive gradient descent date: 09-13-2020 02:49 gmt revision:2 [1] [0] [head]

Dimensionality reduction by learning an invariant mapping

  • Raia Hadsell, Sumit Chopra, Yann LeCun
  • Central idea: learn an invariant mapping of the input by minimizing mapped distance (e.g. the distance between outputs) when the samples are categorized as the same (same digits in MNIST, e.g.), and maximizing mapped distance when the samples are categorized as different.
    • Two loss functions for same vs. different (a sketch of this pairwise loss is below).
  • This is an attraction-repulsion spring analogy.
  • Use gradient descent to change the weights to satisfy these two competing losses.
  • Resulting convolutional neural nets can extract camera pose information from the NORB dataset.
  • Surprising how simple analogies like this, when iterated across a great many samples, pull out intuitively correct invariances.
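The pairwise attraction-repulsion loss, sketched from memory (the margin form is what I recall for DrLIM; details like the 1/2 factors are assumptions):

```python
import numpy as np

def contrastive_loss(d, same, margin=1.0):
    """Attraction-repulsion loss on the mapped distance d between a pair.

    same=1: pull the pair together (loss ~ d^2).
    same=0: push apart until the margin (loss ~ max(0, margin - d)^2).
    """
    attract = 0.5 * d ** 2
    repel = 0.5 * np.maximum(0.0, margin - d) ** 2
    return same * attract + (1 - same) * repel

print(contrastive_loss(0.2, 1), contrastive_loss(0.2, 0), contrastive_loss(2.0, 0))
```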

{1455}
ref: -0 tags: credit assignment distributed feedback alignment penn state MNIST fashion backprop date: 03-16-2019 02:21 gmt revision:1 [0] [head]

Conducting credit assignment by aligning local distributed representations

  • Alexander G. Ororbia, Ankur Mali, Daniel Kifer, C. Lee Giles
  • Propose two related algorithms: Local Representation Alignment (LRA)-diff and LRA-fdbk.
    • LRA-diff is basically a modified form of backprop.
    • LRA-fdbk is a modified version of feedback alignment. {1432} {1423}
  • Test on MNIST (easy -- many digits can be discriminated with one pixel!) and fashion-MNIST (harder -- humans only get about 85% right!)
  • Use a Cauchy or log-penalty loss at each layer, which is somewhat unique and interesting: $L(z,y) = \sum_{i=1}^n \log(1 + (y_i - z_i)^2)$ (a short sketch of this loss and its gradient follows the list below).
    • This is hence a saturating loss.
  1. Normal multi-layer-perceptron feedforward network. Pre-activation $h^\ell$ and post-activation $z^\ell$ are stored.
  2. Update the weights to minimize loss. This gradient calculation is identical to backprop, only they constrain the update to have a norm no bigger than $c_1$. Z and Y are the actual and desired outputs of the layer, as commented. Gradient includes the derivative of the nonlinear activation function.
  3. Generate an update for the pre-nonlinearity $h^{\ell-1}$ to minimize the loss in the layer above. This again is very similar to backprop; it's the chain rule -- but the derivatives are vectors, of course, so those should be element-wise multiplications, not outer products (I think).
    1. Note $h$ is updated -- derivatives of two nonlinearities.
  4. Feedback-alignment version, with random matrix $E_\ell$ (elements drawn from a Gaussian distribution, $\sigma = 1$ ish).
    1. Only one nonlinearity derivative here -- bug?
  5. Move the rep and post activations in the specified gradient direction.
    1. Those $\bar{h}^{\ell-1}$ variables are temporary holding -- but note that both lower and higher layers are updated.
  6. Do this K times, K = 1-50.
  • In practice K=1, with the LRA-fdbk algorithm, for the majority of the paper -- it works much better than LRA-diff (interesting .. bug?). Hence, this basically reduces to feedback alignment.
  • Demonstrate that LRA works much better with small initial weights, but basically because they tweak the algorithm to do this.
    • Need to see a positive control for this to be conclusive.
    • Again, why is FA so different from LRA-fdbk? Suspicious. Positive controls.
  • Attempted a network with Local Winner Take All (LWTA), which is a hard nonlinearity that LFA was able to account for & train through.
  • Also used Bernoulli neurons, and were able to successfully train. Unlike drop-out, these were stochastic at test time, and things still worked OK.
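A tiny sketch of the per-layer log-penalty (Cauchy-like) loss noted above and its gradient, which shows why it saturates (my code, not the authors'):

```python
import numpy as np

def cauchy_layer_loss(z, y):
    """L(z, y) = sum_i log(1 + (y_i - z_i)^2) and its gradient w.r.t. z."""
    r = y - z
    loss = np.sum(np.log1p(r ** 2))
    dloss_dz = -2.0 * r / (1.0 + r ** 2)   # saturates: |gradient| -> 0 as |r| grows
    return loss, dloss_dz

z = np.array([0.0, 0.0, 0.0])
y = np.array([0.1, 1.0, 10.0])
loss, g = cauchy_layer_loss(z, y)          # note the tiny gradient for the large residual
```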

Lit review.
  • Logistic sigmoid can slow down learning, due to its non-zero mean (Glorot & Bengio 2010).
  • Recirculation algorithm (or generalized recirculation) is a precursor for target propagation.
  • Target propagation is all about the inverse of the forward propagation: if we had access to the inverse of the network of forward propagations, we could compute which input values at the lower levels of the network would result in better values at the top that would please the global cost.
    • This is a very different way of looking at it -- almost backwards!
    • And indeed, it's not really all that different from contrastive divergence. (even though CD doesn't work well with non-Bernoulli units)
  • Contrastive Hebbian learning also has two phases, one to fantasize, and one to try to make the fantasies look more like the input data.
  • Decoupled neural interfaces (Jaderberg et al 2016): learn a predictive model of error gradients (and inputs) instead of trying to use local information to estimate updated weights.

  • Yeah, call me a critic, but I'm not clear on the contribution of this paper; it smells precocious and over-sold.
    • Even the title. I was hoping for something more 'local' than per-layer computation. BP does that already!
  • They primarily report supportive tests, not discriminative or stressing tests; how does the algorithm fail?
    • Certainly a lot of work went into it..
  • I still don't see how the computation of a target through a random matrix, then using the delta/loss/error between that target and the feedforward activation to update weights, is much different than propagating the errors directly through a random feedback matrix. E.g. subtract then multiply, or multiply then subtract?

{1441}
ref: -2018 tags: biologically inspired deep learning feedback alignment direct difference target propagation date: 03-15-2019 05:51 gmt revision:5 [4] [3] [2] [1] [0] [head]

Assessing the Scalability of Biologically-Motivated Deep Learning Algorithms and Architectures

  • Sergey Bartunov, Adam Santoro, Blake A. Richards, Luke Marris, Geoffrey E. Hinton, Timothy Lillicrap
  • As is known, many algorithms work well on MNIST, but fail on more complicated tasks, like CIFAR and ImageNet.
  • In their experiments, backprop still fares better than any of the biologically inspired / biologically plausible learning rules. This includes:
    • Feedback alignment {1432} {1423}
    • Vanilla target propagation
      • Problem: with convergent networks, layer inverses (top-down) will map all items of the same class to one target vector in each layer, which is very limiting.
      • Hence this algorithm was not directly investigated.
    • Difference target propagation (2015)
      • Uses the per-layer target $\hat{h}_l = g(\hat{h}_{l+1}; \lambda_{l+1}) + [h_l - g(h_{l+1}; \lambda_{l+1})]$
      • Or: $\hat{h}_l = h_l + g(\hat{h}_{l+1}; \lambda_{l+1}) - g(h_{l+1}; \lambda_{l+1})$, where $\lambda_l$ are the parameters for the inverse model; $g()$ is the sum and nonlinearity. (A small sketch of this target computation is given after this list.)
      • That is, the target is modified ala delta rule by the difference between inverse-propagated higher layer target and inverse-propagated higher level activity.
        • Why? $h_l$ should approach $\hat{h}_l$ as $h_{l+1}$ approaches $\hat{h}_{l+1}$.
        • Otherwise, the parameters in lower layers continue to be updated even when low loss is reached in the upper layers. (from original paper).
      • The last-to-penultimate layer weights are trained via backprop to prevent template impoverishment as noted above.
    • Simplified difference target propagation
      • They substitute a biologically plausible learning rule for the penultimate layer,
      • $\hat{h}_{L-1} = h_{L-1} + g(\hat{h}_L; \lambda_L) - g(h_L; \lambda_L)$, where there are $L$ layers.
      • It's the same rule as the other layers.
      • Hence subject to impoverishment problem with low-entropy labels.
    • Auxiliary output simplified difference target propagation
      • Add a vector $z$ to the last layer activation, which carries information about the input vector.
      • $z$ is just a set of random features from the activation $h_{L-1}$.
  • Used both fully connected and locally-connected (e.g. convolution without weight sharing) MLP.
  • It's not so great:
  • Target propagation seems like a weak learner, worse than feedback alignment; not only is the feedback limited, but it does not take advantage of the statistics of the input.
    • Hence, some of these schemes may work better when combined with unsupervised learning rules.
    • Still, in the original paper they use difference-target propagation with autoencoders, and get reasonable stroke features..
  • Their general result that networks and learning rules need to be tested on more difficult tasks rings true, and might well be the main point of this otherwise meh paper.
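A minimal sketch of the difference-target-propagation target computation quoted above (my code; g stands in for the learned inverse/feedback mapping, and all the layer bookkeeping is simplified away):

```python
import numpy as np

def dtp_target(h_l, h_lp1, h_hat_lp1, g):
    """Difference target propagation: h_hat_l = h_l + g(h_hat_{l+1}) - g(h_{l+1}).

    The correction term g(h_hat_{l+1}) - g(h_{l+1}) vanishes once the layer above
    reaches its target, so lower layers stop being pushed around.
    """
    return h_l + g(h_hat_lp1) - g(h_lp1)

# toy inverse mapping: a fixed linear map + tanh (stands in for the learned inverse)
rng = np.random.default_rng(0)
V = rng.normal(scale=0.1, size=(20, 10))
g = lambda h: np.tanh(V @ h)

h_l = rng.normal(size=20)
h_lp1 = rng.normal(size=10)
h_hat_lp1 = h_lp1 + 0.1 * rng.normal(size=10)    # target for the layer above
h_hat_l = dtp_target(h_l, h_lp1, h_hat_lp1, g)
```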

{1453}
ref: -2019 tags: lillicrap google brain backpropagation through time temporal credit assignment date: 03-14-2019 20:24 gmt revision:2 [1] [0] [head]

PMID-22325196 Backpropagation through time and the brain

  • Timothy Lillicrap and Adam Santoro
  • Backpropagation through time: the 'canonical' expansion of backprop to assign credit in recurrent neural networks used in machine learning.
    • E.g. variable roll-outs, where the error is propagated many times through the recurrent weight matrix, $W^T$.
    • This leads to the exploding or vanishing gradient problem.
  • TCA = temporal credit assignment. What led to this reward or error? How to affect memory to encourage or avoid this?
  • One approach is to simply truncate the error: truncated backpropagation through time (TBPTT). But this of course limits the horizon of learning.
  • The brain may do BPTT via replay in both the hippocampus and cortex Nat. Neuroscience 2007, thereby alleviating the need to retain long time histories of neuron activations (needed for derivative and credit assignment).
  • Less known method of TCA uses RTRL (real-time recurrent learning), i.e. forward-mode differentiation -- $\delta h_t / \delta \theta$ is computed and maintained online, often with synaptic weight updates being applied at each time step in which there is non-zero error. See "A learning algorithm for continually running fully recurrent neural networks". (A numpy sketch of RTRL's cost is given after this list.)
    • Big problem: a network with $N$ recurrent units requires $O(N^3)$ storage and $O(N^4)$ computation at each time-step.
    • Can be solved with Unbiased Online Recurrent Optimization (UORO), which stores approximate but unbiased gradient estimates to reduce computation / storage.
  • Attention seems like a much better way of approaching the TCA problem: past events are stored externally, and the network learns a differentiable attention-alignment module for selecting these events.
    • Memory can be finite size, extending, or self-compressing.
    • Highlight the utility/necessity of content-addressable memory.
    • Attentional gating can eliminate the exploding / vanishing / corrupting gradient problems -- the gradient paths are skip-connections.
  • Biologically plausible: partial reactivation of CA3 memories induces re-activation of neocortical neurons responsible for initial encoding PMID-15685217 The organization of recent and remote memories. 2005
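A numpy sketch of RTRL's bookkeeping for a vanilla RNN, mostly to make the $O(N^3)$ storage / $O(N^4)$ per-step compute concrete (toy code; sizes and initializations are arbitrary):

```python
import numpy as np

# RTRL sketch for a vanilla RNN: h_t = tanh(W h_{t-1} + U x_t)
N, D, T = 8, 3, 20
rng = np.random.default_rng(0)
W = rng.normal(scale=0.3, size=(N, N))
U = rng.normal(scale=0.3, size=(N, D))
h = np.zeros(N)
P = np.zeros((N, N, N))                 # sensitivity dh[k]/dW[i,j]: O(N^3) storage
for t in range(T):
    x = rng.normal(size=D)
    h_new = np.tanh(W @ h + U @ x)
    d = 1.0 - h_new ** 2                # tanh'
    # dh_new[k]/dW[i,j] = d[k] * (delta_{k,i} * h[j] + sum_m W[k,m] * P[m,i,j])
    P_new = np.einsum('k,km,mij->kij', d, W, P)   # O(N^4) work per time step
    for i in range(N):
        P_new[i, i, :] += d[i] * h      # direct dependence of unit i on its own input row
    P, h = P_new, h_new
# given an instantaneous error gradient dL/dh, the online weight gradient is:
dLdh = rng.normal(size=N)
dLdW = np.einsum('k,kij->ij', dLdh, P)
```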

  • I remain reserved about the utility of thinking in terms of gradients when describing how the brain learns. Correlations, yes; causation, absolutely; credit assignment, for sure. Yet propagating gradients as a means for changing network weights seems at best a part of the puzzle. So much of behavior and internal cognitive life involves explicit, conscious computation of cause and credit.
  • This leaves me much more sanguine about the use of external memory to guide behavior ... but differentiable attention? Hmm.

{1409}
ref: -0 tags: coevolution fitness prediction schmidt genetic algorithm date: 09-14-2018 01:34 gmt revision:8 [7] [6] [5] [4] [3] [2] [head]

Coevolution of Fitness Predictors

  • Michael D. Schmidt and Hod Lipson, Member, IEEE
  • Fitness prediction is a technique to replace fitness evaluation in evolutionary algorithms with a light-weight approximation that adapts with the solution population.
    • Cannot approximate the full landscape, but shift focus during evolution.
    • Aka local caching.
    • Or adversarial techniques.
  • Instead use coevolution, with three populations:
    • 1) solutions to the original problem, evaluated using only fitness predictors;
    • 2) fitness predictors of the problem; and
    • 3) fitness trainers, whose exact fitness is used to train predictors.
      • Trainers are selected as high-variance solutions across the predictors, and predictors are trained on this subset.
  • Lightweight fitness predictors evolve faster than the solution population, so they cap the computational effort on that at 5% overall effort.
    • These fitness predictors are basically an array of integers which index the full training set -- very simple and linear. Maybe boring, but the simplest solution that works ... (see the sketch after this list).
    • They only sample 8 training examples for even complex 30-node solution functions (!!).
    • I guess, because the information introduced into the solution set is relatively small per generation, it makes little sense to over-sample or over-specify this; all that matters is that, on average, it's directionally correct and unbiased.
  • Used deterministic crowding selection as the evolutionary algorithm.
    • Similar individuals have to compete in tournaments for space.
  • Showed that the coevolution algorithm is capable of inferring even highly complex many-term functions
    • And, it uses function evaluations more efficiently than the 'exact' (each solution evaluated exactly) algorithm.
  • Coevolution algorithm seems to induce less 'bloat' in the complexity of the solutions.
  • See also {842}
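A compact sketch of the three-population loop as described in these notes (a toy stand-in: polynomial coefficient vectors instead of evolved expression trees, crude mutation-only 'evolution'; only the predictor representation -- a handful of training-set indices -- and the trainer-selection idea follow the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.linspace(-3, 3, 200)
y_true = np.sin(X) + 0.5 * X ** 2                 # hidden target function

def solution_error(coeffs, idx):
    """Fitness of a 'solution' (polynomial coefficients) on a subset of the data."""
    return np.mean((np.polyval(coeffs, X[idx]) - y_true[idx]) ** 2)

solutions = [rng.normal(size=4) for _ in range(30)]
predictors = [rng.choice(len(X), size=8, replace=False) for _ in range(10)]  # index subsets
trainers = list(rng.choice(30, size=5, replace=False))

for gen in range(100):
    # predictor fitness: how well predicted fitness matches exact fitness on the trainers
    best_pred = min(predictors, key=lambda p: np.mean(
        [abs(solution_error(solutions[t], p) - solution_error(solutions[t], np.arange(len(X))))
         for t in trainers]))
    # evolve solutions using only the cheap predicted fitness
    ranked = sorted(solutions, key=lambda s: solution_error(s, best_pred))
    solutions = [s + 0.05 * rng.normal(size=4) for s in ranked[:15]] + ranked[:15]
    # new trainers: solutions whose predicted fitness varies most across predictors
    variance = [np.var([solution_error(s, p) for p in predictors]) for s in solutions]
    trainers = list(np.argsort(variance)[-5:])
    # evolve predictors by mutating their index sets
    predictors = [np.where(rng.random(8) < 0.2, rng.choice(len(X), size=8), p) for p in predictors]
```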

{1370}
ref: -0 tags: juxtacellular recording gold mushroom cultured hippocampal neurons Spira date: 02-01-2017 02:44 gmt revision:7 [6] [5] [4] [3] [2] [1] [head]

Large-Scale Juxtacellular Recordings from Cultured Hippocampal Neurons by an Array of Gold-Mushroom Shaped Microelectrodes

  • Micrometer sized Au mushroom MEA electrodes.
  • Functionalized by poly-ethylene-imine (PEI, positively charged) / laminin (an extracellular matrix protein); these undergo a process to form juxtacellular junctions between the neurons and the gMµEs.
  • No figures, but:
    • Whereas substrate-integrated planar MEAs record FPs dominated by negative-peak or biphasic signals with amplitudes typically ranging between 40-100 µV and a signal-to-noise ratio of ≤ 5,
    • the gMµE-MEA recordings were dominated by positive monophasic action potentials.
    • It is important to note that monophasic high peak amplitudes ≥ 100 µV are rarely obtained using planar electrode arrays, whereas when using the gMµE-MEA, 34.48% of the gMµEs recorded potentials ≥ 200 µV and 10.64% recorded potentials in the range of 300-5,085 µV.
  • So, there is a distribution of coupling, approximately 10% "good".

PMID-27256971 Multisite electrophysiological recordings by self-assembled loose-patch-like junctions between cultured hippocampal neurons and mushroom-shaped microelectrodes.

  • Note 300 µV - 1 mV extracellular 'juxtacellular' action potentials from these mushroom recordings. This is 2-5x better than microwire extracellular in-vivo ephys; coupling is imperfect.
    • Sharp glass-insulated W electrodes, ~ 10Mohm, might achieve better SNR if driven carefully.
  • 2um mushroom cap Au electrodes, 1um diameter 1um long shaft
    • No coating, other than the rough one left by electroplating process.
    • Impedance 10 - 25 Mohm.
  • APs decline by up to 35% within a burst -- electrostatic reasons?
  • Most electrodes record more than one neuron, similar to in-vivo ephys, with less LFP coupling.

PMID-23380931 Multi-electrode array technologies for neuroscience and cardiology

  • The key to the multi-electrode-array 'in-cell recording' approach developed by us is the outcome of three converging cell-biological principles:
    • (a) the activation of endocytotic-like mechanisms in which cultured Aplysia neurons are induced to actively engulf gold mushroom-shaped microelectrodes (gMμE) that protrude from a flat substrate,
    • (b) the generation of high Rseal between the cell’s membrane and the engulfed gMμE, and
    • (c) the increased junctional membrane conductance.
  • Functionalized the Au mushrooms with an RGD-based peptide
    • RGD is an extracellular matrix binding site on fibronectin, which mediates its interaction with integrin, a cell surface receptor; it is thought that other elements of fibronectin regulate specificity with its receptor. PMID-2418980

{1343}
ref: -0 tags: planned economy red plenty date: 08-08-2016 05:54 gmt revision:0 [head]

http://crookedtimber.org/2012/05/30/in-soviet-union-optimization-problem-solves-you/#demographic-back

  • Quote: "That planning is not a viable alternative to capitalism (as opposed to a tool within it) should disturb even capitalism’s most ardent partisans. It means that their system faces no competition, nor even any plausible threat of competition."
    • And therefore not only cannot be improved, but must degrade with time. But see below.
  • Quote: What we can do is try to find the specific ways in which these powers we have conjured up are hurting us, and use them to check each other, or deflect them into better paths. Sometimes this will mean more use of market mechanisms, sometimes it will mean removing some goods and services from market allocation, either through public provision or through other institutional arrangements. Sometimes it will mean expanding the scope of democratic decision-making (for instance, into the insides of firms), and sometimes it will mean narrowing its scope (for instance, not allowing the demos to censor speech it finds objectionable). Sometimes it will mean leaving some tasks to experts, deferring to the internal norms of their professions, and sometimes it will mean recognizing claims of expertise to be mere assertions of authority, to be resisted or countered.
    • I like to think of this as a very unstable equilibrium: the only way to maintain function is to continuously expend energy to shore up and change the market, politics, and society in general; the specific regulatory solution has complexity commensurate with the complexity of the economy regulated, and it must adapt on the same scales that the market economy changes.
    • Perhaps to do this, it needs a self-reflective faculty, to know which parts of itself need changing; otherwise, you'd need to have a regulator regulating the regulator, and who is to prevent that from agglomerating power. Yet this too is an unstable equilibrium.

{1308}
ref: -0 tags: polyimide polyamide basic reduction salt surface modification date: 02-27-2015 19:45 gmt revision:0 [head]

Kinetics of Alkaline Hydrolysis of a Polyimide Surface

  • The alkaline hydrolysis of a polyimide (PMDA-ODA) surface was studied as a function of time, temperature and hydroxide ion concentration.
  • Quantification of the number of carboxylic acid groups formed on the modified polyimide surface was accomplished by analysis of data from contact angle titration experiments.
  • Using a large excess of base, pseudo-first-order kinetics were found, yielding k_obs ≈ 0.1-0.9 min^-1 for conversion of polyimide to poly(amic acid), depending on [OH-] (a small numeric illustration follows this list).
  • From the dependence of kobs on [OH-], a rate equation is proposed.
  • Conversion of the polyimide surface to one of poly(amic acid) was found to reach a limiting value with a formation constant, K, in the range 2−10 L·mol-1.
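A small numeric illustration of what pseudo-first-order surface conversion at these rate constants implies (my illustration; the exponential form is just the generic first-order law, not a fit from the paper):

```python
import numpy as np

# Pseudo-first-order decay of surface imide groups: fraction remaining = exp(-k_obs * t).
k_obs = np.array([0.1, 0.5, 0.9])          # min^-1, span of the reported values
t = np.linspace(0, 30, 7)                  # minutes
for k in k_obs:
    converted = 1.0 - np.exp(-k * t)       # fraction of surface converted to poly(amic acid)
    print(f"k_obs={k:.1f}/min:", np.round(converted, 2))
```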

{1182}
ref: -0 tags: optical recording voltage sensitive dyes redshirt date: 01-02-2013 03:17 gmt revision:3 [2] [1] [0] [head]

PMID-16050036 Imaging brain activity with voltage- and calcium-sensitive dyes.

  • Voltage-sensitive dyes are well suited for measuring synaptic integration, as:
    • Electrodes are too blunt to effectively record these fine, < 1um diameter structures.
    • The surface area to volume ratio is highest in the dendrites
    • Voltage-sensitive dyes also permeate internal membranes not subject to voltage gradients; hence these membranes do not contribute to the signal, leading to a decreased $\Delta F / F$.
  • Dominant experimental noise is shot noise, statistical -- see {1181}.
  • modern dyes and imagers can reliably record single action potentials; spatial averaging yields similar resolution as electrical recording.
  • They performed optical recording of Aplysia sensory ganglia, and discovered following light tail touch: "It is almost as if the Aplysia nervous system is designed such that every cell in the abdominal ganglion cares about this (and perhaps every) sensory stimulus. In addition, more than 1000 neurons in other ganglia are activated by this touch..."
      • These results force a more pessimistic view of the present understanding of the neuronal basis of apparently simple behaviors in relatively simple nervous systems.
  • Used calcium imaging on olfactory glomeruli of mice and turtles; measurements were limited by either shot-noise or heart/breathing artifacts.
  • Confocal and two-photon microscopes, due to their exchange of spatial resolution for sensitivity, are not useful with voltage-sensitive dyes.
    • The generation of fluorescent photons in the 2-photon confocal microscope is not efficient. We compared the signals from Calcium Green-1 in the mouse olfactory bulb using 2-photon and ordinary microscopy. In this comparison the number of photons contributing to the intensity measurement in the 2-photon confocal microscope was about 1000 times smaller than the number measured with the conventional microscope and a CCD camera.
  • By the numbers, quote: Because only a small fraction of the 10^16 photons/ms emitted by a tungsten filament source will be measured, a signal-to-noise ratio of 10^8 (see above) cannot be achieved. A partial listing of the light losses follows. A 0.9-NA lamp collector lens would collect 0.1 of the light emitted by the source. Only 0.2 of that light is in the visible wavelength range; the remainder is infrared (heat). Limiting the incident wavelengths to those which have the signal means that only 0.1 of the visible light is used. Thus, the light reaching the preparation might typically be reduced to 10^13 photons/ms. If the light-collecting system that forms the image has high efficiency, e.g. in an absorption measurement, about 10^13 photons/ms will reach the image plane. (In a fluorescence measurement there will be much less light measured because 1. only a fraction of the incident photons are absorbed by the fluorophores, 2. only a fraction of the absorbed photons appear as emitted photons, and 3. only a fraction of the emitted photons are collected by the objective.) If the camera has a quantum efficiency of 1.0, then, in absorption, a total of 10^13 photoelectrons/ms will be measured. With a camera of 1000 pixels, there will be 10^10 photoelectrons/ms/pixel. The shot noise will be 10^5 photoelectrons/ms/pixel; thus the very best that can be expected is a noise that is 10^-5 of the resting light (a signal-to-noise ratio of 100 dB). The extra light losses in a fluorescence measurement will further reduce the maximum obtainable signal-to-noise ratio.

{1181}
ref: -0 tags: neural imaging recording shot noise redshirt date: 01-02-2013 02:20 gmt revision:0 [head]

http://www.redshirtimaging.com/redshirt_neuro/neuro_lib_2.htm

  • Shot Noise: The limit of accuracy with which light can be measured is set by the shot noise arising from the statistical nature of photon emission and detection.
    • If an ideal light source emits an average of N photons/ms, the RMS deviation in the number emitted is $\sqrt{N}$.
    • At high intensities this ratio $N / \sqrt{N}$ is large, and thus small changes in intensity can be detected. For example, at 10^10 photons/ms a fractional intensity change of 0.1% can be measured with a signal-to-noise ratio of 100.
    • On the other hand, at low intensities this ratio of intensity divided by noise is small and only large signals can be detected. For example, at 10^4 photons/ms the same fractional change of 0.1% can be measured with a signal-to-noise ratio of 1 only after averaging 100 trials.
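A quick check of the two worked numbers above (plain shot-noise arithmetic, nothing beyond what the text states):

```python
import numpy as np

def snr_for_fractional_change(photons_per_ms, frac_change, n_trials=1):
    """Shot-noise-limited SNR: signal = f * N * trials, noise = sqrt(N * trials)."""
    total = photons_per_ms * n_trials
    return frac_change * total / np.sqrt(total)

print(snr_for_fractional_change(1e10, 1e-3))               # ~100, as quoted
print(snr_for_fractional_change(1e4, 1e-3))                # ~0.1
print(snr_for_fractional_change(1e4, 1e-3, n_trials=100))  # ~1 after averaging 100 trials
```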

{1144}
ref: -0 tags: dopamine reinforcement learning funneling reduction basal ganglia striatum DBS date: 02-28-2012 01:29 gmt revision:2 [1] [0] [head]

PMID-15242667 Anatomical funneling, sparse connectivity and redundancy reduction in the neural networks of the basal ganglia

  • Major attributes of the BG:
    • Numerical reduction in the number of neurons across layers of the 'feed forward' (wrong!) network,
    • lateral inhibitory connections within the layers
    • modulatory effects of dopamine and acetylcholine.
  • Stochastic decision making task in monkeys.
  • Dopamine and ACh deliver different messages. DA much more specific.
  • Output nuclei of BG show uncorrelated activity.
    • They see this as a means of compression -- more likely it is a training signal.
  • Striatum:
    • each striatal projection neuron receives 5300 cortico-striatal synapses; the dendritic fields of same contains 4e5 axons.
    • Say that a typical striatal neuron is spherical (?).
    • Striatal dendritic tree is very dense, whereas pallidal dendritic tree is sparse, with 4 main and 13 tips.
    • A striatal axon provides 240 synapses in the pallidum and makes 10 contacts with one pallidal neuron on average.
  • I don't necessarily disagree with the information-compression hypothesis, but I don't agree with it either.
    • Learning seems a more likely hypothesis; could be that we fail to see many effects due to the transient nature of the signals, but I cannot do a thorough literature search on this.

PMID-15233923 Coincident but distinct messages of midbrain dopamine and striatal tonically active neurons.

  • Same task as above.
  • both ACh (putatively, TANs in this study) and DA neurons respond to reward related events.
  • dopamine neurons' response reflects mismatch between expectation and outcome in the positive domain
  • TANs are invariant to reward predictability.
  • TANs are synchronized; most DA neurons are not.
  • Striatum displays the densest staining in the CNS for dopamine (Lavoie et al 1989) and ACh (Holt et al 1997)
    • Depression of striatal acetylcholine can be used to treat PD (Pisani et al 2003).
    • Might be a DA/ ACh balance problem (Barbeau 1962).
  • Deficit of either DA or ACh has been shown to disrupt reward-related learning processes. (Kitabatake et al 2003, Matsumoto 1999, Knowlton et al 1996).
  • Upon reward, dopaminergic neurons increase firing rate, whereas ACh neurons pause.
  • Primates show overshoot -- for a probabilistic relative reward, they saturate anything above 0.8 probability to 1. Rats and pigeons do not show this effect (figure 2F).

{1140}
ref: -0 tags: dopamine reward prediction striatum error striatum orbitofrontal reward date: 02-24-2012 21:26 gmt revision:1 [0] [head]

PMID-11105648 Involvement of basal ganglia and orbitofrontal cortex in goal-directed behavior.

  • Many regions have a complex set of activations, but dopamine neurons appear more homogeneous: they report the error in reward prediction (the standard TD formalization of this is sketched after this list).
    • "The homogeneity of responsiveness across the population of dopamine neurons indicates that this error signal is widely broadcast to dopamine terminal regions where it could provide a teaching signal for synaptic modifications underlying the learning of goal-directed appetitive behaviors."
    • Signals are not contingent on the type of behavior needed to obtain the reward, and hence represent a relatively 'pure' reward prediction error.
  • Unlike dopamine neurons, many striatal neurons respond to predicted rewards, although at least some may reflect the relative degree of predictability in the magnitude of the responses to reward.
  • Neuronal activations in the orbitofrontal cortex appear to involve less integration of behavioral and reward-related information, but rather incorporate another aspect of reward, the relative motivational significance of different rewards.
  • Processing is hierarchical (or supposed to be so):
    • Dopamine neurons provide a relatively pure signal of an error in reward prediction,
    • Striatal neurons signal not only reward, but also behavioral contingencies,
    • Orbitofrontal neurons signal reward and incorporate relative reward preference.
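For reference, the usual formalization of this reward-prediction-error idea is the temporal-difference error; the TD framing is standard background rather than something this particular paper derives:

```python
# Temporal-difference error: the putative dopaminergic teaching signal.
# delta > 0 when the outcome is better than predicted, delta < 0 when worse.
def td_error(reward, value_next, value_now, gamma=1.0):
    return reward + gamma * value_next - value_now

print(td_error(reward=1.0, value_next=0.0, value_now=1.0))  # 0.0: fully predicted reward
print(td_error(reward=1.0, value_next=0.0, value_now=0.0))  # 1.0: unpredicted reward
print(td_error(reward=0.0, value_next=0.0, value_now=1.0))  # -1.0: predicted reward omitted
```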

{255}
ref: BarGad-2003.12 tags: information dimensionality reduction reinforcement learning basal_ganglia RDDR SNR globus pallidus date: 01-16-2012 19:18 gmt revision:3 [2] [1] [0] [head]

PMID-15013228[] Information processing, dimensionality reduction, and reinforcement learning in the basal ganglia (2003)

  • long paper! looks like they used latex.
  • they focus on a 'new model' for the basal ganglia: reinforcement driven dimensionality reduction (RDDR)
  • in order to make sense of the system -- according to them -- any model must ignore huge amounts of information about the studied areas.
  • ventral striatum = nucleus accumbens!
  • striatum is broken into two rough parts: ventral and dorsal
    • dorsal striatum: the caudate and putamen.
    • ventral striatum: the nucleus accumbens, medial and ventral portions of the caudate and putamen, and striatal cells of the olfactory tubercle (!) and anterior perforated substance.
  • ~90% of neurons in the striatum are medium spiny neurons
    • dendrites fill 0.5mm^3
    • cells have up and down states.
      • the states are controlled by intrinsic connections
      • project to GPe GPi & SNr (primarily), using GABA.
  • 1-2% of neurons in the striatum are tonically active neurons (TANs)
    • use acetylcholine (among others)
    • fewer spines
    • more sensitive to input
    • TANs encode information relevant to reinforcement or incentive behavior

____References____

{222}
ref: neuro notes-0 tags: clementine thesis electrophysiology fit predictions tlh24 date: 01-06-2012 03:07 gmt revision:4 [3] [2] [1] [0] [head]

ok, so i fit all timestamps from clem022007001 & timarm_log_070220_173947_k.mat to clementine's behavior, and got relatively low SNR for almost everything - despite the fact that I am most likely overfitting. (bin size = 7802 x 1491) the offset is calibrated @ 2587 ms + 50 to center the juice artifact in the first bin. There are 10 lags. There are 21 sorted units.

same thing, but with only the sorted units. juice prediction is, of course, worse.

now, for file clem022007002 & timarm_log_070220_175636_k.mat. first the unsorted:

and the sorted:

{992}
ref: Kim-2006.06 tags: Hyun Kim Carmena Nicolelis continuous shared control gripper BMI date: 01-06-2012 00:20 gmt revision:2 [1] [0] [head]

IEEE-1634510 (pdf) Continuous shared control for stabilizing reaching and grasping with brain-machine interfaces.

  • Used a pneumatic gripper for picking up objects.
  • 70% brain control, 30% sensor control was optimal (a toy version of this blend is sketched after this list).
  • Talk about a 20 Hz Nyquist frequency for fast human motor movements, versus the need to smooth and remove noise.
  • Method: proximity sensors
    • collision avoidance 'pain withdrawal'
    • 'infant palmar grasp reflex'
    • Potential field associated with these sensors to implement continuous shared control.
  • Not! online -- used Aurora's data.
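A toy version of the continuous shared-control blend described above (my sketch; the potential field around the proximity sensors is reduced to a single repulsive term, and the 70/30 weighting follows the notes):

```python
import numpy as np

def shared_control(u_brain, gripper_pos, obstacle_pos, alpha=0.7):
    """Blend brain-decoded velocity with a sensor-derived potential-field velocity."""
    away = gripper_pos - obstacle_pos
    dist = np.linalg.norm(away) + 1e-9
    u_sensor = away / dist * max(0.0, 1.0 - dist)   # repulsive 'pain withdrawal' inside 1 unit
    return alpha * u_brain + (1.0 - alpha) * u_sensor

u = shared_control(u_brain=np.array([0.5, 0.0]),
                   gripper_pos=np.array([0.2, 0.0]),
                   obstacle_pos=np.array([0.0, 0.0]))
```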

____References____

Kim, H.K. and Biggs, J. and Schloerb, W. and Carmena, M. and Lebedev, M.A. and Nicolelis, M.A.L. and Srinivasan, M.A. Continuous shared control for stabilizing reaching and grasping with brain-machine interfaces Biomedical Engineering, IEEE Transactions on 53 6 1164 -1173 (2006)

{929}
ref: Kim-2007.08 tags: Hyun Kim muscle activation method BMI model prediction kinarm impedance control date: 01-06-2012 00:19 gmt revision:1 [0] [head]

PMID-17694874[0] The muscle activation method: an approach to impedance control of brain-machine interfaces through a musculoskeletal model of the arm.

  • First BMI that successfully predicted interactions between the arm and a force field.
  • Previous BMIs are used to decode position, velocity, and acceleration, as each of these has been shown to be encoded in the motor cortex
  • Hyun talks about stiff tasks, like writing on paper, vs. pliant tasks, like handling an egg; both require a mixture of force and position control.
  • Georgopoulos = velocity; Evarts = force; Kalaska = movement and force in an isometric task; [17-19] = joint dependence;
  • Todorov "On the role of primary motor cortex in arm movement control" [20] = muscle activation, which reproduces Georgopoulos and Schwartz ("Direct cortical representation of drawing").
  • Kakei [19] "Muscle movement representations in the primary motor cortex" and Li [23] [1] show neurons correlate with both muscle activations and direction.
  • Argues that MAM is the best way to extract impedance information -- direct readout of impedance requires a supervised BMI to be trained on data where impedance is explicitly measured.
  • linear filter does not generalize to different force fields.
  • algorithm activity highly correlated with recorded EMG.
  • another interesting ref: [26] "Are complex control signals required for human arm movements?"

____References____

[0] Kim HK, Carmena JM, Biggs SJ, Hanson TL, Nicolelis MA, Srinivasan MA, The muscle activation method: an approach to impedance control of brain-machine interfaces through a musculoskeletal model of the arm. IEEE Trans Biomed Eng 54:8, 1520-9 (2007 Aug)
[1] Li CS, Padoa-Schioppa C, Bizzi E, Neuronal correlates of motor performance and motor learning in the primary motor cortex of monkeys adapting to an external force field. Neuron 30:2, 593-607 (2001 May)

{995}
ref: QingBai and Wise-2001.08 tags: Bai Wise buffered MEA recording electrodes Michigan date: 01-05-2012 04:53 gmt revision:5 [4] [3] [2] [1] [0] [head]

IEEE-936367 (pdf) Single-unit neural recording with active microelectrode arrays

  • Design neural probes with on-chip unity-gain amplifiers. Proven to not degrade recordings (indeed, it should help!)
  • 200ohm output impedance
  • 11uV RMS noise, 100Hz-10kHz.
  • Multiplexer adds 8uV rms noise. noise from clock transitions 2ppm.
  • Also built amplifiers with 40db voltage gain (100x).

____References____

Qing Bai and Wise, K.D. Single-unit neural recording with active microelectrode arrays Biomedical Engineering, IEEE Transactions on 48 8 911 -920 (2001)

{996}
ref: Najafi-1986.12 tags: Najafi implantable wired recording Michigan array multiplexing silicon boron MEA date: 01-05-2012 03:07 gmt revision:8 [7] [6] [5] [4] [3] [2] [head]

IEEE-1052646 (pdf) An implantable multielectrode array with on-chip signal processing

  • "The major reason for the slow progress in the understanding of neural circuits has been the lack of adequate instrumentation."
  • previous photolithographic: [4],[5]. Their first publication: [7].
  • Kensall Wise, not Stephen.
  • Single shank
  • 10 recording sites spaced at 100um
  • Amplification of 100x, bandwidth 15 kHz, multiplexing.
  • width: 15um near tip, 160um at base.
  • 3 leads (!) power, ground, data.
  • 6um LOCOS enhancement and depletion NMOS technology -- not CMOS. (latter is prone to latch-up)
  • 5mW power.
  • Boron-dope the silicon, then etch back the non-doped portion with an ethylenediamine-pyrocatechol (EDP) / water solution.
  • must not have any substrate bias!

____References____

Najafi, K. and Wise, K.D. An implantable multielectrode array with on-chip signal processing Solid-State Circuits, IEEE Journal of 21 6 1035 - 1044 (1986)

{864}
ref: -0 tags: the edge ideas future prediction date: 01-03-2011 19:26 gmt revision:2 [1] [0] [head]

Interesting ideas from __This Will Change Everything__

  • Daniel Dennett suggests that what is changing everything is the act of looking at what is changing everything: "When we look closely at looking closely, when we increase our investment in techniques for investing in techniques, this is what will amplify uncertainties, what will change everything. We figure out how to game the system, and this initiates an arms race to control or prevent gaming of the system, which leads to new levels of gamesmanship, and so on."
    • Well said. I think this is an essential part of any creative economy.
  • The internet is humanity's growing global hindbrain: it attends to rote memory, managing commerce and markets, and doling out attention. This implies that eventually it will be a global forebrain.
    • W. Daniel Hillis argues that it will do this through recursive hierarchical organization. But, that said, there is still no good way for making decisions with higher intelligence than each of the actors/voters. (really? are you sure this is not just an artifact of perception?)
  • Paul Saffo: "But there is one development that would fundamentally change everything: the discovery of nonhuman intelligences equal or superior to our own species. It would change everything because our crowded, quarreling species is lonely. Vastly, achingly, existentially lonely"
    • [If we do find someone/thing else:] "And despite the distance, of course we will try to talk to them. A third of us will try to conquer them, a third of us will seek to convert them, and the rest of us will try to sell them something". hah!
  • Mentioned: focus fusion technology and http://focusfusion.org/ -- looks excellent, the argument seems convincing. Why doesn't somebody throw some money at them, get it done and tested?
  • John Gottman paraphrases Peggy Sanday: "Military - or any hierarchical - social structure cannot last without external threat" Unfortunately, hierarchical structures (human and otherwise) also seem to be the best way for getting things done.

{754}
ref: Gilbert-2009.03 tags: human prediction estimation social situation neighbor advice affective forecasting date: 06-10-2009 15:13 gmt revision:2 [1] [0] [head]

PMID-19299622[0] The Surprising Power of Neighborly Advice.

  • quote (I cannot say this any better!): "People make systematic errors when attempting to predict their affective reactions to future events, and these errors have social (1–3), economic (4–8), legal (9, 10), and medical (11–22) consequences. For example, people have been shown to overestimate how unhappy they will be after receiving bad test results (23), becoming disabled (14, 19–21), or being denied a promotion (24), and to overestimate how happy they will be after winning a prize (6), initiating a romantic relationship (24), or taking revenge against those who have harmed them (3). Research suggests that the main reason people mispredict their affective reactions to future events is that they imagine those events inaccurately (25). For example, people tend to imagine the essential features of future events but not the incidental features (26–28), the early moments of future events but not the later moments (17, 24), and so on. When mental simulations of events are inaccurate, the affective forecasts that are based on them tend to be inaccurate as well."
  • solution, ala François de La Rochefoucauld: "Before we set our hearts too much upon anything," he wrote, "let us first examine how happy those are who already possess it"
    • this is surrogation; it relies not on mental simulation, hence is immune to the associated systematic errors.
    • problem is that people differ. The paper argues that, in fact, they don't all that much - the valuations & affective reactions are produced by evolutionarily ancient physiological mechanisms. Furthermore, people's neighbors, friends, and peers are likely to all be similar in personality and preference via self-selection and social reinforcement - hence their reactions to a situation will be similar.
  • They used a speed-dating scenario in their experiments, from which they observe: "Women made more accurate predictions about how much they would enjoy a date with a man when they knew how much another woman in their social network enjoyed dating the man than when they read the man's personal profile and saw his photograph."
  • Next, they employ personality-evaluation "Men and women made more accurate predictions about how they would feel after being evaluated by a peer when they knew how another person in their social network had felt after being evaluated than when they previewed the evaluation itself."
  • Conclusion: "But given people's mistaken beliefs about the relative ineffectiveness of surrogation and their misplaced confidence in the accuracy of their own mental simulations (39), it seems likely that in everyday life, La Rochefoucauld's advice—like the advice of good neighbors—is more often than not ignored.
  • Editorializing: I'm not quite convinced that 'neighborly advice' is an accurate predictor of our absolute reaction to a situation as much as it socially informs us of reaction we are *supposed* to have. Society by consensus - that's what some of my European friends dislike about (some parts of) American culture. They need to run some controls in other cultures (?)

____References____

[0] Gilbert DT, Killingsworth MA, Eyre RN, Wilson TD, The surprising power of neighborly advice. Science 323:5921, 1617-9 (2009 Mar 20)

{283}
ref: Narayanan-2005.04 tags: Laubach M1 motor rats statistics BMI prediction methods date: 09-07-2008 19:51 gmt revision:4 [3] [2] [1] [0] [head]

PMID-15858046[] Redundancy and Synergy of Neuronal Ensembles in Motor Cortex

  • timing task.
  • rats.
  • 50um teflon microwires in motor cortex
  • ohno : neurons that were the best predictors of task performance were not necessarily the neurons that contributed the most predictive information to an ensemble of neurons.
  • most all contribute redundant predictive information to the ensemble.
    • this redundancy kept the predictions high, even if neurons were dropped.
  • small groups of neurons were more synergistic
  • large groups were more redundant.
  • used wavelet based discriminant pursuit.
    • validated with draws from a random data set.
  • used R and Weka
  • data looks hella noisy ?

____References____

{540}
ref: Erickson-2003.07 tags: GFP FRET math CFP DsRed math date: 02-08-2008 16:04 gmt revision:1 [0] [head]

PMID-12829514 DsRed as a Potential FRET Partner with CFP and GFP

{450}
ref: notes-0 tags: leadership dilbert redirection politics trolls date: 09-28-2007 18:11 gmt revision:0 [head]

a very nice synopsis of how leadership works:

{258}
ref: Wood-2004.01 tags: spikes sorting BMI Black Donoghue prediction kalman date: 04-06-2007 21:57 gmt revision:2 [1] [0] [head]

PMID-17271178[0] automatic spike sorting for neural decoding

  • idea: select the number of units (and, indeed, the clustering) based on the ability to predict a given variable. makes sense! (A sketch of this selection loop is below.)
  • results:
    • human sorting: 13.5 cm^2 MSE
    • automatic spike sorting: 11.4 cm^2 MSE
      • yes, I know, the improvement is totally dramatic.
  • they do not say if this could be implemented in realtime or not. hence, probably not.
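A sketch of the decoding-driven model-selection idea as I read it (toy code: KMeans and ridge regression stand in for whatever the authors actually used; only the select-the-sorting-that-decodes-best loop reflects the paper):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
waveforms = rng.normal(size=(2000, 32))        # spike snippets from one channel (toy data)
kinematics = rng.normal(size=(100, 2))         # decoded variable, e.g. hand position

def binned_rates(labels, n_units, n_bins=100):
    """Bin spike counts per putative unit (toy: just segment the label stream)."""
    counts = np.zeros((n_bins, n_units))
    per_bin = len(labels) // n_bins
    for b in range(n_bins):
        seg = labels[b * per_bin:(b + 1) * per_bin]
        for u in range(n_units):
            counts[b, u] = np.sum(seg == u)
    return counts

scores = {}
for k in range(1, 8):                          # candidate numbers of units
    labels = KMeans(n_clusters=k, n_init=5, random_state=0).fit_predict(waveforms)
    X = binned_rates(labels, k)
    scores[k] = cross_val_score(Ridge(), X, kinematics, cv=5,
                                scoring='neg_mean_squared_error').mean()
best_k = max(scores, key=scores.get)           # keep the sorting that best predicts behavior
```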

____References____

{127}
ref: bookmark-0 tags: thalamus basal ganglia neuroanatomy centromedian red nucleus images date: 0-0-2007 0:0 revision:0 [head]

http://www.neuroanatomy.wisc.edu/coro97/contents.htm --coronal sections through the thalamus, very nice!

{4}
ref: bookmark-0 tags: google parallel_computing GFS algorithm mapping reducing date: 0-0-2006 0:0 revision:0 [head]

http://labs.google.com/papers/mapreduce.html

{81}
ref: Stapleton-2006.04 tags: Stapleton Lavine poisson prediction gustatory discrimination statistical_model rats bayes BUGS date: 0-0-2006 0:0 revision:0 [head]

PMID-16611830

http://www.jneurosci.org/cgi/content/full/26/15/4126

{61}
ref: bookmark-0 tags: smith predictor motor control wolpert cerebellum machine_learning prediction date: 0-0-2006 0:0 revision:0 [head]

http://prism.bham.ac.uk/pdf_files/SmithPred_93.PDF

  • quote, in reference to models in which the cerebellum works as a Smith predictor, e.g. feedforward prediction of the behavior of the limbs, eyes, trunk: Motor performance based on the use of such internal models would be degraded if the model was unavailable or inaccurate. These theories could therefore account for dysmetria, tremor, and dyssynergia, and perhaps also for increased reaction times.
  • note the difference between an inverse model (transforms an end target into a motor plan) and inverse models (used on-line in a tight feedback loop).
  • The difficulty becomes one of detecting mismatches between a rapid prediction of the outcome of a movement and the real feedback that arrives later in time (duh! :)
  • good set of notes on simple simulated Smith predictor performance (a discrete-time toy Smith predictor is sketched below).
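A discrete-time toy Smith predictor, to make the idea concrete (all dynamics, gains, and delays here are made up; the internal model is assumed perfect):

```python
# A P controller drives a first-order plant with a pure transport delay; an internal
# model of plant + delay lets the controller act on an undelayed prediction.
T, delay, a, b, kp = 300, 20, 0.95, 0.05, 20.0
y = ym = 0.0                                   # real plant state, internal model state
buf_y = [0.0] * delay                          # delayed measurement of the real plant
buf_ym = [0.0] * delay                         # delayed output of the internal model
target = 1.0
for t in range(T):
    y_meas = buf_y[0]                          # what the controller actually receives (delayed)
    ym_delayed = buf_ym[0]
    # Smith predictor: correct the undelayed model prediction with the delayed mismatch
    y_pred = ym + (y_meas - ym_delayed)
    u = kp * (target - y_pred)
    y = a * y + b * u                          # real plant
    ym = a * ym + b * u                        # internal forward model (same dynamics)
    buf_y = buf_y[1:] + [y]
    buf_ym = buf_ym[1:] + [ym]
print(round(y, 3))                             # settles just below the target despite the delay
```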