m8ta

{1549}  
Put this in ~/.config/gtk-3.0/gtk.css to make scrollbars larger on high-DPI screens. ref

    .scrollbar {
      -GtkScrollbar-has-backward-stepper: 1;
      -GtkScrollbar-has-forward-stepper: 1;
      -GtkRange-slider-width: 16;
      -GtkRange-stepper-size: 16;
    }

    scrollbar slider {
      /* Size of the slider */
      min-width: 16px;
      min-height: 16px;
      border-radius: 16px;
      /* Padding around the slider */
      border: 2px solid transparent;
    }

    .scrollbar.vertical slider, scrollbar.vertical slider {
      min-height: 16px;
      min-width: 16px;
    }

    .scrollbar.horizontal.slider, scrollbar.horizontal slider {
      min-width: 16px;
      min-height: 16px;
    }

    /* Scrollbar trough squeezes when cursor hovers over it. Disabling that */
    .scrollbar.vertical:hover:dir(ltr), .scrollbar.vertical.dragging:dir(ltr) {
      margin-left: 0px;
    }
    .scrollbar.vertical:hover:dir(rtl), .scrollbar.vertical.dragging:dir(rtl) {
      margin-right: 0px;
    }
    .scrollbar.horizontal:hover, .scrollbar.horizontal.dragging,
    .scrollbar.horizontal.slider:hover, .scrollbar.horizontal.slider.dragging {
      margin-top: 0px;
    }

    undershoot.top, undershoot.right, undershoot.bottom, undershoot.left {
      background-image: none;
    }

To make the scrollbars a bit easier to see in Qt5 applications, run qt5ct (after apt-getting it), and add a new style sheet, /usr/share/qt5ct/qss/scrollbar-simple-backup.qss:

    /* SCROLLBARS (NOTE: changing 1 sub-control means you have to change all of them) */
    QScrollBar {
      background: palette(alternate-base);
    }
    QScrollBar:horizontal {
      margin: 0px 0px 0px 0px;
    }
    QScrollBar:vertical {
      margin: 0px 0px 0px 0px;
    }
    QScrollBar::handle {
      background: #816891;
      border: 1px solid transparent;
      border-radius: 1px;
    }
    QScrollBar::handle:hover, QScrollBar::add-line:hover, QScrollBar::sub-line:hover {
      background: palette(highlight);
    }
    QScrollBar::add-line {
      subcontrol-origin: none;
    }
    QScrollBar::sub-line {
      subcontrol-origin: none;
    }
    QScrollBar::add-line:vertical, QScrollBar::sub-line:vertical {
      height: 0px;
    }
    QScrollBar::add-line:horizontal, QScrollBar::sub-line:horizontal {
      width: 0px;
    }
{1548} 
ref: 2021
tags: gated multi layer perceptrons transformers ML Quoc_Le Google_Brain
date: 08-05-2021 06:00 gmt
revision:4


Pretty remarkable that an industrial lab freely publishes results like this. I guess the ROI is that they get the resultant improved ideas? Or, perhaps, Google is in such a dominant position in terms of data and compute that even if they give away ideas and code, provided some of the resultant innovation returns to them, they win. The return includes trained people as well as ideas. Good for us, I guess!  
{1547}  
MetaLearning Update Rules for Unsupervised Representation Learning
This is a clearly-written, easy-to-understand paper. The results are not highly compelling, but as a first set of experiments, it's successful enough. I wonder what more constraints (fewer parameters, per the genome), more options for architecture modifications (e.g. different feedback schemes, per neurobiology), and a black-box optimization algorithm (evolution) would do?  
{1449}  
This was compiled from searching papers which referenced Olshausen and Field 1996, PMID8637596 Emergence of simple-cell receptive field properties by learning a sparse code for natural images.
 
{1546}  
Local synaptic learning rules suffice to maximize mutual information in a linear network
    x = randn(1000, 10);
    Q = x' * x;               % (unnormalized) input covariance
    a = 0.001;
    Y = randn(10, 1);
    y = zeros(10, 1);
    for i = 1:1000
        y = Y + (eye(10) - a*Q)*y;   % local fixed-point iteration
    end
    y - pinv(Q)*Y / a         % should be zero
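The fixed point is easy to sanity-check in numpy (a sketch mirroring the snippet above; since $y^* = Y + (I - aQ)y^*$ implies $aQy^* = Y$, the iterate should converge to $\mathrm{pinv}(Q)Y/a$):

```python
import numpy as np

# Sanity check of the fixed-point iteration y <- Y + (I - a*Q) y.
# At the fixed point, a*Q*y = Y, so y should converge to pinv(Q) @ Y / a.
rng = np.random.default_rng(0)
x = rng.standard_normal((1000, 10))
Q = x.T @ x               # (unnormalized) input covariance
a = 0.001
Y = rng.standard_normal(10)
y = np.zeros(10)
for _ in range(1000):
    y = Y + (np.eye(10) - a * Q) @ y
residual = y - np.linalg.pinv(Q) @ Y / a   # should be ~zero
```

Convergence requires the spectral radius of $I - aQ$ to be below one, which holds here because the eigenvalues of $aQ$ are all near one.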
To this is added a 'sensing' learning phase and a 'noise' unlearning phase: one optimizes $H(Z)$, the other minimizes $H(Z|S)$. Everything is then applied, similar to before, to a gaussian-filtered one-dimensional white-noise stimulus. He shows this results in bandpass filter behavior; quite weak sauce in an era where ML papers are expected to test on five or so datasets. Even if this was 1992 (nearly thirty years ago!), it would have been nice to see this applied to a more realistic dataset; perhaps some of the following papers? Olshausen & Field came out in 1996, but they applied their algorithm to real images. In both Olshausen & this work, no affordances are made for multiple layers. There have to be solutions out there...  
{1545}  
Self-organization in a perceptual network
One may critically challenge the infomax idea: we very much need to (and do) throw away spurious or irrelevant information in our sensory streams; what upper layers 'care about' when making decisions is certainly relevant to the lower layers. This credit-assignment is neatly solved by backprop, and there are a number of 'biologically plausible' means of performing it, but both this and infomax are maybe avoiding the problem. What might the upper layers really care about? Likely 'care about' is an emergent property of the interacting local learning rules and network structure. Can you search directly in these domains, within biological limits, and motivated by statistical reality, to find unsupervised-learning networks? You'll still need a way to rank the networks, hence an objective 'care about' function. Sigh. Either way, I don't per se put a lot of weight in the infomax principle. It could be useful, but is only part of the story. Otherwise Linsker's discussion is accessible, lucid, and prescient. Lol.  
{1544}  
The HSIC Bottleneck: Deep Learning without Back-Propagation

In this work, the authors use a kernelized estimate of statistical independence as part of an 'information bottleneck' to set per-layer objective functions for learning useful features in a deep network. They use the HSIC, or Hilbert-Schmidt independence criterion, as the independence measure.

The information bottleneck was proposed by Tishby, Pereira & Bialek (of 'Spikes' fame) in 1999, and aims to increase the mutual information between the representation and the labels while minimizing the mutual information between the representation and the input:

$\min_{P(T_i | X)} I(X; T_i) - \beta I(T_i; Y)$

Where $T_i$ is the hidden representation at layer $i$ (later output), $X$ is the layer input, and $Y$ are the labels. By replacing $I()$ with the HSIC, and some derivation (?), they show that

$\mathrm{HSIC}(D) = (m-1)^{-2} \mathrm{tr}(K_X H K_Y H)$

Where $D = \{(x_1, y_1), \ldots, (x_m, y_m)\}$ are samples and labels, $K_{X_{ij}} = k(x_i, x_j)$ and $K_{Y_{ij}} = k(y_i, y_j)$; that is, it's the kernel function applied to all pairs of (vectoral) input variables. $H$ is the centering matrix. The kernel is simply a Gaussian kernel, $k(x,y) = \exp(-\frac{1}{2}\|x - y\|^2 / \sigma^2)$.

So, if all the $x$ and $y$ are on average independent, then the inner product will be mean zero, the kernel will be mean one, and after centering will lead to zero trace. If the inner product is large within the realm of the derivative of the kernel, then the HSIC will be large (and negative, I think). In practice they use three different widths for their kernel, and they also center the kernel matrices.

But still, the feedback is an aggregate measure (the trace) of the product of two kernelized (a nonlinearity) outer-product spaces of similarities between inputs. It's not unimaginable that feedback networks could be doing something like this...
For example, a neural network could calculate & communicate aspects of joint statistics to reward / penalize weights within a layer of a network, and this is parallelizable / per-layer / adaptable to an unsupervised learning regime. Indeed, that was done almost exactly by this paper: Kernelized information bottleneck leads to biologically plausible 3-factor Hebbian learning in deep networks, albeit in a much less intelligible way.  
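A minimal numpy sketch of the biased empirical HSIC estimator described above (Gaussian kernel with a single illustrative bandwidth, rather than the paper's three; the data here are made up):

```python
import numpy as np

def hsic(X, Y, sigma=1.0):
    """Biased empirical HSIC: (m-1)^-2 * tr(Kx @ H @ Ky @ H)."""
    m = X.shape[0]
    def gram(Z):
        # Gaussian kernel on all pairs: k(z, z') = exp(-0.5 * ||z - z'||^2 / sigma^2)
        sq = np.sum((Z[:, None, :] - Z[None, :, :]) ** 2, axis=-1)
        return np.exp(-0.5 * sq / sigma ** 2)
    H = np.eye(m) - np.ones((m, m)) / m   # centering matrix
    return np.trace(gram(X) @ H @ gram(Y) @ H) / (m - 1) ** 2

rng = np.random.default_rng(0)
x = rng.standard_normal((200, 1))
indep = hsic(x, rng.standard_normal((200, 1)))          # independent: near zero
dep = hsic(x, x + 0.1 * rng.standard_normal((200, 1)))  # dependent: larger
```

Independent pairs give a centered kernel product with near-zero trace; dependent pairs give a clearly larger value (the estimator is nonnegative, since $H$ is idempotent and both centered Gram matrices are PSD).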
{1543} 
ref: 2019
tags: backprop neural networks deep learning coordinate descent alternating minimization
date: 07-21-2021 03:07 gmt
revision:1


Beyond Backprop: Online Alternating Minimization with Auxiliary Variables
This is interesting in that the weight updates can be done in parallel (perhaps more efficient), but you are still propagating errors backward, albeit via optimizing 'codes'. Given the vast infrastructure devoted to autodiff + backprop, I can't see this being adopted broadly. That said, the idea of alternating minimization (which is used e.g. for EM clustering) is powerful, and this paper does describe (though I didn't read it) how there are guarantees on the convexity of the alternating minimization. Likewise, the authors show how to improve the performance of the online / minibatch algorithm by keeping around memory variables, in the form of covariance matrices.  
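The generic alternating-minimization pattern the paper builds on fits in a few lines. This toy (not the paper's algorithm, just the pattern) fits a rank-1 factorization by fixing one block of variables and solving the other in closed form, alternately:

```python
import numpy as np

# Alternating minimization on min_{u,v} ||M - u v^T||_F^2:
# with v fixed, the optimal u is a least-squares solve, and vice versa.
rng = np.random.default_rng(0)
M = np.outer(rng.standard_normal(50), rng.standard_normal(30))  # exactly rank 1
u = rng.standard_normal(50)
v = rng.standard_normal(30)
for _ in range(50):
    u = M @ v / (v @ v)     # closed-form update for u, v held fixed
    v = M.T @ u / (u @ u)   # closed-form update for v, u held fixed
err = np.linalg.norm(M - np.outer(u, v)) / np.linalg.norm(M)
```

Each sub-problem is convex even though the joint problem is not, which is the property the paper's convergence guarantees lean on.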
{1542}  
https://github.com/wilicc/gpu-burn Multi-GPU stress test. Are your GPUs overclocked to the point of overheating / being unreliable?  
{1541}  
Like this blog but 100% better!  
{1540}  
Two Routes to Scalable Credit Assignment without Weight Symmetry

This paper looks at five different learning rules, three purely local and two non-local, to see if they can work as well as backprop in training a deep convolutional net on ImageNet. The local learning networks all feature forward weights W and backward weights B; the forward weights (+ nonlinearities) pass the information to lead to a classification; the backward weights pass the error, which is used to locally adjust the forward weights. Hence, each fake neuron locally has the forward activation, the backward error (or loss gradient), the forward weight, the backward weight, and Hebbian terms thereof (e.g. the outer product of the input/output vectors for both forward and backward passes). From these available variables, they construct the local learning rules:
Each of these serves as a "regularizer term" on the feedback weights, which governs their learning dynamics. In the case of backprop, the backward weights B are just the instantaneous transpose of the forward weights W. A good local learning rule approximates this transpose progressively. They show that, with proper hyperparameter setting, this does indeed work nearly as well as backprop when training a ResNet18 network. But, hyperparameter settings don't translate to other network topologies. To allow this, they add in nonlocal learning rules:
In "Symmetric Alignment", the Self and Decay rules are employed. This is similar to backprop (the backward weights will track the forward ones) with L2 regularization, which is not new. It performs very similarly to backprop. In "Activation Alignment", the Amp and Sparse rules are employed. I assume this is supposed to be more biologically plausible: the Hebbian term can track the forward weights, while the Sparse rule regularizes and stabilizes the learning, such that the overall dynamics allow the gradient to flow even if W and B aren't transposes of each other. Surprisingly, they find Symmetric Alignment to be more robust to the injection of Gaussian noise during training than backprop. Both SA and AA achieve similar accuracies on the ResNet benchmark. The authors then go on to explain the plausibility of non-local but approximate learning rules with regression discontinuity design, a la Spiking allows neurons to estimate their causal effect. This is a decent paper, reasonably well written. They thought through what variables are available to affect learning, and parameterized five combinations that work. Could they have done the full matrix of combinations, optimizing them just the same as the metaparameters? Perhaps, but that would be even more work... Regarding the desire to reconcile backprop and biology, this paper does not bring us much (if at all) closer. Biological neural networks have specific and local uses for error; even invoking 'error' has limited explanatory power on activity. Learning and firing dynamics, of course of course. Is the brain then just an overbearing mess of details and overlapping rules? Yes, probably, but that doesn't mean that we humans can't find something simpler that works. The algorithms in this paper, for example, are well described by a bit of linear algebra, and yet they are performant.  
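A toy sketch of the Symmetric Alignment idea as I read it (an assumed simplified form, not the paper's exact parameterization: a 'self' term pulls the feedback weights toward the forward weights' transpose while a 'decay' term shrinks them, so B tracks W^T even as W changes during training):

```python
import numpy as np

# Feedback weights B are updated purely locally: B += eta * (W.T - B),
# i.e. a self term (+W.T) plus a decay term (-B). B converges toward W^T,
# so propagating error through B approximates propagating it through W^T.
rng = np.random.default_rng(0)
W = rng.standard_normal((20, 10))   # forward weights
B = rng.standard_normal((10, 20))   # feedback weights, initially random
eta = 0.1
for _ in range(200):
    W += 0.01 * rng.standard_normal(W.shape)  # forward weights drift (stand-in for training)
    B += eta * (W.T - B)                      # self + decay update
# cosine similarity between B and W^T; near 1 means alignment
align = np.sum(B * W.T) / (np.linalg.norm(B) * np.linalg.norm(W))
```

The tracking error reaches a small steady state set by the drift rate over eta, which is why the alignment survives ongoing weight changes.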
{1539}  
https://webautocats.com/epc/saab/sbd/  Online, free parts lookup for Saab cars. Useful.  
{1538}  
PMID20596024 Sensitivity to perturbations in vivo implies high noise and suggests rate coding in cortex
Cortical reliability amid noise and chaos
 
{1537} 
ref: 0
tags: cortical computation learning predictive coding reviews
date: 02-23-2021 20:15 gmt
revision:2


PMID30359606 Predictive Processing: A Canonical Cortical Computation
PMID23177956 Canonical microcircuits for predictive coding
Control of synaptic plasticity in deep cortical networks
 
{1536}  
From Protein Structure to Function with Bioinformatics
 
{1532}  
PMID23273272 A cellular mechanism for cortical associations: an organizing principle for the cerebral cortex
See also: PMID25174710 Sensory-evoked LTP driven by dendritic plateau potentials in vivo
And: The binding solution?, a blog post covering Bittner 2015 that looks at rapid dendritic plasticity in the hippocampus as a means of binding stimuli to place fields.  
{1523} 
ref: 0
tags: tennenbaum compositional learning character recognition oneshot learning
date: 02-23-2021 18:56 gmt
revision:2


Oneshot learning by inverting a compositional causal process
 
{1526} 
ref: 0
tags: neuronal assemblies maass hebbian plasticity simulation austria fMRI
date: 02-23-2021 18:49 gmt
revision:1


PMID32381648 A model for structured information representation in neural networks in the brain
 
{1535}  
Reconciling modern machine-learning practice and the classical bias–variance trade-off

A formal publication of the effect famously discovered at OpenAI & publicized on their blog. Goes into some details on Fourier features & runs experiments to verify the OpenAI findings. The result stands. An interesting avenue of research is using genetic algorithms to perform the search over neural network parameters (instead of backprop) in reinforcement-learning tasks. Ben Phillips has a blog post on some of the recent results, which show that it does work for certain 'hard' problems in RL. Of course, this is the dual of the 'lottery ticket' hypothesis and the deep double descent, above; large networks are likely to have solutions 'close enough' to solve a given problem. That said, genetic algorithms don't necessarily perform gradient descent to tweak the weights for optimal behavior once they are within the right region of RL behavior. See {1530} for more discussion on this topic, as well as {1525} for a more complete literature survey.  
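For reference, the gradient-free search idea amounts to something like the following evolution-strategy toy (a generic illustration with made-up data, not the cited experiments): mutate a population of weight vectors, keep the fittest, repeat.

```python
import numpy as np

# Truncation-selection evolution strategy on the weights of a tiny linear
# "network": no gradients anywhere, only mutation and selection.
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 5))
w_true = rng.standard_normal(5)
y = X @ w_true

def fitness(w):
    return -np.mean((X @ w - y) ** 2)   # negative MSE; higher is better

pop = rng.standard_normal((32, 5))      # initial population of weight vectors
for gen in range(300):
    scores = np.array([fitness(w) for w in pop])
    parents = pop[np.argsort(scores)[-8:]]   # keep the best 8
    children = np.repeat(parents, 4, axis=0)
    pop = children + 0.05 * rng.standard_normal(children.shape)  # mutate
best = pop[np.argmax([fitness(w) for w in pop])]
```

This matches the caveat above: selection drives the population into the right region quickly, but the fixed mutation scale limits how finely the weights get tuned once there.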
{1534}  
Going in circles is the way forward: the role of recurrence in visual inference

I think the best part of this article are the references: a nicely complete listing of, well, the current opinion in Neurobiology! (Note that this issue is edited by our own Karel Svoboda, hence there are a good number of Janelians in the author list..) The gestalt of the review is that deep neural networks need to be recurrent, not purely feedforward. This results in savings in overall network size and an increase in the achievable computational complexity, perhaps via the incorporation of priors and temporal-spatial information. All this again makes perfect sense and matches my sense of prevailing opinion. Of course, we are left wanting more: all this recurrence ought to be structured in some way. To me, a rather naive way of thinking about it is that feedforward layers cause weak activations, which are 'amplified' or 'selected for' in downstream neurons. These neurons proximally code for 'causes' or local reasons, based on the supported hypothesis that the brain has a good temporal-spatial model of the visuomotor world. The causes then can either explain away the visual input, leading to balanced E-I, or fail to explain it, in which case the excess activity is either rectified by engaging more circuits or by engaging synaptic plasticity. A critical part of this hypothesis is some degree of binding / disentanglement / spatiotemporal reassignment. While not all models of computation require registers / variables (RNNs are Turing-complete, for example), I remain stuck on the idea that, to explain phenomenological experience and practical cognition, the brain must have some means of 'binding'. A reasonable place to look is the apical tuft dendrites, which are capable of storing temporary state (calcium spikes, NMDA spikes), undergo rapid synaptic plasticity, and are so dense that they can reasonably store the outer-product space of binding. 
There is mounting evidence for apical tufts working independently / in parallel in investigations of high-gamma in ECoG: PMID32851172 Dissociation of broadband high-frequency activity and neuronal firing in the neocortex. "High gamma" shows little correlation with MUA when you differentiate early-deep and late-superficial responses, "consistent with the view it reflects dendritic processing separable from local neuronal firing". 