m8ta
{1578}
God Help us, let's try to understand AI monosemanticity

Commentary: To some degree, superposition seems like a geometric "hack" invented in the process of optimization to squeeze a great many (largely mutually exclusive) sparse features into a limited number of neurons. GPT-3 has a latent dimension of only 96 * 128 = 12288, and with 96 layers this is only 1.17 M neurons (*). A fruit fly has 100k neurons (and can't speak). All communication must be through that 12288-dimensional vector, which is passed through LayerNorm many times (**), so naturally the network learns to take advantage of locally linear subspaces.

That said, the primate visual system does seem to use superposition, though not via local subspaces; instead, neurons seem to encode multiple axes somewhat linearly (e.g. global spaces: linearly combined position and class). That was a few years ago, and I suspect that new results may contest this. The face area seems to do a good job of disentanglement, for example.

Treating everything as high-dimensional vectors is great for analogy making, like the wife - husband + king = queen example. But having fixed-size vectors for representing arbitrary-dimensioned relationships inevitably leads to compression ~= superposition. Provided those subspaces are semantically meaningful, it all works out from a generalization standpoint -- but this is then equivalent to allocating an additional axis for said relationship or attribute. Additional axes would also put less decoding burden on the downstream layers, and make optimization easier. Google has demonstrated allocation in transformers. It's also prevalent in the cortex. Trick is getting it to work!

(*) GPT-4 is unlikely to have more than an order of magnitude more 'neurons'; PaLM-540B has only 2.17 M. Given that GPT-4 is something like 3-4x larger, it should have 6-8 M neurons, which is still 3 orders of magnitude fewer than the human neocortex (never mind the cerebellum ;-)

(**) I'm of two minds on LayerNorm. PV interneurons might be seen to do something like this, but it's all local -- you don't need everything to be vector rotations. (LayerNorm effectively removes one degree of freedom, so really it's a 12287-dimensional vector.)

Update: After reading https://transformer-circuits.pub/2023/monosemantic-features/index.html, I find the idea of local manifolds / local codes to be quite appealing: why not represent sparse yet conditional features using superposition? This also expands the possibility of pseudo-hierarchical representation, which is great.
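To make the compression ~= superposition point concrete, here's a toy numerical sketch (the sizes and the decode-by-dot-product scheme are my own illustrative choices, not a claim about any real network): pack many sparse features into far fewer "neurons" via random directions, then read each feature back out by projection.

```python
import math
import random

random.seed(1)
D, F = 100, 300            # 100 "neurons" holding 300 sparse features (toy sizes)

def unit_vec(d):
    v = [random.gauss(0, 1) for _ in range(d)]
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

dirs = [unit_vec(D) for _ in range(F)]   # one random direction per feature

# superpose a sparse set of active features into the neuron vector
active = {3, 17, 290}
acts = [sum(dirs[f][i] for f in active) for i in range(D)]

# decode each feature by projecting the neuron vector onto its direction
scores = [sum(dirs[f][i] * acts[i] for i in range(D)) for f in range(F)]

mean_active = sum(scores[f] for f in active) / len(active)
mean_inactive = sum(abs(scores[f]) for f in range(F)
                    if f not in active) / (F - len(active))
print(round(mean_active, 2), round(mean_inactive, 2))
# active features score near 1; inactive ones stay near 0 (interference noise)
```

With features sparse enough, the random directions are nearly orthogonal and the interference terms stay small -- the geometric hack in miniature.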
{1577}
Sketch - Program synthesis by sketching
The essential algorithm, in words: Take the sketch, expand it to a set of parameterized variables, holes, and calling contexts. Convert these to a DAG aka (?) data-code flow graph w/ dependencies. Try to simplify the DAG, one-hot encode integers, and convert to either a conjunctive-normal-form (CNF) SAT problem for MiniSat, or to a boolean circuit for the ABC solver. Apply MiniSat or ABC to the problem to select a set of control values = values for the holes & permutations that satisfy the boolean constraints. Using this solution, use the SAT solver to find an input variable configuration that does *not* satisfy the problem; this serves as a counter-example. Run the counter-example through the validator function (oracle) to see what it does, and use its inputs and outputs to add clauses to the SAT problem. Repeat until either no counter-examples can be found or the problem is `unsat`.

Though the thesis describes a system that was academic & relatively small back in 2008, Sketch has enjoyed continuous development, and remains in use. I find the work that went into it remarkable and impressive -- even with incremental improvements, you need accurate expansion of the language & manipulations to show proof-of-principle. Left wondering what limits its application to even larger problems -- need for a higher-level loop that further subdivides / factorizes the problem, or DFS for filling out elements of the sketch?

Interesting links discovered while reading the dissertation:
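The propose / verify / add-clause loop described above is the CEGIS pattern, and it fits in a few lines. A minimal sketch in Python, with brute-force enumeration standing in for MiniSat/ABC, and an invented one-hole spec (none of this is from the thesis):

```python
# Toy CEGIS loop: brute-force enumeration stands in for MiniSat/ABC, and the
# spec below is an invented example, not from the thesis.
DOMAIN = range(-8, 8)   # finite input domain, standing in for bit-vectors
HOLES = range(-8, 8)    # finite candidate space for the single hole

def oracle(x):          # the validator function ("oracle")
    return 2 * x + 3

def sketch(h, x):       # sketch with one hole: 2*x + ??
    return 2 * x + h

def cegis():
    examples = []       # accumulated (input, output) clauses
    while True:
        # "solve": pick any hole value consistent with all clauses so far
        cand = next((h for h in HOLES
                     if all(sketch(h, x) == y for x, y in examples)), None)
        if cand is None:
            return None                       # unsat: no hole value works
        # "verify": search for a counter-example input
        cex = next((x for x in DOMAIN
                    if sketch(cand, x) != oracle(x)), None)
        if cex is None:
            return cand                       # no counter-example: synthesized
        examples.append((cex, oracle(cex)))   # add clause; loop again

print(cegis())  # -> 3
```

The real system's leverage is in the encoding (one-hot integers, CNF, circuit simplification), not the loop itself, which is why the thesis work is impressive.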
{1511}
{1468}
ref: -2013
tags: microscopy space bandwidth product imaging resolution UCSF
date: 06-17-2019 14:45 gmt
revision:0
How much information does your microscope transmit?
{1454}
Building High-level Features Using Large Scale Unsupervised Learning
{1426}
Training neural networks with local error signals
{1432}
Direct Feedback alignment provides learning in deep neural nets
{1423}
PMID-27824044 Random synaptic feedback weights support error backpropagation for deep learning.
Our proof says that weights W0 and W evolve to equilibrium manifolds, but simulations (Fig. 4) and analytic results (Supplementary Proof 2) hint at something more specific: that when the weights begin near 0, feedback alignment encourages W to act like a local pseudoinverse of B around the error manifold. This fact is important because if B were exactly W+ (the Moore-Penrose pseudoinverse of W), then the network would be performing Gauss-Newton optimization (Supplementary Proof 3). We call this update rule for the hidden units pseudobackprop and denote it by ∆hPBP = W+ e. Experiments with the linear network show that the angle ∆hFA ∠ ∆hPBP quickly becomes smaller than ∆hFA ∠ ∆hBP (Fig. 4b, c; see Methods). In other words feedback alignment, despite its simplicity, displays elements of second-order learning.
{1431}
Sparse and composite coherent lattices
{1391}
Evolutionary Plasticity and Innovations in Complex Metabolic Reaction Networks
Summary thoughts: This is a highly interesting study, insofar as the authors show substantial support for their hypotheses that phenotypes can be explored through random-walk non-lethal mutations of the genotype, and that this is somewhat invariant to the source of carbon for known biochemical reactions. What gives me pause is the use of linear programming / optimization when setting the relative concentrations of biomolecules, and the permissive criteria for accepting these networks; real life (I would imagine) is far more constrained. Relative and absolute concentrations matter. Still, the study does reflect some robustness. I suggest that a good control would be to 'fuzz' the list of available reactions based on statistical criteria, and see if the results still hold. Then, go back and make the reactions un-biological or less networked, and see if this destroys the measured degrees of robustness.
{1335}
What are the concentrations of the monoamines in the brain? (Purpose: estimate the required electrochemical sensing area & efficiency)
{1318}

{969}
PMID-19460368[0] Pathological subthalamic nucleus oscillations in PD: can they be the cause of bradykinesia and akinesia?
{1125}
ref: -0
tags: active filter design Netherlands Gerrit Groenewold
date: 02-17-2012 20:27 gmt
revision:0
IEEE-04268406 (pdf) Noise and Group Delay in Active Filters
{806}
I've recently tried to determine the bit-rate conveyed by one gaussian random process about another in terms of the signal-to-noise ratio between the two. Assume x is the known signal to be predicted, and y is the prediction. Let's define E = <(x - y)^2> / <x^2>, where <.> denotes expectation. Note this is a ratio of powers; the conventional SNR = 1/E. E is also known as the (normalized) mean-squared-error (mse). Now, let c = <xy>; assume x and y have unit variance (or scale them so that they do), then E = 2(1 - c).

We need the covariance because the mutual information between two jointly Gaussian zero-mean variables can be defined in terms of their covariance matrix: I(x;y) = -1/2 log2 det(Q) (see http://www.springerlink.com/content/v026617150753x6q/ ). Here Q is the covariance matrix, Q = [[1, c], [c, 1]], so det(Q) = 1 - c^2.

Then I(x;y) = -1/2 log2(1 - c^2), or, substituting c = 1 - 1/(2 SNR), I(x;y) ~= 1/2 log2(SNR) for large SNR. This agrees with intuition. If we have a SNR of 10 db, or 10 (power ratio), then we would expect to be able to break a random variable into about 10 different categories or bins (recall stdev is the sqrt of the variance), with the probability of the variable being in the estimated bin being 1/2. (This, at least in my mind, is where the 1/2 constant comes from - if there is gaussian noise, you won't be able to determine exactly which bin the random variable is in, hence log_2 is an overestimator.) [Table of respective values, including the amplitude (not power) ratio representations of SNR, not reproduced here.]

Now, to get the bitrate, you take the SNR, calculate the mutual information, and multiply it by the bandwidth (not the sampling rate in a discrete-time system) of the signals. In our particular application, I think the bandwidth is between 1 and 2 Hz, hence we're getting 1.6-3.2 bits/second/axis, hence 3.2-6.4 bits/second for our normal 2D tasks. If you read this blog regularly, you'll notice that others have achieved 4 bits/sec with one neuron and 6.5 bits/sec with dozens {271}.
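Plugging numbers into the derivation above -- assuming unit-variance signals, SNR defined as 1/E, and bandwidth in Hz; function names are mine:

```python
import math

def gaussian_mi_bits(snr):
    """Mutual information (bits/sample) between two unit-variance, jointly
    Gaussian signals whose normalized mse is E = 1/snr (a power ratio)."""
    c = 1.0 - 1.0 / (2.0 * snr)           # correlation implied by E = 2(1 - c)
    return -0.5 * math.log2(1.0 - c * c)  # -1/2 log2 det Q, Q = [[1, c], [c, 1]]

def bitrate_bits_per_s(snr, bandwidth_hz):
    # per-sample information times signal bandwidth (not sampling rate)
    return gaussian_mi_bits(snr) * bandwidth_hz

print(round(gaussian_mi_bits(10.0), 2))         # 10 dB SNR -> ~1.7 bits/sample
print(round(bitrate_bits_per_s(10.0, 2.0), 2))  # ~3.4 bits/s at 2 Hz bandwidth
```

At SNR = 10 this gives roughly the 1.6-3.2 bits/second/axis quoted above for 1-2 Hz of bandwidth.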
{316}
PMID-12797724[0] A miniaturized neuroprosthesis suitable for implantation into the brain.
{322}
PMID-15247483[0] Cognitive control signals for Neural Prosthetics
PMID-15491902 Cognitive neural prosthetics
{1004}
IEEE-1351853 (pdf) Development of integrated circuits for readout of microelectrode arrays to image neuronal activity in live retinal tissue
____References____
Dabrowski, W., Grybos, P., Hottowy, P., Skoczen, A., Swientek, K., Bezayiff, N., Grillo, A.A., Kachiguine, S., Litke, A.M., Sher, A. (2003). Nuclear Science Symposium Conference Record, 2003 IEEE, vol. 2, pp. 956-960.
{984}
IEEE-6114258 (pdf) Towards a Brain-Machine-Brain Interface: Virtual Active Touch Using Randomly Patterned Intracortical Microstimulation.
____References____
O'Doherty, J., Lebedev, M., Li, Z., Nicolelis, M. (2011). Towards a Brain-Machine-Brain Interface: Virtual Active Touch Using Randomly Patterned Intracortical Microstimulation. IEEE Transactions on Neural Systems and Rehabilitation Engineering, PP(99), 1.
{1002}
ref: Fan-2011.01
tags: TBSI wireless recordings system FM modulation multiplexing poland
date: 01-03-2012 00:55 gmt
revision:5
PMID-21765934[0] A wireless multi-channel recording system for freely behaving mice and rats.
{968}
ref: Bassett-2009.07
tags: Weinberger cognitive efficiency beta band neuroimaging EEG task performance optimization network size effort
date: 12-28-2011 20:39 gmt
revision:1
PMID-19564605[0] Cognitive fitness of cost-efficient brain functional networks.
{69}
PMID-17057705 Long-term motor cortex plasticity induced by an electronic neural implant.
{300}
Motor learning by field approximation.
{474}
http://delsys.com/KnowledgeCenter/FAQ_EMGSensor.html
{698}
From Scott MacKenzie:
{914}
PMID-10681435 Cortical correlates of learning in monkey adapting to a new dynamical environment.
{886}
Just got back from a trek through the volcanic mountains of Iceland. The landscape is extremely dramatic; though it's not nearly the scale of Alaska or the Rockies, it presents itself as such, as the largest plant is thick moss or stubble grass (in places); everything is bare, the vistas unobstructed. (What do you do if you get lost in an Icelandic forest? Stand up.) There are no trees for size reference; indeed it seemed so alien for a bit that I was amazed that I could still breathe the air.

The first day of exploring I had a pretty serious scare. I was walking, very light and fast as usual, with just enough to protect against rain, just enough food to keep me from eating moss. I elected to take the less-popular route back, which led across a high muddy (no plants) gray (all the snow is ashen) scree-filled plain, to a hunchback of a mountain, and down into the river valley where I was camped. The first part was fine, though searingly desolate and wind-shorn. The problem came when I rounded the final peak and discovered that the trail was covered by a gray wind-sculpted snowmass. It was at an angle too steep for my shit shoes and lack of ice-tools, and the slopes everywhere else were critical: free a rock and it will tumble 100 feet. Free a Tim and he will also tumble 100 feet .. or more. I didn't want to hike the 17 km back the way I came without an attempt at re-finding the trail, so I set off, gingerly, over the ice and gravel, alone. The ash actually saved me, as it coated the snowfields and made them passable in the late late afternoon warmth (the sun 'sets' around midnight and rises at 2). This led to a pinnacle from which I could *see* the campsite! But there were only slide-to-death venues for descent, until I noticed a set of footprints heading up a steep snowbank to my left. I was elated - a trace of humanity! I set off with renewed vigor, and did a semi-controlled fall down the ice; the foot-holes kept me under control. But they were not foot-holes.

I noticed quickly that the holes were irregular in spacing and shape, and shortly after I passed the steepest wind-sculpted section of snowbank, realized that they were made by a large rock falling off the mountain, picking up speed as it dented the ice shell. I kept going, mostly because I could not stop, though eventually it leveled off. Had that rock not fallen, I don't think I would have had the psychological wherewithal to try the slope, never mind foot purchase to slow my descent. As a stream gets broader its slope generally decreases, given constant resistance from the rock / earth, so as I descended the valleys broadened and became less treacherous. I made the remainder of the way back on a riverbed, albeit with wet feet. It was exciting, and I felt fully in the world as I was trying to get off that trail-less mountain, but I'm not sure if I want to do it again; the following day, while hiking up neighboring peaks, I felt a heightened sense of caution, vertigo.
{882}

{872}
Excellent hike in Bynum NC starting at the old homestead down there; crossed a number of random properties, entered and left Haw River State Park, and saw a good number of decomposing farmhouses, all on a gorgeous day. Route was taken clockwise; the jog at the end away from the main trail was to avoid a hunter in the main fields. This forced us to do a good bit of bushwhacking and gave the opportunity to meet some local horses, goats, and runners. Total distance about 9 miles.
{810}
ref: -0
tags: circular polarized antenna microstrip ultrawideband
date: 02-03-2010 21:30 gmt
revision:1
excellent! Ultra-wideband circular polarized microstrip Archimedean spiral
{805}
http://silentlistening.wordpress.com/2008/05/09/dispersion-of-sound-waves-in-ice-sheets/ -- amazing!
{783}
PMID-19435684[0] A 128-channel 6 mW wireless neural recording IC with spike feature extraction and UWB transmitter.
{734}
Rethinking the American Dream by David Kamp
{178}
ref: Churchland-2006.12
tags: motor_noise CNS Churchland execution variance motor_planning 2006
date: 12-08-2008 22:50 gmt
revision:2
PMID-17178410[0] A central source of movement variability.
{590}
It is not obvious how to run an external command in OCaml & get its output. Here is my hack, which simply polls the output of the program until there is nothing left to read. Not very highly tested, but I wanted to share, as I don't think there is an example of the same on PLEAC.

let run_command cmd =
  let inch = Unix.open_process_in cmd in
  let infd = Unix.descr_of_in_channel inch in
  let buf = String.create 20000 in  (* fixed-size output buffer *)
  let il = ref 1 in
  let offset = ref 0 in
  (* poll: Unix.read returns 0 on EOF (or when the buffer is full) *)
  while !il > 0 do
    let inlen = Unix.read infd buf !offset (20000 - !offset) in
    il := inlen;
    offset := !offset + inlen
  done;
  ignore (Unix.close_process_in inch);
  if !offset = 0 then "" else String.sub buf 0 !offset
  ;;

Note: Fixed a nasty string-termination/memory-reuse bug Sept 10 2008
{581}
brilliant!! source: android winners

{521}
Above, FCC limitations on UWB transmitted power levels in communication devices. Currently, only the US allows operation of UWB transceivers. links:
{503}
quote: Consumers also pay high taxes for telecommunication services, averaging about 13 percent on some telecom services, similar to the tax rate on tobacco and alcohol, Mehlman said. One tax on telecom service has remained in place since the 1898 Spanish-American War, when few U.S. residents had telephones, he noted. "We think it's a mistake to treat telecom like a luxury and tax it like a sin," he said. from: The internet could run out of capacity in two years comments:
{480}
ref: bookmark-0
tags: RonPaul American presidential candidate libertarian
date: 10-30-2007 22:38 gmt
revision:0
http://www.grist.org/feature/2007/10/16/paul/?source=weekly
{476}

{403}
ref: bookmark-0
tags: blackfin ELF freestanding applications boot
date: 08-01-2007 14:40 gmt
revision:0
http://www.johanforrer.net/BLACKFIN/index.html very good, very instructive.
{147}
PMID-12899253 Boosting bit rates and error detection for the classification of fast-paced motor commands based on single-trial EEG analysis
{75}

{72}
ref: abstract-0
tags: tlh24 error signals in the cortex and basal ganglia reinforcement_learning gradient_descent motor_learning
date: 0-0-2006 0:0
revision:0
Title: Error signals in the cortex and basal ganglia. Abstract: Numerous studies have found correlations between measures of neural activity, from single-unit recordings to aggregate measures such as EEG, and motor behavior. Two general themes have emerged from this research: neurons are generally broadly tuned and are often arrayed in spatial maps. It is hypothesized that these are two features of a larger hierarchical structure of spatial and temporal transforms that allow mappings to produce complex behaviors from abstract goals, or similarly, complex sensory information to produce simple percepts. Much theoretical work has proved the suitability of this organization to both generate behavior and extract relevant information from the world. It is generally agreed that most transforms enacted by the cortex and basal ganglia are learned rather than genetically encoded. Therefore, it is the characterization of the learning process that describes the computational nature of the brain; the descriptions of the basis functions themselves are more descriptive of the brain's environment. Here we hypothesize that learning in the mammalian brain is a stochastic maximization of reward and transform predictability, and a minimization of transform complexity and latency. It is probable that the optimizations employed in learning include both components of gradient descent and competitive elimination, which are two large classes of algorithms explored extensively in the field of machine learning. The former method requires the existence of a vectorial error signal, while the latter is less restrictive, and requires at least a scalar evaluator. We will look for the existence of candidate error or evaluator signals in the cortex and basal ganglia during force-field learning where the motor error is task-relevant and explicitly provided to the subject.

By simultaneously recording large populations of neurons from multiple brain areas we can probe the existence of error or evaluator signals by measuring the stochastic relationship and predictive ability of neural activity relative to the provided error signal. From this data we will also be able to track the dependence of neural tuning trajectory on trial-by-trial success; if the cortex operates under minimization principles, then tuning change will have a temporal relationship to reward. The overarching goal of this research is to look for one aspect of motor learning -- the error signal -- with the hope of using this data to better understand the normal function of the cortex and basal ganglia, and how this normal function is related to the symptoms caused by disease and lesions of the brain.