m8ta
{1571}
ref: -2022 tags: language learning symbolic regression Fleet meta search date: 06-04-2022 02:28 gmt revision:4 [3] [2] [1] [0] [head]

One model for the learning of language

  • Yuan Yang and Steven T. Piantadosi
  • Idea: Given a restricted compositional 'mentalese' programming language / substrate, construct a set of grammatical rules ('hypotheses') from a small number of examples of an (abstract) language.
    • Pinker's argument that there is too little stimulus ("poverty of the stimulus") for children to discern grammatical rules, hence the rules must be innate, is thereby refuted.
      • This is not the only refutation.
      • An argument was made on Twitter that large language models also refute the poverty-of-stimulus hypothesis. Meh, this paper does it far better -- the data used to train transformers is hardly a small stimulus.
  • Hypotheses are sampled from the substrate using MCMC, and selected based on a smoothed Bayesian likelihood.
    • This likelihood takes into account partial hits -- results that are within an edit distance of one of the desired sets of strings. (i think)
  • They use parallel tempering to search the space of programs. (A minimal sketch appears after this list.)
    • Roughly: keep many hypotheses alive at different temperatures, and vary the temperature of each lineage to avoid getting stuck in local minima.
    • But there are other search heuristics; see https://codedocs.xyz/piantado/Fleet/
  • Execution is on the CPU, across multiple cores / threads, and possibly across multiple servers.
  • Larger hypotheses took up to 7 days to find (!)
    • And these aren't especially complicated grammars.

  • This is very similar to {842}, only on grammars rather than continuous signals from MoCap.
  • Proves once again that:
    1. Many domains of the world can be adequately described by relatively simple computational structures (It's a low-D, compositional world out there)
      1. Or, the Johnson-Lindenstrauss lemma
    2. You can find those hypotheses through brute-force + heuristic search. (At least to the point that you run into the curse of dimensionality)
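
A minimal parallel-tempering sketch of the search (not Fleet's actual implementation; Fleet's hypotheses are programs scored by a smoothed Bayesian likelihood over example strings). Here the hypothesis representation, proposal move, and scoring function are toy stand-ins, but the temperature ladder and swap rule are the core of the technique:

import math
import random

TARGET = [random.randint(0, 1) for _ in range(20)]    # hidden "grammar" to recover

def score(h):
    # stand-in log-posterior: how well does hypothesis h reproduce the target?
    return float(sum(a == b for a, b in zip(h, TARGET)))

def propose(h):
    # local MCMC move: flip one element of the hypothesis
    h2 = list(h)
    h2[random.randrange(len(h2))] ^= 1
    return h2

def parallel_tempering(n_steps=5000, temps=(0.2, 0.5, 1.0, 2.0, 5.0)):
    chains = [[random.randint(0, 1) for _ in range(20)] for _ in temps]
    best, best_score = None, -math.inf
    for step in range(n_steps):
        # Metropolis update within each temperature
        for i, T in enumerate(temps):
            h2 = propose(chains[i])
            dl = score(h2) - score(chains[i])
            if dl >= 0 or random.random() < math.exp(dl / T):
                chains[i] = h2
        # occasionally swap adjacent chains, so good hypotheses found at high
        # temperature can migrate down to the cold (greedy) chain
        if step % 10 == 0:
            i = random.randrange(len(temps) - 1)
            li, lj = score(chains[i]), score(chains[i + 1])
            log_a = (lj - li) * (1.0 / temps[i] - 1.0 / temps[i + 1])
            if log_a >= 0 or random.random() < math.exp(log_a):
                chains[i], chains[i + 1] = chains[i + 1], chains[i]
        if score(chains[0]) > best_score:
            best, best_score = list(chains[0]), score(chains[0])
    return best, best_score

best, s = parallel_tempering()
print("best score:", s, "target recovered:", best == TARGET)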

A more interesting result is Deep symbolic regression for recurrent sequences, where the authors (Facebook/Meta) use a Transformer -- in this case taken directly from Vaswani 2017 (8 heads, 8 layers, QKV attention, latent dimension 512) -- for both symbolic (estimate the algebraic recurrence relation) and numeric (estimate the rest of the sequence) training / evaluation. Symbolic regression generalizes better, unsurprisingly. But both can be made to work even in the presence of (log-scaled) noise!
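
As a toy illustration of the two evaluation modes (mine, not Meta's code): given a candidate recurrence relation, one can check it symbolically against the observed prefix, and roll it forward to predict further terms numerically.

def check_recurrence(seq, rec, order):
    # symbolic check: does the candidate recurrence reproduce the observed terms?
    return all(seq[n] == rec(seq, n) for n in range(order, len(seq)))

def extend(seq, rec, order, k):
    # numeric mode: roll the recurrence forward to predict the next k terms
    out = list(seq)
    for _ in range(k):
        out.append(rec(out, len(out)))
    return out

rec = lambda u, n: u[n - 1] + u[n - 2]      # candidate hypothesis: u_n = u_{n-1} + u_{n-2}
observed = [1, 1, 2, 3, 5, 8, 13]
print(check_recurrence(observed, rec, order=2))     # True
print(extend(observed, rec, order=2, k=3))          # ..., 21, 34, 55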

While the language learning paper shows that small generative programs can be inferred from a few samples, the Meta symbolic regression shows that Transformers can evince either amortized memory (less likely) or algorithms for perception -- both new and interesting. It suggests that 'even' abstract symbolic learning tasks are sufficiently decomposable that the sorts of algorithms available to an 8-layer transformer can provide a useful search heuristic. (N.B. the transformer doesn't spit out perfect symbolic or numerical results directly -- it also needs post-processing search. Also, the transformer has search (in the form of softmax attention) baked into its architecture.)

This is not a light architecture: they trained the transformer for 250 epochs, where each epoch was 5M equations in batches of 512. Each epoch took 1 hour on 16 Volta GPUs with 32 GB of memory each. So, 4,000 GPU-hours x ~10 TFlops/s ≈ 1.4e20 Flops. Compare this with the grammar learning above: 7 days on 32 cores at ~3 Gops/sec per core is ~6e16 ops, roughly 2500x less. Much, much smaller compute.
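
The back-of-envelope arithmetic, spelled out (rough estimates only):

transformer_flops = 250 * 3600 * 16 * 10e12      # 250 epochs x 1 h x 16 GPUs x ~10 TFlop/s
grammar_ops       = 7 * 24 * 3600 * 32 * 3e9     # 7 days x 32 cores x ~3 Gop/s per core
print(f"{transformer_flops:.1e} vs {grammar_ops:.1e}, ratio {transformer_flops / grammar_ops:.0f}x")
# ~1.4e20 vs ~5.8e16, ratio ~2500x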

All of this is to suggest a central theme of computer science: a continuum between search and memorization.

  • The language paper does fast search, but does not learn from the process (bootstrap), and maintains little state/memory.
  • The symbolic regression paper does moderate amounts of search, but continually learns from the process, and stores a great deal of heuristics for the problem domain.

Most interesting for a visual neuroscientist (not that I'm one per se, but bear with me) is where visual perception sits on these axes (search, heuristics, memory). Clearly there is a high degree of recurrence, and a high degree of plasticity / learning. But is there search or local optimization? Is this coupled to the recurrence via some form of energy-minimizing system? Is recurrence approximating E-M (expectation-maximization)?

{1506}
ref: -0 tags: asymmetric locality sensitive hash maximum inner product search sparsity date: 03-30-2020 02:17 gmt revision:5 [4] [3] [2] [1] [0] [head]

Improved asymmetric locality sensitive hashing for maximum inner product search

  • Like many other papers, this one is based on a long lineage of locality-sensitive hashing papers.
  • The key innovation, in [23] The power of asymmetry in binary hashing, was the development of asymmetric hashing -- the hash function for the query is different from the hash function used for storage. Roughly, this allows additional degrees of freedom, since the similarity function is (in the non-normalized case) non-symmetric.
    • For example, take query Q = [1 1] with keys A = [1 -1] and B = [3 3]. The nearest neighbor is A (distance 2), whereas the maximum inner product is B (inner product 6).
    • Alternately: self-inner product for Q and A is 2, whereas for B it's 18. Self-similarity is not the highest with inner products.
    • The norm of the query does not affect the arg max of the search, though; hence the paper assumes the query has been normalized for MIPS.
  • In this paper instead they convert MIPS into approximate cosine similarity search (which is like normalized MIPS), which can be efficiently solved with signed random projections.
  • (Established) LSH-L2 distance:
    • Sample a random vector a with iid normal N(0,1) entries.
    • Sample a random b uniformly between 0 and r.
      • r is the window size / radius (a free parameter).
    • The hash function is then h(x) = floor((a·x + b) / r).
      • I'm not sure how the floor op is converted to bits of the actual hash -- ?
  • (Established) LSH-correlation, signed random projections h^{sign}:
    • Hash is the sign of the inner product of the input vector and a uniform random vector a.
    • This is a two-bit random projection [13][14].
  • (New) Asymmetric-LSH-L2:
    • P(x) = [x; ||x||^2_2; ||x||^4_2; ...; ||x||^{2^m}_2] -- this is the pre-processing transform of the 'keys'.
      • Requires that the norm of the keys satisfy ||x||_2 ≤ U < 1.
      • m ≥ 3
    • Q(x) = [x; 1/2; 1/2; ...; 1/2] -- the transform of the queries.
    • See the mathematical explanation in the paper, but roughly: transformations P and Q, when the norms are less than 1, provide a correction to the L2 distance ||Q(q) - P(x_i)||_2, making its rank correlate with the un-normalized inner product.
  • They then change the augmentation to:
    • P(x) = [x; 1/2 - ||x||^2_2; 1/2 - ||x||^4_2; ...; 1/2 - ||x||^{2^m}_2]
    • Q(x) = [x; 0; ...; 0]
    • This allows signed nearest-neighbor (cosine similarity) search to be used for the MIPS problem -- the hash is the sign of random projections of P and Q, per above. (I assume this is still a 2-bit operation?) A minimal code sketch follows this list.
  • Next, they extend the U, m compromise function ρ to allow for non-normalized queries; U depends on m and c (m is the codeword extension, and c is the ratio between on-target and off-target hash hits).
  • Tested on the Movielens and Netflix datasets, using SVD preprocessing on the user-item matrix (the full matrix of every user's rating of every movie -- mostly zeros!) to get at the latent vectors.
  • In the above plots, recall (hah) that precision is the number of true positives divided by the number of draws k, and recall is the number of true positives divided by the size of the true top-N set; the curves are traced out as the number of draws k increases.
    • Clearly, the curve bends up and to the right when there are a lot of hash tables K.
    • Example datapoint: 50% precision at 40% recall, top 5. So on average you get 2 correct hits in 4 draws. Or: 40% precision, 20% recall, top 10: 2 hits in 5 draws. 20/40: 4 hits in 20 draws. (hit: correctly within the top-N)
    • So ... it's not that great.
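
A minimal sketch of the reduction described above (mine, not the paper's experimental code): apply the second asymmetric transform, then hash with signed random projections, using the toy Q / A / B example from earlier in this entry with the keys rescaled so their norms are below 1. The number of hash bits and the scaling factor are arbitrary choices here.

import numpy as np

def P(x, m=3):
    # key transform: append 1/2 - ||x||^(2^i) for i = 1..m; requires ||x||_2 <= U < 1
    return np.concatenate([x, [0.5 - np.linalg.norm(x) ** (2 ** i) for i in range(1, m + 1)]])

def Q(q, m=3):
    # query transform: pad with zeros (query assumed normalized)
    return np.concatenate([q, np.zeros(m)])

def srp_hash(v, projections):
    # signed random projections: one bit per random hyperplane
    return (projections @ v > 0).astype(np.uint8)

rng = np.random.default_rng(0)
d, m, n_bits = 2, 3, 64
projections = rng.standard_normal((n_bits, d + m))

q = np.array([1.0, 1.0]); q /= np.linalg.norm(q)     # normalized query
A = 0.2 * np.array([1.0, -1.0])                      # rescaled keys, norms < 1
B = 0.2 * np.array([3.0, 3.0])

for name, x in [("A", A), ("B", B)]:
    frac = np.mean(srp_hash(Q(q), projections) == srp_hash(P(x), projections))
    print(name, "inner product with q:", round(float(q @ x), 3), "matching hash bits:", frac)
# B has the larger inner product with q, and correspondingly shares more hash bits.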

Use case: Capsule: a camera based positioning system using learning
  • Uses 512 SIFT features as keys and queries to LSH. Hashing is computed via sparse addition / subtraction algorithm, with K bits per hash table (not quite random projections) and L hash tables. K = 22 and L = 24. ~ 1000 training images.
  • Best matching image is used as the location of the current image.

{1096}
ref: -3000 tags: DBS STN oscillations beta gamma research date: 02-21-2012 16:51 gmt revision:22 [21] [20] [19] [18] [17] [16] [head]

There seems to be an interesting connection between excessive grip force, isometric muscle contraction causing coherence between motor cortex and EMG, lack of inhibition on delayed-response and go/no-go tasks, experiments with STN-lesioned rats, and the high/low oscillation hypothesis. Rather tenuous, I suppose, but let me spell it out. (My personal impression, post-hoc, is that this is an epiphenomenon of something else; the evidence is contradictory.)

  1. In PD patients, STN DBS impairs the ability to match force characteristics to task requirements, both in terms of grip force {88} and when lifting heavy and light objects {88-2}. This is consistent with GPi function controlling the vigor or scaling of muscle responses.
  2. Isometric force creation frequently engages the piper rhythm between cortex and muscles {1066}, which may be a means of preserving motor state {1066-4}.
  3. In PD patients there is a marked increase in beta oscillation and synchronization {1064}, which decreases during movement {829}. Some suggest that it may be a non-coding resting state {969}, though beta-band energy is correlated with PD motor symptoms PMID-17005611, STN DBS attenuates the power in the beta band {710-2},{753},{1073}, and DCS is likely to do the same PMID-21039949. Alternatively, the synchrony limits the ability to encode meaningful information. The beta-band activity does not seem associated with rest tremor {1075}. Furthermore, beta-band power decreases prior to and during movement, and with the administration of levodopa the oscillation shifts to a higher frequency -- the same frequency as the piper rhythm {1075}. Closed-loop stimulation with a delay (80 ms), designed to null the beta oscillations, is more effective than continuous high-frequency DBS {967}.
  4. PD patients have deficits in inhibition on go/no-go and delayed-response tasks that are exacerbated by STN DBS {1077-3}, as well as expedited decision making in conflict situations {1077}. Lesioning the STN in rats has a similar effect on delayed-reward task performance, though it's somewhat more complicated (and their basal ganglia are quite a bit different) {677}.
  5. The <30 Hz and >30Hz bands are inversely affected by both movement and dopamine treatment. {1069}

footnote: how much is our search for oscillations informed by our available analytical techniques?

Hypothesis: Impulsivity may be the cognitive equivalent of excess grip force; maintenance of consistent 'force' or delayed decision making benefits from Piper-band rhythms, something which PD abolishes (gradually, through brain adaptation). DBS disrupts the beta (resting, all-synchronized) rhythm, and thereby permits movement. However, it also effectively 'lesions' the STN, which leads to cognitive deficits and poor force control. (Wait: DBS plus levodopa increases 40-60 Hz energy -- this would argue against the hypothesis. Also, stroke in the STN in normal individuals causes hemiballismus, which resolves gradually; this is consistent not with oscillations, but rather with connectivity and activity.)

Testing this hypothesis: well, first of all, are there beta-band oscillations in our data? What about the Piper band? We did not ask the patients to delay their responses, so any tests thereof will be implicit. We can look at the relative energy at 10-30 Hz and 30-60 Hz in the spike traces and see if it is modulated by hand position (PETH as usual).
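
A minimal sketch of that analysis (my own reconstruction, not the original code), using the 12-30 Hz and 30-75 Hz bands and the 200 Hz sampling rate mentioned just below; scipy's butter / filtfilt provide the noncausal (zero-phase) IIR bandpass, and the stand-in data, shuffle count, and window lengths are placeholders:

import numpy as np
from scipy.signal import butter, filtfilt, hilbert

fs = 200.0                                   # sampling rate of the rate signal (Hz)

def band_power(x, lo, hi, order=4):
    b, a = butter(order, [lo / (fs / 2), hi / (fs / 2)], btype="band")
    return np.abs(hilbert(filtfilt(b, a, x))) ** 2     # zero-phase filter, power envelope

def peth(signal, triggers, pre=1.0, post=1.0):
    n_pre, n_post = int(pre * fs), int(post * fs)
    windows = [signal[t - n_pre: t + n_post] for t in triggers
               if t - n_pre >= 0 and t + n_post <= len(signal)]
    return np.mean(windows, axis=0)

def shuffled_peths(signal, triggers, n_shuffles=200, seed=0):
    # shuffle control: random triggers drawn uniformly over the valid range
    rng = np.random.default_rng(seed)
    lo, hi = int(fs), len(signal) - int(fs)
    return np.array([peth(signal, rng.integers(lo, hi, size=len(triggers)))
                     for _ in range(n_shuffles)])

rate = np.random.poisson(5, 20000).astype(float)                        # stand-in spike rate
triggers = np.sort(np.random.default_rng(1).integers(400, 19600, 100))  # stand-in events

log_ratio = np.log(band_power(rate, 12, 30) + 1e-12) - np.log(band_power(rate, 30, 75) + 1e-12)
observed = peth(log_ratio, triggers)
null = shuffled_peths(log_ratio, triggers)
z = (observed - null.mean(axis=0)) / null.std(axis=0)   # compare against shuffle distribution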

So: I made PETHs of the beta/gamma power ratio of the spiking rate, controlled by shuffling the PETH triggers. Beta power was taken between 12 and 30 Hz and gamma between 30 and 75 Hz, as set by a noncausal IIR bandpass filter. The following is a non-normalized heatmap of all significant PETHs over all sessions, triggered when the hand crossed the midpoint between targets. (A z-scored heatmap was made as well; it looked worse.)

X is session number, Y is time with 0 = -1 sec; the sampling rate is 200 Hz. In one file (the band) there seems to be selective gamma inhibition about 0.5 sec before peak movement; it is likely an outlier. 65 of 973 neurons (single and multi-units together) were significantly 'tuned' (6.6%), marginally significant by binomial test (p=0.02). Below is an example PETH, with the shuffled distribution represented by mean ± 1 STD in blue.

The following heatmap is created from the significant PETHs triggered on target appearance.

80 of the 204 significant PETHs are from PLEX092606005_a. The total number of significant responses (204/1674, single units and multiunits) is significant by the binomial test p < 0.001, with and without Sept. 26 removed. Below is an example plot (092606005). Looks pretty damn good, actually.

Let's see how stable this relationship is by doing a leave-half-out cross-validation, 10 plies, in red below (all triggers plotted in black).

Looks excellent! Problem is we are working with a ratio, which is prone to spikes. Fix: work in log space.

Aggregate response remains about the same. 192 / 1674 significant (11.5%)

In the above figure, positive indicates increased β power relative to γ power. The square shape is likely related to hold time (negative lags) and reaction time (positive lags), though the squareness is somewhat concerning. The recording is from VIM.

{5}
ref: bookmark-0 tags: machine_learning research_blog parallel_computing bayes active_learning information_theory reinforcement_learning date: 12-31-2011 19:30 gmt revision:3 [2] [1] [0] [head]

hunch.net interesting posts:

  • debugging your brain - how to discover what you don't understand. a very intelligent viewpoint, worth rereading + the comments. look at the data, stupid
    • quote: how to represent the problem is perhaps even more important in research since human brains are not as adept as computers at shifting and using representations. Significant initial thought on how to represent a research problem is helpful. And when it’s not going well, changing representations can make a problem radically simpler.
  • automated labeling - great way to use a human 'oracle' to bootstrap us into good performance, esp. if the predictor can output a certainty value and hence ask the oracle all the 'tricky questions'.
  • The design of an optimal research environment
    • Quote: Machine learning is a victim of it’s common success. It’s hard to develop a learning algorithm which is substantially better than others. This means that anyone wanting to implement spam filtering can do so. Patents are useless here—you can’t patent an entire field (and even if you could it wouldn’t work).
  • More recently: http://hunch.net/?p=2016
    • The problem is that online courses only imperfectly emulate the social environment of a college, which IMHO is useful for cultivating diligence.
  • The unrealized potential of the research lab. Quote: Muthu Muthukrishnan says "it's the incentives". In particular, people who invent something within a research lab have little personal incentive in seeing its potential realized, so they fail to pursue it as vigorously as they might in a startup setting.
    • The motivation (money!) is just not there.

{660}
ref: -0 tags: perl one-liner search files cat grep date: 02-16-2009 21:58 gmt revision:2 [1] [0] [head]

In the process of installing compiz - which I decided I didn't like - I removed Xfce4's window manager, xfwm4, and was stuck with metacity. Metacity probably allows focus-follows-mouse, but this cannot be configured from Xfce's control panel, so I had to figure out how to change it back. To do this, I wrote a command that looks at every file, opens it, and checks whether any line matches "metacity". It's a brute-force approach, but one that does not require much thinking or googling.

find . -print | grep -v mnt | \
perl -e 'while($k = <STDIN>){chomp $k; if(open(FH, "<", $k)){while($j = <FH>){if($j =~ /metacity/){print "found $k\n"; last;}} close FH;}}' 
This led me to discover ~/.cache/sessions/xfce4-session-loco:0 (the name of the computer is loco). I changed all references of 'metacity' to 'xfwm4', and got the proper window manager back.

{409}
ref: bookmark-0 tags: optimization function search matlab linear nonlinear programming date: 08-09-2007 02:21 gmt revision:0 [head]

http://www.mat.univie.ac.at/~neum/

very nice collection of links!!

{18}
ref: notes-0 tags: SQL fulltext search example date: 0-0-2006 0:0 revision:0 [head]

SELECT * FROM `base` WHERE MATCH(`From`, `To`) AGAINST('hanson') ORDER BY `Date` DESC Limit 0, 100

  • You need to have a FULLTEXT index on the column set provided as a parameter to the MATCH() keyword. Case does not matter so long as the collation is correct.

{31}
ref: bookmark-0 tags: job_search professional employment wisdom date: 0-0-2006 0:0 revision:0 [head]

http://www.tcnj.edu/~rgraham/wisdom.html