m8ta
{1507}
ref: -2015 tags: winner take all sparsity artificial neural networks date: 03-28-2020 01:15 gmt revision:0 [head]

Winner-take-all Autoencoders

  • During training of the fully connected layers, they enforce a winner-take-all lifetime sparsity constraint.
    • That is: when training on mini-batches, they keep the k% largest activations of a given hidden unit across all samples in the mini-batch, and set the remainder to zero. The units are not competing with each other; each competes with its own activations over the batch.
    • The rest of the network is a stack of ReLU layers (upon which the sparsity constraint is applied) followed by a linear decoding layer (which makes interpretation simple).
    • They stack these via sequential training: each layer is trained on the output of the previous one, without backpropagating errors between layers.
  • Also works for RBMs, with lower sparsity targets.
  • They extended the result to WTA convnets -- here they enforce both spatial and temporal (mini-batch) sparsity.
    • Spatial sparsity involves selecting the single largest hidden unit activity within each feature map. The other activities and derivatives are set to zero.
    • At test time, this sparsity constraint is released, and instead they use a 4 x 4 max-pooling layer & use that for classification or deconvolution.
  • To apply both spatial and temporal sparsity, select the highest spatial response (e.g. one unit in a 2D plane of convolutions, all sharing the same weights) for each feature map. Do this for every image in a mini-batch, then apply the temporal sparsity: each feature map is active exactly once per mini-batch, and at that time only one hidden unit (really, one location of the input under the shared weights, depending on stride) undergoes SGD.
    • Seems like it might train very slowly. Authors didn't note how many epochs were required.
  • This, too, can be stacked.
  • To train on larger image sets, they first extract 48 x 48 patches & again stack...
  • Tested on MNIST, SVHN, and CIFAR-10 -- works reasonably well, even with few labeled examples (consistent with their semi-supervised goals).
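The lifetime-sparsity step described above can be sketched in a few lines of numpy (my reconstruction, not the authors' code; function and variable names are illustrative):

```python
import numpy as np

def lifetime_sparsity(h, k_percent):
    """Winner-take-all lifetime sparsity: for each hidden unit (column of h),
    keep only the k% largest activations across the mini-batch and zero the
    rest.  Each unit competes with its own activations over the batch, not
    with the other units."""
    batch_size, n_hidden = h.shape
    k = max(1, int(np.ceil(batch_size * k_percent / 100.0)))
    out = np.zeros_like(h)
    # row indices of the k largest activations in each column
    top = np.argpartition(h, -k, axis=0)[-k:]
    cols = np.arange(n_hidden)
    out[top, cols] = h[top, cols]
    return out
```

In the backward pass the same mask would gate the gradients, so only the winning activations receive weight updates.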

{1257}
ref: -0 tags: Anna Roe optogenetics artificial dura monkeys intrinsic imaging date: 09-30-2013 19:08 gmt revision:3 [2] [1] [0] [head]

PMID-23761700 Optogenetics through windows on the brain in nonhuman primates

  • technique paper.
  • placed over the visual cortex.
  • Injected virus through the artificial dura -- micropipette, not CVD.
  • Strong expression:
  • See also: PMID-19409264 (Boyden, 2009)

{1169}
ref: -0 tags: artificial intelligence projection episodic memory reinforcement learning date: 08-15-2012 19:16 gmt revision:0 [head]

Projective simulation for artificial intelligence

  • Agent learns based on memory 'clips', which are combined using some pseudo-Bayesian method to trigger actions.
    • These clips are learned from experience / observation.
    • Quote: "..more complex behavior seems to arise when an agent is able to “think for a while” before it “decides what to do next.” This means the agent somehow evaluates a given situation in the light of previous experience, whereby the type of evaluation is different from the execution of a simple reflex circuit"
    • Quote: "Learning is achieved by evaluating past experience, for example by simple reinforcement learning".
  • The forward exploration of learned action-stimulus patterns is seemingly a general problem-solving strategy (my generalization).
  • Pretty simple task:
    • Robot can only move left / right; shows a symbol to indicate which way it (might?) be going.
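A minimal two-layer version of this scheme -- percept clips wired to action clips with adjustable h-values -- can be sketched as follows (my simplification; the function names and the damping constant are mine, not the paper's):

```python
import random

def choose_action(h, percept, actions, rng=random):
    """Sample an action clip with probability proportional to the h-value
    of the edge from the percept clip (the 'random walk' over clips,
    collapsed to a single hop)."""
    weights = [h[(percept, a)] for a in actions]
    r = rng.random() * sum(weights)
    for a, w in zip(actions, weights):
        r -= w
        if r < 0:
            return a
    return actions[-1]

def update(h, percept, action, reward, damping=0.01):
    """Evaluate past experience by simple reinforcement: damp all edges
    back toward their baseline of 1, then strengthen the traversed edge
    by the received reward."""
    for key in h:
        h[key] = max(1.0, h[key] - damping * (h[key] - 1.0))
    h[(percept, action)] += reward
```

Reinforced edges grow while unused edges decay back toward baseline, which matches the "learning by evaluating past experience, for example by simple reinforcement learning" quoted above.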

{858}
ref: -0 tags: artificial intelligence machine learning education john tooby leda cosmides date: 12-13-2010 03:43 gmt revision:3 [2] [1] [0] [head]

Notes & responses to evolutionary psychologists John Tooby and Leda Cosmides -- authors of The Adapted Mind -- and their essay in This Will Change Everything

  • quote: Currently the most keenly awaited technological development is an all-purpose artificial intelligence-perhaps even an intelligence that would revise itself and grow at an ever-accelerating rate until it enacts millennial transformations. [...] Yet somehow this goal, like the horizon, keeps retreating as fast as it is approached.
  • AI's wrong turn was assuming that the best methods for reasoning and thinking are those that can be applied successfully to any problem domain.
    • But of course it must be possible - we are here, and we did evolve!
    • My opinion: the limit is codifying abstract, assumed, and ambiguous information into program function - e.g. embodying the world.
  • Their idea: intelligences use a number of domain-specific, specialized "hacks", that work for limited tasks; general intelligence appears as a result of the combination of all of these.
    • "Our mental programs can be fiendishly well engineered to solve some problems because they are not limited to using only those strategies that can be applied to all problems."
    • Given the content of the wikipedia page (above), it seems that they have latched onto this particular idea for at least 18 years. Strange how these sorts of things work.
  • Having accurate models of human intelligence would achieve two things:
    • It would enable humans to communicate more effectively with machines via shared knowledge and reasoning.
    • (me:) The AI would be enhanced by the tricks and hacks that evolution took millions of years, billions of individuals, and 10e?? (non-discrete) interactions between individuals and the environment to discover. This constitutes an enormous store of information; to overlook it necessitates (probably -- there may be serious shortcuts to biological evolution) re-simulating all of the steps that it took to get here. We exist as a cached output of the evolutionary algorithm; recomputing this particular function is energetically impossible.
  • "The long term ambition [of evolutionary psychology] is to develop a model of human nature as precise as if we had the engineering specifications for the control systems of a robot."
  • "Humanity will continue to be blind slaves to the programs evolution has built into our brains until we drag them into the light. Ordinarily, we inhabit only the versions of reality that they spontaneously construct for us -- the surfaces of things. Because we are unaware that we are in a theater, with our roles and our lines largely written for us by our mental programs, we are credulously swept up in these plays (such as the genocidal drama of us versus them). Endless chain reactions among these programs leave us the victims of history -- embedded in war and oppression, enveloped in mass delusions and cultural epidemics, mired in endless negative-sum conflict. If we understood these programs and the coordinated hallucinations they orchestrate in our minds, our species could awaken from the roles these programs assign to us. Yet this cannot happen if knowledge -- like quantum mechanics -- remains forever locked up in the minds of a few specialists, walled off by the years of study required to master it." Exactly. Well said.
    • The solution, then: much much better education; education that utilizes the best knowledge about transferring knowledge.
    • The authors propose video games; this is already being tested, see {859}

{855}
ref: -0 tags: sciences artificial Simon organizations economic rationality date: 12-01-2010 07:33 gmt revision:2 [1] [0] [head]

These are notes from reading Herbert A. Simon’s The Sciences of the Artificial, third edition, 1996 (though most of the material seems to date from the 70s). They are half quoted / half paraphrased (as needed when the original phrasing was clunky). I’ve added a few of my own observations, and reordered the ideas from the book.

“A large body of evidence shows that human choices are not consistent and transitive, as they would be if a utility function existed ... In general a large gain along one axis is required to compensate for a small loss along another.” HA Simon.

“Companies within a capitalist economy make almost negligible use of markets in their internal functioning” - HA Simon. E.g. they are internally command economies. (later, p 40...) “We take the frequent movability and indefiniteness of organizational boundaries as evidence that there is often a near balance between the advantages of markets and organizations”

  • Retail sales of automobiles are handled by dealerships
  • Many other commodities are sold directly to the consumer
  • In fast food there are direct outlets and franchises.
  • There are sole source suppliers that produce parts for much larger manufacturers.
I’m realizing / imagining a very flexible system of organizations, tied together and communicating via a liquid ‘blood’ of the market economy.

That said: organizations are not highly centralized structures in which all the important decisions are made at the center; this would exceed the limits of procedural rationality and lose many of the advantages attainable from the use of hierarchical authority. Business organizations, like markets, are vast distributed computers whose decision processes are substantially decentralized. In fact, the work of the head of a corporation is a market-like activity: allocating capital to promising or desirable projects.

In organizations, uncertainty is often a good reason to shift from markets to hierarchies in making decisions. If two different arms of a corporation - production and marketing - make different decisions on the uncertain number of units to be sold next year, there will be a problem. It is better for the management to share assumptions. “Left to the market, this kind of uncertainty leads directly to the dilemmas of rationality that we described earlier in terms of game theory and rational expectations”

I retain vivid memories of the astonishment and disbelief expressed by the architecture students to whom I taught urban land economics many years ago when I pointed to medieval cities as marvelously patterned systems that had mostly just ‘grown’ in response to myriads of individual human decisions. To my students a pattern implied a planner in whose mind it had been conceived and by whose hand it had been implemented. The idea that a city could acquire its pattern as naturally as a snowflake was foreign to them ... they reacted to it as many Christian fundamentalists responded to Darwin: no design without a Designer!

Markets appear to conserve information and calculation by assigning decisions to actors who can make them on the basis of information that is available to them locally. von Hayek: “The most significant fact about this system is the economy of knowledge with which it operates, of how little the individual participants need to know in order to take the right action”. To maintain actual Pareto optimality in the markets would require information and computational requirements that are exceedingly burdensome and unrealistic (from The New Palgrave: A Dictionary of Economics)

Nelson and Winter observe that in economic evolution, in contrast to biological evolution, successful algorithms (business practices) may be borrowed from one firm by another. “The hypothesized system is Lamarckian, because any new idea can be incorporated in operating procedures as soon as its success is observed.” This is just as well, as corporations don't have sexual reproduction / crossover.

{838}
ref: -0 tags: meta learning Artificial intelligence competent evolutionary programming Moshe Looks MOSES date: 08-07-2010 16:30 gmt revision:6 [5] [4] [3] [2] [1] [0] [head]

Competent Program Evolution

  • An excellent start: a good description + meta-description / review of the existing literature.
  • He thinks about things in a slightly different way -- separates what I call solutions and objective functions into "post- and pre-representational levels" (respectively).
  • The thesis focuses on post-representational search/optimization, not pre-representational (though, I believe that both should meet in the middle - eg. pre-representational levels/ objective functions tuned iteratively during post-representational solution creation. This is what a human would do!)
  • The primary difficulty in competent program evolution is the intense non-decomposability of programs: every variable, constant, and branch affects the execution of every other little bit.
  • Competent program creation is possible - humans create programs significantly shorter than lookup tables - hence it should be possible to make a program to do the same job.
  • One solution to the problem is representation - formulate the program creation as a set of 'knobs' that can be twiddled (here he means both gradient-descent partial-derivative optimization and simplex or heuristic one-dimensional probabilistic search, of which there are many good algorithms.)
  • pp 27: outline of his MOSES program. Read it for yourself, but looks like:
  • The representation step above "explicitly addresses the underlying (semantic) structure of program space independently of the search for any kind of modularity or problem decomposition."
    • In MOSES, optimization does not operate directly on program space, but rather on subspaces defined by the representation-building process. These subspaces may be considered as being defined by templates assigning values to some of the underlying dimensions (e.g., they restrict the size and shape of any resulting trees).
  • In chapter 3 he examines the properties of the boolean programming space, which is claimed to be a good model of larger/more complicated programming spaces in that:
    • Simpler functions are much more heavily sampled - e.g. he generated 1e6 samples of 100-term boolean functions, then reduced them to minimal form using standard operators. The vast majority of the resultant minimum length (compressed) functions were simple - tautologies or of a few terms.
    • A corollary is that simply increasing syntactic sample length is insufficient for increasing program behavioral complexity / variety.
      • Actually, as random program length increases, the percentage with interesting behaviors decreases due to the structure of the minimum length function distribution.
  • Also tests random perturbations to large boolean formulae (variable replacement/removal, operator swapping) - ~90% of these do nothing.
    • These randomly perturbed programs show a similar structure to above: most of them have very similar behavior to their neighbors; only a few have unique behaviors. makes sense.
    • Run the other way: "syntactic space of large programs is nearly uniform with respect to semantic distance." Semantically similar (boolean) programs are not grouped together.
  • Results somehow seem a let-down: the program does not scale to even moderately large problem spaces. No loops, only functions with conditional evaluation -- Jacques Pitrat's results are far more impressive. {815}
    • Seems that, still, there were a lot of meta-knobs to tweak in each implementation. Perhaps this is always the case?
  • My thought: perhaps you can run the optimization not on program representations, but rather program codepaths. He claims that one problem is that behavior is loosely or at worst chaotically related to program structure - which is true - hence optimization on the program itself is very difficult. This is why Moshe runs optimization on the 'knobs' of a representational structure.
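The chapter-3 observation that simple behaviors dominate random program space is easy to reproduce in miniature (my reconstruction, not Looks's code): sample random boolean formulas, compute their truth tables as "behaviors", and tabulate how often each behavior occurs.

```python
import itertools
import random

VARS = ['a', 'b', 'c']

def random_formula(depth, rng):
    """Grow a random boolean formula over VARS with and/or/not."""
    if depth == 0 or rng.random() < 0.3:
        return rng.choice(VARS)
    op = rng.choice(['and', 'or', 'not'])
    if op == 'not':
        return ('not', random_formula(depth - 1, rng))
    return (op, random_formula(depth - 1, rng), random_formula(depth - 1, rng))

def evaluate(f, env):
    if isinstance(f, str):
        return env[f]
    if f[0] == 'not':
        return not evaluate(f[1], env)
    if f[0] == 'and':
        return evaluate(f[1], env) and evaluate(f[2], env)
    return evaluate(f[1], env) or evaluate(f[2], env)

def behavior(f):
    """The formula's truth table -- its semantics, independent of syntax."""
    return tuple(evaluate(f, dict(zip(VARS, bits)))
                 for bits in itertools.product([False, True], repeat=len(VARS)))

rng = random.Random(0)
counts = {}
for _ in range(5000):
    b = behavior(random_formula(4, rng))
    counts[b] = counts.get(b, 0) + 1
```

In a sample like this, a handful of simple behaviors (single-variable projections, tautologies) account for a large fraction of all formulas, while most of the 2^8 possible behaviors are rarely or never hit -- syntactic sampling is heavily skewed toward semantic simplicity.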

{837}
ref: -0 tags: artificial intelligence Hutters theorem date: 08-05-2010 05:06 gmt revision:0 [head]

Hutter's Theorem: for all problems asymptotically large enough, there exists one algorithm that is within a factor of 5 as fast as the fastest algorithm for a particular problem. http://www.hutter1.net/ai/pfastprg.htm

{695}
ref: -0 tags: alopex machine learning artificial neural networks date: 03-09-2009 22:12 gmt revision:0 [head]

Alopex: A Correlation-Based Learning Algorithm for Feed-Forward and Recurrent Neural Networks (1994)

  • read the abstract! rather than using the gradient error estimate as in backpropagation, it uses the correlation between changes in network weights and changes in the error + gaussian noise.
    • backpropagation requires calculation of the derivatives of the transfer function from one neuron to the output. This is very non-local information.
    • one alternative is somewhat empirical: compute the derivatives wrt the weights through perturbations.
    • all these algorithms are solutions to the optimization problem: minimize an error measure, E, wrt the network weights.
  • all network weights are updated synchronously.
  • can be used to train both feedforward and recurrent networks.
  • algorithm apparently has a long history, especially in visual research.
  • the algorithm is quite simple! easy to understand.
    • uses stochastic weight changes with an annealing schedule.
  • this is pre-pub: tables and figures at the end.
  • looks like it has comparable or faster convergence than backpropagation.
  • not sure how it will scale to problems with hundreds of neurons; though, they looked at an encoding task with 32 outputs.
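A sketch of the core update rule as I read the paper (signs and annealing simplified; in the paper the temperature T is itself annealed based on the average magnitude of the correlations, whereas here it is just a fixed parameter):

```python
import numpy as np

def alopex_step(w, dw_prev, dE_prev, T, delta=0.01, rng=np.random):
    """One Alopex update.  Every weight moves by +/-delta; the direction is
    chosen stochastically based on the correlation between the previous
    weight change and the previous (global, scalar) change in error.  If a
    direction was correlated with an error increase, it tends to be
    reversed; T controls how noisy this choice is."""
    C = dw_prev * dE_prev                      # per-weight correlation
    p = 1.0 / (1.0 + np.exp(-C / T))           # prob. of taking a -delta step
    steps = np.where(rng.random(w.shape) < p, -delta, delta)
    return w + steps, steps
```

Note the locality the entry describes: each weight needs only its own last step and the single broadcast error change, with no derivative information at all.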

{643}
ref: notes-0 tags: artificial cerebellum robot date: 11-06-2008 17:16 gmt revision:1 [0] [head]

Artificial Cerebellum for robot control: