ref: -0 tags: concept net NLP transformers graph representation knowledge date: 11-04-2021 17:48 gmt revision:0 [head]

Symbolic Knowledge Distillation: from General Language Models to Commonsense Models

  • From a team at the University of Washington / Allen Institute for Artificial Intelligence.
  • Courtesy of Yannic Kilcher's YouTube channel.
  • General idea: use GPT-3 as a completion source given a set of prompts, like:
    • X starts running
      • So, X gets in shape
    • X and Y engage in an argument
      • So, X wants to avoid Y.
  • There are only 7 linkage atoms (edges, so to speak) in these queries, but of course many actions / direct objects.
    • These prompts are generated from the ATOMIC 2020 human-authored dataset.
    • The prompts are fed into the 175B-parameter DaVinci model, resulting in 165k examples across the 7 linkages after cleaning.
    • In turn, the 165k examples are fed into a smaller version of GPT-3, Curie, which generates 6.5M text examples, aka Atomic 10x.
  • The results are then filtered via a second 'critic' model, based on a fine-tuned RoBERTa & human supervision, to determine whether a generated sentence is 'good' or not.
  • By throwing away 62% of Atomic 10x, they get a student accuracy of 96.4%, much better than the human-designed knowledge graph.
    • They suggest that one way this works is by removing degenerate outputs from GPT-3.
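The generate-then-filter loop above can be sketched as follows; `generate_completions`, `critic_score`, and the acceptance threshold are hypothetical stand-ins for the GPT-3 completion calls, the fine-tuned RoBERTa critic, and its decision boundary, not the paper's actual API.

```python
# Sketch of symbolic knowledge distillation: a teacher LM proposes
# (head, relation, tail) triples, and a critic model filters them.
# generate_completions / critic_score / threshold are hypothetical.
def distill(prompts, generate_completions, critic_score, threshold=0.5):
    corpus = []
    for head, relation in prompts:
        for tail in generate_completions(head, relation):
            triple = (head, relation, tail)
            # keep only triples the critic judges as 'good'
            if critic_score(triple) >= threshold:
                corpus.append(triple)
    return corpus  # the filtered corpus then trains the student model
```

Throwing away the low-scoring ~62% of generations is what buys the 96.4% student accuracy quoted above.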

Human-designed knowledge graphs are described here: ConceptNet 5.5: An Open Multilingual Graph of General Knowledge

And employed for profit here: https://www.luminoso.com/

ref: -2017 tags: locality sensitive hashing olfaction kenyon cells neuron sparse representation date: 01-18-2020 21:13 gmt revision:1 [0] [head]

PMID-29123069 A neural algorithm for a fundamental computing problem

  • General idea: locality-sensitive hashing, e.g. hashing that is sensitive to the high-dimensional locality of the input space, can be efficiently solved using a circuit inspired by the insect olfactory system.
  • Here, activation of 50 different types of ORNs is mapped to 50 projection neurons, which 'centers the mean' -- concentration dependence is removed.
  • This is then projected via a random matrix of sparse binary weights to a much larger set of Kenyon cells, which in turn are inhibited by one APL neuron.
  • Normal locality-sensitive hashing uses dense matrices of Gaussian-distributed random weights, which means higher computational complexity...
  • ... these projections are governed by the Johnson-Lindenstrauss lemma, which says that projection from high-d to low-d space can preserve locality (distance between points) within an error bound.
  • Show that the WTA selection of the top 5% plus random binary weight preserves locality as measured by overlap with exact input locality on toy data sets, including MNIST and SIFT.
  • Flashy title as much as anything else got this into Science... indeed, it has only been cited 6 times in PubMed.
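The ORN → PN → Kenyon-cell circuit above can be sketched in NumPy; the 2,000 Kenyon cells and 6 connections per cell below are illustrative choices mirroring the fly's rough anatomy, not tuned parameters.

```python
import numpy as np

def fly_hash(X, m=2000, conn=6, wta_frac=0.05, seed=0):
    """Fly-inspired LSH (a sketch): mean-center each input, project through
    a sparse binary random matrix, then winner-take-all keeps the top ~5%."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    # each of the m Kenyon cells samples `conn` random input channels (PNs)
    W = np.zeros((d, m))
    for j in range(m):
        W[rng.choice(d, size=conn, replace=False), j] = 1.0
    Xc = X - X.mean(axis=1, keepdims=True)  # PN step: 'center the mean'
    Y = Xc @ W                              # expand e.g. 50-d -> 2000-d
    k = max(1, int(wta_frac * m))
    top = np.argpartition(Y, -k, axis=1)[:, -k:]  # APL-like global inhibition
    tags = np.zeros_like(Y)
    np.put_along_axis(tags, top, 1.0, axis=1)     # sparse binary 'tag'
    return tags
```

Nearby inputs then share most of their active Kenyon cells, while distant inputs overlap only by chance (~k²/m cells), which is the locality preservation that the Johnson-Lindenstrauss lemma guarantees for the dense Gaussian case.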

ref: -2018 tags: sparse representation auditory cortex excitation inhibition balance date: 03-11-2019 20:47 gmt revision:1 [0] [head]

PMID-30307493 Sparse Representation in Awake Auditory Cortex: Cell-type Dependence, Synaptic Mechanisms, Developmental Emergence, and Modulation.

  • Sparse representation arises during development in an experience-dependent manner, accompanied by differential changes of excitatory input strength and a transition from unimodal to bimodal distribution of E/I ratios.

ref: -2008 tags: representational similarity analysis fMRI date: 02-15-2019 02:27 gmt revision:1 [0] [head]

PMID-19104670 Representational Similarity Analysis – Connecting the Branches of Systems Neuroscience

  • Nikolaus Kriegeskorte, Marieke Mur, and Peter Bandettini
  • Alright, there seems to be no math in the article (?), but it is well cited, so best to keep it on the radar.
  • RDM = representational dissimilarity matrices
    • Just a symmetric matrix of dissimilarity, e.g. correlation, Euclidean distance, absolute activation distance ( L_1 ?)
  • RSA = representational similarity analysis
    • Comparison of the upper triangle of two RDMs, using the same metrics.
    • Or, alternately, second-order isomorphism.
  • So.. high level: