{1556} revision 0 modified: 11-04-2021 17:48 gmt

Symbolic Knowledge Distillation: from General Language Models to Commonsense Models

  • From a team at University of Washington / Allen institute for artificial intelligence/
  • Courtesy of Yannic Kilcher's youtube channel.
  • General idea: use GPT-3 as a completion source given a set of prompts, like:
    • X starts running
      • So, X gets in shape
    • X and Y engage in an argument
      • So, X wants to avoid Y.
  • There are only 7 linkage atoms (edges, so to speak) in these queries, but of course many actions / direct objects.
    • These prompts are generated from the Atomic 20-20 human-authored dataset.
    • The prompts are fed into 175B parameter DaVinci model, resulting in 165k examples in the 7 linkages after cleaning.
    • In turn the 165k are fed into a smaller version of GPT-3, Curie, that generates 6.5M text examples, aka Atomic 10x.
  • Then filter the results via a second critic model, based on fine-tuned RoBERTa & human supervision to determine if a generated sentence is 'good' or not.
  • By throwing away 62% of Atomic 10x, they get a student accuracy of 96.4%, much better than the human-designed knowledge graph.
    • They suggest that one way thins works is by removing degenerate outputs from GPT-3.

Human-designed knowledge graphs are described here: ConceptNet 5.5: An Open Multilingual Graph of General Knowledge

And employed for profit here: https://www.luminoso.com/