Learning Explanatory Rules from Noisy Data
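 Builds on the dense background of inductive logic programming (ILP): given background knowledge, a set of positive and negative examples, and rules for transformation and substitution, generate clauses that, together with the background knowledge, entail the positive examples and none of the negative ones.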
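 To make the task concrete, here is a minimal sketch of the acceptance test an ILP system applies to a candidate clause. The predicates, facts, and both candidate clauses are hypothetical and not taken from the paper; real systems search over clauses rather than checking two hand-written ones.

```python
# Toy ILP check: does a candidate clause, together with the background
# knowledge, cover every positive example and no negative example?
# All predicate names and facts below are hypothetical.

# Background knowledge: ground facts parent(X, Y).
PARENT = {("ann", "bob"), ("bob", "cat"), ("bob", "dan")}

# Examples of the target predicate gp/2.
POSITIVE = {("ann", "cat"), ("ann", "dan")}
NEGATIVE = {("ann", "bob"), ("bob", "cat")}


def grandparent_candidate(parent):
    """Candidate clause: gp(X, Z) :- parent(X, Y), parent(Y, Z)."""
    return {(x, z) for (x, y1) in parent for (y2, z) in parent if y1 == y2}


def parent_candidate(parent):
    """Competing candidate clause: gp(X, Y) :- parent(X, Y)."""
    return set(parent)


def is_solution(candidate):
    """True if the clause derives all positives and none of the negatives."""
    derived = candidate(PARENT)
    return POSITIVE <= derived and not (NEGATIVE & derived)


print(is_solution(grandparent_candidate))  # True
print(is_solution(parent_candidate))       # False
```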
 Programs like Metagol can do this using the search and unification machinery built into Prolog.
 Actually kinda surprising how very dense this program is: only 330 lines!
 This task can be transformed into a SAT problem via rules of logic, for which there are many fast solvers.
 The trick here (instead) is that a neural network is used to turn 'on' or 'off' clauses that fit the background knowledge.
 The background knowledge (BK) is typically very small, just a few examples, consistent with the small size of the learned networks.
 The network's weight matrices are indexed by pairs of composed or combined clauses (an outer product of the candidate clause sets), which makes them very large!
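 A rough count shows how quickly a weight matrix over clause pairs grows. This is a sketch under simplifying assumptions: the predicates and variables are hypothetical, every clause body has exactly two atoms, and nothing is pruned, so the numbers illustrate the scaling rather than the paper's actual sizes.

```python
from itertools import product

PREDICATES = ["parent", "target"]  # hypothetical binary predicates
VARIABLES = ["X", "Y", "Z"]        # variables allowed in a clause body

# Every atom p(A, B) over the given predicates and variables.
atoms = [(p, a, b) for p in PREDICATES for a, b in product(VARIABLES, repeat=2)]

# Every ordered two-atom clause body; no pruning is attempted.
bodies = list(product(atoms, repeat=2))

# One weight per ordered pair of candidate clauses.
num_weights = len(bodies) ** 2

print(len(atoms), len(bodies), num_weights)  # 18 324 104976
```

 Two predicates and three variables already give over a hundred thousand weights for a single learned predicate.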
 They then do gradient descent, passing the cross-entropy errors back through the nonlinearities (including the clauses themselves? I think this is how recursion is handled) to update the weights.
 Hence, SGD is used as a means of heuristic search.
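 The idea of gradient descent as search over clauses can be sketched in a few lines. This is a heavy simplification and not the paper's architecture: each hypothetical candidate clause is reduced to a fixed vector of truth values, there is no forward chaining or recursion, and plain full-batch gradient descent stands in for SGD. One label is deliberately flipped to mimic noise.

```python
import math

# Each hypothetical candidate clause is summarised by the truth value it
# assigns to six ground atoms (1 = derived, 0 = not derived).
CLAUSE_VALUATIONS = [
    [1, 1, 1, 0, 0, 0],  # clause 0: the intended rule
    [1, 1, 1, 1, 1, 1],  # clause 1: over-general, derives everything
    [0, 0, 0, 1, 1, 1],  # clause 2: the opposite of the intended rule
]
# Labels for the six atoms; the third label is deliberately flipped (noise).
LABELS = [1, 1, 0, 0, 0, 0]


def softmax(logits):
    m = max(logits)
    exps = [math.exp(t - m) for t in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict(weights):
    """Mix clause valuations: p_i = sum_k w_k * v_k[i]."""
    return [
        sum(w * v[i] for w, v in zip(weights, CLAUSE_VALUATIONS))
        for i in range(len(LABELS))
    ]


def train(steps=2000, lr=0.5, eps=1e-9):
    logits = [0.0] * len(CLAUSE_VALUATIONS)
    n = len(LABELS)
    for _ in range(steps):
        weights = softmax(logits)
        probs = predict(weights)
        # d(cross-entropy)/d(p_i), with p_i clipped away from 0 and 1.
        dloss = []
        for y, p in zip(LABELS, probs):
            pc = min(max(p, eps), 1.0 - eps)
            dloss.append(-(y / pc - (1 - y) / (1.0 - pc)) / n)
        # Chain rule through the softmax: dp_i/dlogit_j = w_j * (v_j[i] - p_i).
        for j, v in enumerate(CLAUSE_VALUATIONS):
            grad = sum(g * weights[j] * (v[i] - probs[i]) for i, g in enumerate(dloss))
            logits[j] -= lr * grad
    return softmax(logits)


weights = train()
best = max(range(len(weights)), key=lambda k: weights[k])
print(best, [round(w, 3) for w in weights])
```

 For this toy data the loss is minimised with weight 5/6 on clause 0 and 1/6 on clause 2, so the trained weights should approach that split: the noisy label costs some confidence, but the intended clause still wins, where an exact-match search would reject it outright.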
 Compare this to Metagol, which is brittle to any noise in the input; unsurprisingly, due to SGD, this is much more robust.

 Way too many words and symbols in this paper for what it seems to be doing. Just seems to be obfuscating the work (which is perfectly good). Again: Metagol is only 330 lines!
