Sketch - Program synthesis by sketching
- Sketch consists of a C-like language with the additional primitives: integer-valued holes ??, reorder, generator (macros where the holes can vary), assert, repeat (also a variable-hole parameterized macro), regular-expression generators for selecting substitute programming primitives, and atomics for concurrency-focused synthesis (fork and compare-and-swap).
- The other common programming primitives are there, too - if, else, while, procedures, arrays.
- Loops, arrays, and recursion are limited to a fixed size to constrain the search space.
- Sketch is inductive - working solutions are derived from the skeleton provided by the programmer + counterexamples provided by a verification function.
- This is compared to previous work, which focused on deductive synthesis: programs are derived from a constructive proof of the satisfiabiltiy of a problem.
- This 'proved' challenging in practice -- programmers are much better versed in construction rather than abstract mathematical derivation.
- "By giving up the deductive approach, we lose correctness by construction, and we are forced to rely on an external validation procedure."
- So they use CEGIS: CounterExample Guided Inductive Synthesis.
- For most sketches, only a small set of counterexamples are needed to fully constrain the solution.
- The counterexamples are fed to the inductive synthesizer so that the next candidate will be guaranteed to work correctly for all.
- Each counterexample reduces the search space by (typically) 2^15 -- much information for each -- so only a handful of validation calls are usually needed, even when the solution space is very large.
- E.g. 40 iterations when the candidate space is 2^600
- Illuminating that computers can so readily handle such large spaces with prefect fidelity!
- Counterexamples are much easier to find than an abstact model (as in the decuctive approach)
- To do the synthesis, the sketch is expanded to a 'control': a set of variables, holes and calling contexts.
- This control includes parameterized values that track the value of each variable, along with operators that define their relation to other variables, holes, and calling contexts.
- The denotation function translates any expression to a function from a state to a parameterized value.
- The program sketch is parsed (partially evaluated) to iteratively update , where is a mapping from the set of variable names to parameterized values, and is the set of control functions (holes and calling contexts).
-
- With control flow like while and if, the set of parameterized values (with calling context) and control functions are unrolled (to a limit) and unioned for the two branches (if & else).
- Similarly for procedures and generators -- iterative state is updated with additional calling context (and bounded recursion / bounded evaluation! - is a hyperparameter)
- Assertions and "ifs" are modeled as intersections on the iterative state.
- In this way, semantics can model the behavior of all candidate solutions simultaneously -- e.g. is a set of all possible assignments to the holes.
-
- (Chapter 5) In practice: control is represented in an intermediate language, itself represented as a DAG.
- Several optimizations are applied to make the DAG simpler, e.g. structural hashing to remove common sub-trees, and constant propagation (which may be redundant with internal SAT rules).
- There seems to be a tension between Chapter 3 (abstract denotational semantics) and Chapter 5 (concrete DAG manipulations); how exactly they relate could have been spelled out better.
- I.e.: how do the context, parameterized values, and control functions directly lead to the DAG??
- I infer: the DAG seems to be a representation of the data & code flow graph that makes explicit the relationships between variables, holes, and control structures. (this is never mentioned, but seems to be the meaning of 'denotation'.)
- The denotation function (compilation or partial evaluation) only needs to be computed once -- subsequent CEGIS loop evaluations only need to update the SAT problem from the counter-examples.
- Integers are represented via mutually-exclusive & ordered guard bits: if a bit is 1, then the hole or variable evaluates to that integer. Aka one-hot encoding. This permits translation from DAG to CNF.
- CEGIS starts by finding a solution = integer assignment of holes & permutations that make the SAT problem evaluate to True.
- Based on this control vector, it then looks for a solution (value of the input variables) to the SAT problem that evaluates as False.
- Clever re-use of the SAT solver & representation!
- This counter-example is used to improve the SAT problem = add statements (system must evaluate to true with all input-output counter-examples) = remove hypotheses from .
- Did not delve much into the concurrency-related chapters (7-9) or the Stencil scientific computing chapters (10-11).
The essential algorithm, in words:
Take the sketch, expand it to a set of parameterized variables, holes, and calling contexts. Convert these to a DAG aka (?) data-code flow graph w/ dependencies. Try to simplify the DAG, one-hot encode integers, and convert to either a conjunctive-normal-form (CNF) SAT problem for MiniSat, or to a boolean circuit for the ABC solver.
Apply MiniSat or ABC to the problem to select a set of control values = values for the holes & permutations that satisfy the boolean constraints. Using this solution, use the SAT solver to find a input variable configuration that does not satisfy the problem. This serves as a counter-example. Run this through the validator function (oracle) to see what it does; use the counter-example and (inputs and outputs) to add clauses to the SAT problem. Run several times until either no counter-examples can be found or the problem is `unsat`.
Though the thesis describes a system that was academic & relatively small back in 2008, Sketch has enjoyed continuous development, and remains used. I find the work that went into it to be remarkable and impressive -- even with incremental improvements, you need accurate expansion of the language & manipulations to show proof-of-principle. Left wondering what limits its application to even larger problems -- need for a higher-level loop that further subdivides / factorizes the problem, or DFS for filling out elements of the sketch?
Interesting links discovered in while reading the dissertation:
- Lock-free queues using compare_exchange_weak in C++ (useful!)
- Karatsuba algorithm for multiplying large n-digit numbers with complexity. (Trick: divide and conquer & use algebra to re-use intermediate results via addition and subtraction)
|