GFlowNet Tutorial
 It's basically like RL, except the reward is treated as a scaled unnormalized probability (a 'flow') to be sampled in proportion to, rather than maximized.
 Unlike general RL, GFNs are constructive: actions only add elements, so the state space forms a DAG or tree with no cycles. (No state aliasing.)
 Also unlike RL / REINFORCE / actor-critic, the objective is to match forward and reverse flows, both parameterized by NNs. Hence, rather than BPTT or unrolls, information propagates via the reverse policy model. This forward-backward difference-based loss is reminiscent of self-supervised Barlow Twins, BYOL, Siamese networks, or [1][2]. Bengio even has a paper talking about it [3].
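 For concreteness, one standard form of this forward/backward matching objective is trajectory balance: for a complete trajectory $\tau = (s_0 \to s_1 \to \dots \to s_n = x)$ with learned log-partition estimate $Z_\theta$, forward policy $P_F$, backward policy $P_B$, and terminal reward $R(x)$,

$$\mathcal{L}_{TB}(\tau) = \left( \log \frac{Z_\theta \prod_{t=0}^{n-1} P_F(s_{t+1} \mid s_t; \theta)}{R(x) \prod_{t=0}^{n-1} P_B(s_t \mid s_{t+1}; \theta)} \right)^2$$

 When this loss is zero for all trajectories, the forward policy samples terminal states $x$ with probability proportional to $R(x)$.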
 The fact that it works well means that it must be doing some sort of useful regularization, which is super interesting.
 Or it just means there are N+1 ways of skinning the cat!
 Adopting a $TD(\lambda)$-style approach of matching flows over sampled sub-trajectories (rather than only full trajectories or single transitions) improves convergence/generalization. Really not that different from RL.
 At least 4 different objectives (losses):
 Matching per-state in-flow and out-flow (flow matching)
 Matching per-state forward and backward flow (detailed balance)
 Matching whole-trajectory forward and backward flow (trajectory balance)
 Matching flows over sampled sub-trajectories (sub-trajectory balance)
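 A minimal sketch of the trajectory-level loss in plain Python (the function name and argument layout are illustrative, not from any library; real implementations would use autodiff over NN-parameterized policies):

```python
import math

def trajectory_balance_loss(log_z, log_pf, log_pb, log_reward):
    """Squared log-space mismatch between the forward flow Z * prod P_F
    and the backward flow R(x) * prod P_B along one sampled trajectory.

    log_z:      learned estimate of log Z (the partition function)
    log_pf:     list of log P_F(s_{t+1} | s_t) along the trajectory
    log_pb:     list of log P_B(s_t | s_{t+1}) along the trajectory
    log_reward: log R(x) of the terminal state
    """
    delta = log_z + sum(log_pf) - log_reward - sum(log_pb)
    return delta * delta

# If the flows already match, the loss is exactly zero:
log_pf = [math.log(0.5), math.log(0.25)]  # forward policy probs along tau
log_pb = [math.log(1.0), math.log(0.5)]   # backward policy probs along tau
log_reward = math.log(2.0)                # R(x) = 2
log_z = log_reward + sum(log_pb) - sum(log_pf)  # perfectly matched Z
assert abs(trajectory_balance_loss(log_z, log_pf, log_pb, log_reward)) < 1e-12
```

 The per-state (detailed balance) and sub-trajectory variants replace the full products with products over a single transition or a sampled segment, trading variance for bias in credit assignment.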
