m8ta
You are not authenticated, login. 

{1559}  
Some investigations into denoising models & their intellectual lineage: Deep Unsupervised Learning using Nonequilibrium Thermodynamics 2015
Generative Modeling by Estimating Gradients of the Data Distribution July 2019
Denoising Diffusion Probabilistic Models June 2020
Improved Denoising Diffusion Probabilistic Models Feb 2021
Diffusion Models Beat GANs on Image Synthesis May 2021
In all of above, it seems that the inversediffusion function approximator is a minor player in the paper  but of course, it's vitally important to making the system work. In some sense, this 'diffusion model' is as much a means of training the neural network as it is a (rather inefficient, compared to GANs) way of sampling from the data distribution. In Nichol & Dhariwal Feb 2021, they use a Unet convolutional network (e.g. start with few channels, downsample and double the channels until there are 128256 channels, then upsample x2 and half the channels) including multiheaded attention. Ho 2020 used singleheaded attention only at the 16x16 level. Ho 2020 in turn was based on PixelCNN++
which is an improvement to (e.g. add seltattention layers) Conditional Image Generation with PixelCNN Decoders
Most recently, GLIDE: Towards Photorealistic Image Generation and Editing with TextGuided Diffusion Models
Added textconditional generation + many more parameters + much more compute to yield very impressive image results + inpainting. This last effect is enabled by the fact that it's a full generative denoising probabilistic model  you can condition on other parts of the image!  
{1341} 
ref: 0
tags: image registration optimization camera calibration sewing machine
date: 07152016 05:04 gmt
revision:20
[19] [18] [17] [16] [15] [14] [head]


Recently I was tasked with converting from image coordinates to real world coordinates from stereoscopic cameras mounted to the endeffector of a robot. The end goal was to let the user (me!) click on points in the image, and have the robot record that position & ultimately move to it. The overall strategy is to get a set of points in both image and RW coordinates, then fit some sort of model to the measured data. I began by printing out a grid of (hopefully evenlyspaced and perpendicular) lines via a laserprinter; spacing was ~1.1 mm. This grid was manually aligned to the axes of robot motion by moving the robot along one axis & checking that the lines did not jog. The images were modeled as a grating with quadratic phase in $u,v$ texture coordinates: $p_h(u,v) = sin((a_h u/1000 + b_h v/1000 + c_h)v + d_h u + e_h v + f_h) + 0.97$ (1) $p_v(u,v) = sin((a_v u/1000 + b_v v/1000 + c_v)u + d_v u + e_v v + f_v) + 0.97$ (2) $I(u,v) = 16 p_h p_v / ( \sqrt{ 2 + 16 p_h^2 + 16 p_v^2})$ (3) The 1000 was used to make the parameter search distribution more spherical; $c_h,c_v$ were bias terms to seed the solver; 0.97 was a dutycycle term fit by inspection to the image data; (3) is a modified sigmoid. $I$ was then optimized over the parameters using a GPUaccelerated (CUDA) nonlinear stochastic optimization: $(a_h,b_h,d_h,e_h,f_h  a_v,b_v,d_v,e_v,f_v) = Argmin \sum_u \sum_v (I(u,v)  Img(u,v))^2$ (4) Optimization was carried out by drawing parameters from a normal distribution with a diagonal covariance matrix, set by inspection, and mean iteratively set to the best solution; horizontal and vertical optimization steps were separable and carried out independently. The equation (4) was sampled 18k times, and equation (3) 34 billion times per frame. Hence the need for GPU acceleration. This yielded a set of 10 parameters (again, $c_h$ and $c_v$ were bias terms and kept constant) which modeled the data (e.g. grid lines) for each of the two cameras. This process was repeated every 0.1 mm from 0  20 mm height (z) from the target grid, resulting in a sampled function for each of the parameters, e.g. $a_h(z)$ . This required 13 trillion evaluations of equation (3). Now, the task was to use this model to generate the forward and reverse transform from image to world coordinates; I approached this by generating a data set of the grid intersections in both image and world coordinates. To start this process, the known image origin $u_{origin}_{z=0},v_{origin}_{z=0}$ was used to find the corresponding roots of the periodic axillary functions $p_h,p_v$ : $\frac{3 \pi}{ 2} + 2 \pi n_h = a_h u v/1000 + b_h v^2/1000 + (c_h + e_h)v + d_h u + f_h$ (5) $\frac{3 \pi}{ 2} + 2 \pi n_h = a_v u^2/1000 + b_v u v/1000 + (c_v + d_v)u + e_v v + f_v$ (6) Or .. $n_h = round( (a_h u v/1000 + b_h v^2/1000 + (c_h + e_h)v + d_h u + f_h  \frac{3 \pi}{ 2} ) / (2 \pi )$ (7) $n_v = round( (a_v u^2/1000 + b_v u v/1000 + (c_v + d_v)u + e_v v + f_v  \frac{3 \pi}{ 2} ) / (2 \pi)$ (8) From this, we get variables $n_{h,origin}_{z=0} and n_{v,origin}_{z=0}$ which are the offsets to align the sine functions $p_h,p_v$ with the physical origin. Now, the reverse (world to image) transform was needed, for which a twostage newton scheme was used to solve equations (7) and (8) for $u,v$ . Note that this is an equation of phase, not image intensity  otherwise this direct method would not work! First, the equations were linearized with three steps of (911) to get in the right ballpark: $u_0 = 640, v_0 = 360$ $n_h = n_{h,origin}_{z} + [30 .. 30] , n_v = n_{v,origin}_{z} + [20 .. 20]$ (9) $B_i = {\left[ \begin{matrix} \frac{3 \pi}{ 2} + 2 \pi n_h  a_h u_i v_i / 1000  b_h v_i^2  f_h \\ \frac{3 \pi}{ 2} + 2 \pi n_v  a_v u_i^2 / 1000  b_v u_i v_i  f_v \end{matrix} \right]}$ (10) $A_i = {\left[ \begin{matrix} d_h && c_h + e_h \\ c_v + d_v && e_v \end{matrix} \right]}$ and ${\left[ \begin{matrix} u_{i+1} \\ v_{i+1} \end{matrix} \right]} = mldivide(A_i,B_i)$ (11) where mldivide is the Matlab operator. Then three steps with the full Jacobian were made to attain accuracy: $J_i = {\left[ \begin{matrix} a_h v_i / 1000 + d_h && a_h u_i / 1000 + 2 b_h v_i / 1000 + c_h + e_h \\ 2 a_v u_i / 1000 + b_v v_i / 1000 + c_v + d_v && b_v u_i / 1000 + e_v \end{matrix} \right]}$ (12) $K_i = {\left[ \begin{matrix} a_h u_i v_i/1000 + b_h v_i^2/1000 + (c_h+e_h) v_i + d_h u_i + f_h  \frac{3 \pi}{ 2}  2 \pi n_h \\ a_v u_i^2/1000 + b_v u_i v_i/1000 + (c_v+d_v) u_i + e_v v + f_v  \frac{3 \pi}{ 2}  2 \pi n_v \end{matrix} \right]}$ (13) ${\left[ \begin{matrix} u_{i+1} \\ v_{i+1} \end{matrix} \right]} = {\left[ \begin{matrix} u_i \\ v_i \end{matrix} \right]}  J^{1}_i K_i$ (14) Solutions $(u,v)$ were verified by plugging back into equations (7) and (8) & verifying $n_h, n_v$ were the same. Inconsistent solutions were discarded; solutions outside the image space $[0, 1280),[0, 720)$ were also discarded. The process (10)  (14) was repeated to tile the image space with gird intersections, as indicated in (9), and this was repeated for all $z$ in $(0 .. 0.1 .. 20)$ , resulting in a large (74k points) dataset of $(u,v,n_h,n_v,z)$ , which was converted to full realworld coordinates based on the measured spacing of the grid lines, $(u,v,x,y,z)$ . Between individual z steps, $n_{h,origin} n_{v,origin}$ was reestimated to minimize (for a current $z'$ ): $(u_{origin}_{z' + 0.1}  u_{origin}_{z' + 0.1})^2 + (v_{origin}_{z' + 0.1} + v_{origin}_{z'})^2$ (15) with gridsearch, and the method of equations (914). This was required as the stochastic method used to find original image model parameters was agnostic to phase, and so phase (via parameter $f_{}$ ) could jump between individual $z$ measurements (the origin did not move much between successive measurements, hence (15) fixed the jumps.) To this dataset, a model was fit: ${\left[ \begin{matrix} u \\ v \end{matrix} \right]} = A {\left[ \begin{matrix} 1 && x && y && z && x'^2 && y'^2 && \prime z'^2 && w^2 && x' y' && x' z' && y' z' && x' w && y' w && z' w \end{matrix} \right]}$ (16) Where $x' = \frac{x}{ 10}$ , $y' = \frac{y}{ 10}$ , $z' = \frac{z}{ 10}$ , and $w = \frac{ 20}{20  z}$ . $w$ was introduced as an axillary variable to assist in perspective mapping, ala computer graphics. Likewise, $x,y,z$ were scaled so the quadratic nonlinearity better matched the data. The model (16) was fit using regular linear regression over all rows of the validated dataset. This resulted in a second set of coefficients $A$ for a model of world coordinates to image coordinates; again, the model was inverted using Newton's method (Jacobian omitted here!). These coefficients, one set per camera, were then integrated into the C++ program for displaying video, and the inverse mapping (using closedform matrix inversion) was used to convert mouse clicks to realworld coordinates for robot motor control. Even with the relatively poor wideFOV cameras employed, the method is accurate to $\pm 50\mu m$ , and precise to $\pm 120\mu m$ .  
{1269} 
ref: 0
tags: hinton convolutional deep networks image recognition 2012
date: 01112014 20:14 gmt
revision:0
[head]


{691}  
PMID18835655[0] Sleeprelated improvements in motor learning following mental practice.
____References____
 
{684}  
PMID8922425 Primary Motor and Sensory Cortex Activation during Motor Performance and Motor Imagery: A Functional Magnetic Resonance Imaging Study.
 
{127} 
ref: bookmark0
tags: thalamus basal ganglia neuroanatomy centromedian red nucleus images
date: 002007 0:0
revision:0
[head]


http://www.neuroanatomy.wisc.edu/coro97/contents.htm coronal sections through the thalamus, very nice!  
{128} 
ref: bookmark0
tags: neuroanatomy pulvinar thalamus superior colliculus image gray brainstem
date: 002007 0:0
revision:0
[head]


http://en.wikipedia.org/wiki/Image:Gray719.png great, very useful! 