m8ta

{1541}  
Like this blog but 100% better!  
{1408}  
LDMNet: Low dimensional manifold regularized neural nets.
 
{1384}  
PMID-28246640 Ultraflexible nanoelectronic probes form reliable, glial scar–free neural integration
 
{1341} 
ref: 0
tags: image registration optimization camera calibration sewing machine
date: 07-15-2016 05:04 gmt
revision:20
[19] [18] [17] [16] [15] [14] [head]


Recently I was tasked with converting image coordinates to real-world (RW) coordinates for stereoscopic cameras mounted to the end-effector of a robot. The end goal was to let the user (me!) click on points in the image, and have the robot record that position & ultimately move to it. The overall strategy is to get a set of points in both image and RW coordinates, then fit some sort of model to the measured data.
I began by printing out a grid of (hopefully evenly-spaced and perpendicular) lines via a laser printer; spacing was ~1.1 mm. This grid was manually aligned to the axes of robot motion by moving the robot along one axis & checking that the lines did not jog. The images were modeled as a grating with quadratic phase in $u,v$ texture coordinates:
$p_h(u,v) = \sin((a_h u/1000 + b_h v/1000 + c_h)v + d_h u + e_h v + f_h) + 0.97$ (1)
$p_v(u,v) = \sin((a_v u/1000 + b_v v/1000 + c_v)u + d_v u + e_v v + f_v) + 0.97$ (2)
$I(u,v) = 16 p_h p_v / \sqrt{2 + 16 p_h^2 + 16 p_v^2}$ (3)
The 1000 was used to make the parameter search distribution more spherical; $c_h,c_v$ were bias terms to seed the solver; 0.97 was a duty-cycle term fit by inspection to the image data; (3) is a modified sigmoid.
$I$ was then optimized over the parameters using a GPU-accelerated (CUDA) nonlinear stochastic optimization:
$(a_h,b_h,d_h,e_h,f_h;\ a_v,b_v,d_v,e_v,f_v) = \arg\min \sum_u \sum_v (I(u,v) - Img(u,v))^2$ (4)
Optimization was carried out by drawing parameters from a normal distribution with a diagonal covariance matrix, set by inspection, and mean iteratively set to the best solution; horizontal and vertical optimization steps were separable and carried out independently. Equation (4) was sampled 18k times, and equation (3) 34 billion times, per frame; hence the need for GPU acceleration. This yielded a set of 10 parameters (again, $c_h$ and $c_v$ were bias terms and kept constant) which modeled the data (i.e. the grid lines) for each of the two cameras. 
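To make the search concrete, here is a minimal CPU sketch in Python of equations (1)-(4) and the "sample around the best solution so far" loop. It is not the CUDA implementation: the 24x24 synthetic image, the parameter values, the `sigma` vector, and all function names are assumptions for illustration, and for brevity the horizontal and vertical parameters are perturbed jointly rather than in separate steps.

```python
import math
import random

def grating(u, v, a, b, c, d, e, f, along_v):
    """Equations (1)/(2): sinusoid with quadratic phase plus the 0.97 duty-cycle term."""
    t = v if along_v else u
    return math.sin((a * u / 1000 + b * v / 1000 + c) * t + d * u + e * v + f) + 0.97

def intensity(u, v, ph, pv):
    """Equation (3): modified sigmoid of the two gratings."""
    h = grating(u, v, *ph, along_v=True)
    w = grating(u, v, *pv, along_v=False)
    return 16 * h * w / math.sqrt(2 + 16 * h * h + 16 * w * w)

def loss(ph, pv, image):
    """Equation (4): summed squared error over all pixels."""
    return sum((intensity(u, v, ph, pv) - val) ** 2 for (u, v), val in image.items())

def random_search(image, ph, pv, sigma, iterations, rng):
    """Draw candidates from a diagonal Gaussian centred on the best solution so far."""
    best, best_loss = (ph, pv), loss(ph, pv, image)
    for _ in range(iterations):
        cand = tuple(
            tuple(rng.gauss(m, s) for m, s in zip(params, sigma))
            for params in best
        )
        cand_loss = loss(cand[0], cand[1], image)
        if cand_loss < best_loss:          # the mean moves only when we improve
            best, best_loss = cand, cand_loss
    return best, best_loss

# Hypothetical ground truth, ordered (a, b, c, d, e, f); not values from the post.
true_h = (0.5, 0.8, 0.0, 0.01, 0.60, 0.3)
true_v = (0.7, 0.4, 0.0, 0.55, 0.02, 1.1)
image = {(u, v): intensity(u, v, true_h, true_v)
         for u in range(24) for v in range(24)}

start_h = (0.0, 0.0, 0.0, 0.0, 0.58, 0.0)
start_v = (0.0, 0.0, 0.0, 0.53, 0.0, 1.0)
sigma = (0.2, 0.2, 0.0, 0.005, 0.005, 0.1)   # zero entry keeps c constant
rng = random.Random(0)

start_loss = loss(start_h, start_v, image)
(best_h, best_v), best_loss = random_search(image, start_h, start_v, sigma, 300, rng)
print(start_loss, best_loss)
```

Because a candidate is accepted only when it lowers the loss, the error never increases from one iteration to the next; how close it gets to the true parameters depends on the seed, the step sizes, and the number of samples.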
This process was repeated every 0.1 mm from 0 to 20 mm height (z) from the target grid, resulting in a sampled function for each of the parameters, e.g. $a_h(z)$. This required 13 trillion evaluations of equation (3).
Now, the task was to use this model to generate the forward and reverse transform from image to world coordinates; I approached this by generating a data set of the grid intersections in both image and world coordinates. To start this process, the known image origin $u_{origin}|_{z=0}, v_{origin}|_{z=0}$ was used to find the corresponding roots of the periodic auxiliary functions $p_h,p_v$:
$\frac{3 \pi}{2} + 2 \pi n_h = a_h u v/1000 + b_h v^2/1000 + (c_h + e_h)v + d_h u + f_h$ (5)
$\frac{3 \pi}{2} + 2 \pi n_v = a_v u^2/1000 + b_v u v/1000 + (c_v + d_v)u + e_v v + f_v$ (6)
Or ..
$n_h = \mathrm{round}( (a_h u v/1000 + b_h v^2/1000 + (c_h + e_h)v + d_h u + f_h - \frac{3 \pi}{2} ) / (2 \pi) )$ (7)
$n_v = \mathrm{round}( (a_v u^2/1000 + b_v u v/1000 + (c_v + d_v)u + e_v v + f_v - \frac{3 \pi}{2} ) / (2 \pi) )$ (8)
From this, we get variables $n_{h,origin}|_{z=0}$ and $n_{v,origin}|_{z=0}$, which are the offsets to align the sine functions $p_h,p_v$ with the physical origin. Now, the reverse (world to image) transform was needed, for which a two-stage Newton scheme was used to solve equations (7) and (8) for $u,v$. Note that this is an equation of phase, not image intensity; otherwise this direct method would not work!
First, the equations were linearized with three steps of (9-11) to get in the right ballpark:
$u_0 = 640, v_0 = 360$
$n_h = n_{h,origin}|_{z} + [-30 .. 30], \quad n_v = n_{v,origin}|_{z} + [-20 .. 20]$ (9)
$B_i = {\left[ \begin{matrix} \frac{3 \pi}{2} + 2 \pi n_h - a_h u_i v_i / 1000 - b_h v_i^2 / 1000 - f_h \\ \frac{3 \pi}{2} + 2 \pi n_v - a_v u_i^2 / 1000 - b_v u_i v_i / 1000 - f_v \end{matrix} \right]}$ (10)
$A_i = {\left[ \begin{matrix} d_h & c_h + e_h \\ c_v + d_v & e_v \end{matrix} \right]}$ and ${\left[ \begin{matrix} u_{i+1} \\ v_{i+1} \end{matrix} \right]} = mldivide(A_i,B_i)$ (11)
where mldivide is the Matlab operator. Then three steps with the full Jacobian were made to attain accuracy:
$J_i = {\left[ \begin{matrix} a_h v_i / 1000 + d_h & a_h u_i / 1000 + 2 b_h v_i / 1000 + c_h + e_h \\ 2 a_v u_i / 1000 + b_v v_i / 1000 + c_v + d_v & b_v u_i / 1000 + e_v \end{matrix} \right]}$ (12)
$K_i = {\left[ \begin{matrix} a_h u_i v_i/1000 + b_h v_i^2/1000 + (c_h+e_h) v_i + d_h u_i + f_h - \frac{3 \pi}{2} - 2 \pi n_h \\ a_v u_i^2/1000 + b_v u_i v_i/1000 + (c_v+d_v) u_i + e_v v_i + f_v - \frac{3 \pi}{2} - 2 \pi n_v \end{matrix} \right]}$ (13)
${\left[ \begin{matrix} u_{i+1} \\ v_{i+1} \end{matrix} \right]} = {\left[ \begin{matrix} u_i \\ v_i \end{matrix} \right]} - J^{-1}_i K_i$ (14)
Solutions $(u,v)$ were verified by plugging back into equations (7) and (8) & verifying $n_h, n_v$ were the same. Inconsistent solutions were discarded; solutions outside the image space $[0, 1280),[0, 720)$ were also discarded. The process (10)-(14) was repeated to tile the image space with grid intersections, as indicated in (9), and this was repeated for all $z$ in $(0 .. 0.1 .. 20)$, resulting in a large (74k points) dataset of $(u,v,n_h,n_v,z)$, which was converted to full real-world coordinates based on the measured spacing of the grid lines, $(u,v,x,y,z)$.
Between individual $z$ steps, $n_{h,origin}$ and $n_{v,origin}$ were re-estimated to minimize (for a current $z'$):
$(u_{origin}|_{z' + 0.1} - u_{origin}|_{z'})^2 + (v_{origin}|_{z' + 0.1} - v_{origin}|_{z'})^2$ (15)
with grid search, and the method of equations (9-14). 
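The two-stage solve in (9)-(14) can be sketched in a few lines of Python. This is an illustration under assumptions, not the original Matlab: the parameter dictionaries `H` and `V` hold made-up values for one camera at one height, and `solve2` is a hypothetical 2x2 helper standing in for mldivide.

```python
import math

# Hypothetical model parameters (not from the post).
H = dict(a=0.005, b=0.005, c=0.0, d=0.001, e=0.3, f=0.2)
V = dict(a=0.005, b=0.005, c=0.0, d=0.3, e=0.001, f=0.1)

def phase_h(u, v):
    """Right-hand side of equation (5)."""
    return (H["a"] * u * v / 1000 + H["b"] * v * v / 1000
            + (H["c"] + H["e"]) * v + H["d"] * u + H["f"])

def phase_v(u, v):
    """Right-hand side of equation (6)."""
    return (V["a"] * u * u / 1000 + V["b"] * u * v / 1000
            + (V["c"] + V["d"]) * u + V["e"] * v + V["f"])

def line_index(phase):
    """Equations (7)/(8): index of the nearest grid line."""
    return round((phase - 1.5 * math.pi) / (2 * math.pi))

def solve2(m, rhs):
    """Solve the 2x2 system m @ x = rhs by Cramer's rule."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return (d * rhs[0] - b * rhs[1]) / det, (a * rhs[1] - c * rhs[0]) / det

def intersection(n_h, n_v, u=640.0, v=360.0):
    t_h = 1.5 * math.pi + 2 * math.pi * n_h
    t_v = 1.5 * math.pi + 2 * math.pi * n_v
    # Stage 1, equations (10)-(11): quadratic terms frozen at the last iterate.
    A = ((H["d"], H["c"] + H["e"]), (V["c"] + V["d"], V["e"]))
    for _ in range(3):
        B = (t_h - H["a"] * u * v / 1000 - H["b"] * v * v / 1000 - H["f"],
             t_v - V["a"] * u * u / 1000 - V["b"] * u * v / 1000 - V["f"])
        u, v = solve2(A, B)
    # Stage 2, equations (12)-(14): Newton steps with the full Jacobian.
    for _ in range(3):
        J = ((H["a"] * v / 1000 + H["d"],
              H["a"] * u / 1000 + 2 * H["b"] * v / 1000 + H["c"] + H["e"]),
             (2 * V["a"] * u / 1000 + V["b"] * v / 1000 + V["c"] + V["d"],
              V["b"] * u / 1000 + V["e"]))
        K = (phase_h(u, v) - t_h, phase_v(u, v) - t_v)
        du, dv = solve2(J, K)
        u, v = u - du, v - dv
    return u, v

# Pick a grid line a few periods away from the image centre and solve for it.
n_h = line_index(phase_h(640.0, 360.0)) + 5
n_v = line_index(phase_v(640.0, 360.0)) - 7
u, v = intersection(n_h, n_v)
print(u, v, line_index(phase_h(u, v)) == n_h, line_index(phase_v(u, v)) == n_v)
```

The final comparison is the same consistency check described above: a solution is kept only if plugging it back into (7) and (8) returns the line indices that were asked for, and only if it lies inside the image.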
Re-estimating the origin offsets between $z$ steps was required because the stochastic method used to find the original image model parameters was agnostic to phase, and so phase (via parameter $f$) could jump between individual $z$ measurements. The origin did not move much between successive measurements, hence (15) fixed the jumps.
To this dataset, a model was fit:
${\left[ \begin{matrix} u \\ v \end{matrix} \right]} = A {\left[ \begin{matrix} 1 & x & y & z & x'^2 & y'^2 & z'^2 & w^2 & x' y' & x' z' & y' z' & x' w & y' w & z' w \end{matrix} \right]}^{T}$ (16)
where $x' = \frac{x}{10}$, $y' = \frac{y}{10}$, $z' = \frac{z}{10}$, and $w = \frac{20}{20 - z}$. $w$ was introduced as an auxiliary variable to assist in perspective mapping, à la computer graphics. Likewise, $x,y,z$ were scaled so the quadratic nonlinearity better matched the data. The model (16) was fit using regular linear regression over all rows of the validated dataset. This resulted in a second set of coefficients $A$ for a model of world coordinates to image coordinates; again, the model was inverted using Newton's method (Jacobian omitted here!).
These coefficients, one set per camera, were then integrated into the C++ program for displaying video, and the inverse mapping (using closed-form matrix inversion) was used to convert mouse clicks to real-world coordinates for robot motor control. Even with the relatively poor wide-FOV cameras employed, the method is accurate to $\pm 50\mu m$, and precise to $\pm 120\mu m$.  
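As a rough illustration of the click-to-world step, the Python sketch below evaluates model (16) and inverts it for $(x,y)$ at a known $z$ with Newton's method and a closed-form 2x2 inverse. The coefficient matrix `A` is invented for the example (the real one comes from the regression), the Jacobian is my own derivation from the feature list in (16), and the function names are hypothetical.

```python
def features(x, y, z):
    """The 14 regressors of equation (16)."""
    xs, ys, zs = x / 10, y / 10, z / 10
    w = 20 / (20 - z)                      # perspective helper; requires z < 20
    return [1, x, y, z, xs * xs, ys * ys, zs * zs, w * w,
            xs * ys, xs * zs, ys * zs, xs * w, ys * w, zs * w]

def d_features_dx(x, y, z):
    xs, ys, zs, w = x / 10, y / 10, z / 10, 20 / (20 - z)
    return [0, 1, 0, 0, 2 * xs / 10, 0, 0, 0, ys / 10, zs / 10, 0, w / 10, 0, 0]

def d_features_dy(x, y, z):
    xs, ys, zs, w = x / 10, y / 10, z / 10, 20 / (20 - z)
    return [0, 0, 1, 0, 0, 2 * ys / 10, 0, 0, xs / 10, 0, zs / 10, 0, w / 10, 0]

def dot(row, vec):
    return sum(r * v for r, v in zip(row, vec))

def world_to_image(A, x, y, z):
    phi = features(x, y, z)
    return dot(A[0], phi), dot(A[1], phi)

def image_to_world(A, u, v, z, steps=6):
    """Newton iteration on (x, y) for a fixed, known z."""
    x = y = 0.0
    for _ in range(steps):
        uu, vv = world_to_image(A, x, y, z)
        fx, fy = d_features_dx(x, y, z), d_features_dy(x, y, z)
        j11, j12 = dot(A[0], fx), dot(A[0], fy)
        j21, j22 = dot(A[1], fx), dot(A[1], fy)
        det = j11 * j22 - j12 * j21
        ru, rv = uu - u, vv - v
        x -= (j22 * ru - j12 * rv) / det   # closed-form 2x2 inverse
        y -= (-j21 * ru + j11 * rv) / det
    return x, y

# Made-up coefficients: roughly 40 px/mm with small quadratic and perspective terms.
A = [
    [640, 40, 1, -2, 3, 0.5, 0.2, 1, 0.4, 2, 0.1, 10, 0.3, 0.5],
    [360, -1, 40, 1.5, 0.4, 3, 0.1, 0.8, 0.5, 0.2, 2, 0.2, 10, 0.4],
]
click_u, click_v = world_to_image(A, 3.2, -4.7, 5.0)
x, y = image_to_world(A, click_u, click_v, 5.0)
print(x, y)
```

Here $z$ is treated as known; with two cameras and one coefficient set per camera, the same residuals could be stacked to solve for $z$ as well, which this sketch does not attempt.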
{365}  
IEEE-717081 (pdf) An Implantable Multichannel Digital neural recording system for a micromachined sieve electrode
____References____ Akin, T. and Najafi, K. and Bradley, R.M. Solid-State Sensors and Actuators, 1995 and Eurosensors IX. Transducers '95. The 8th International Conference on, 1, 51-54 (1995)  
{5} 
ref: bookmark0
tags: machine_learning research_blog parallel_computing bayes active_learning information_theory reinforcement_learning
date: 12-31-2011 19:30 gmt
revision:3
[2] [1] [0] [head]


hunch.net interesting posts:
 
{714}  
PMID-12433288[0] Real-time computing without stable states: a new framework for neural computation based on perturbations.
____References____
 
{723} 
ref: notes0
tags: data effectiveness Norvig google statistics machine learning
date: 12-06-2011 07:15 gmt
revision:1
[0] [head]


The unreasonable effectiveness of data.
 
{871}  
http://www.autonlab.org/tutorials/ - excellent. http://energyfirefox.blogspot.com/2010/12/dataminingwithubuntu.html - apt-get!  
{858}  
Notes & responses to an essay in This Will Change Everything by evolutionary psychologists John Tooby and Leda Cosmides, authors of The Adapted Mind.
 
{815}  
Jacques Pitrat seems to have many of the same ideas that I've had (only better, and he's implemented them!) A Step toward an Artificial Scientist
Artificial Beings - his book.  
{796}  
An interesting field in ML is nonlinear dimensionality reduction: data may appear to be in a high-dimensional space, but mostly lies along a nonlinear lower-dimensional subspace or manifold. (Linear subspaces are easily discovered with PCA or SVD(*)). Dimensionality reduction projects high-dimensional data into a low-dimensional space with minimum information loss -> maximal reconstruction accuracy; nonlinear dim reduction does this (surprise!) using nonlinear mappings. These techniques set out to find the manifold(s):
(*) SVD maps into 'concept space', an interesting interpretation as per Leskovec's lecture presentation.  
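For the linear case mentioned in the parenthetical, a tiny Python sketch of PCA in two dimensions: points scattered around a line, and the closed-form principal axis of their covariance matrix. The data and the function name `principal_axis` are made up for illustration; real use would call an SVD routine, and a curved manifold would need one of the nonlinear methods instead.

```python
import math

def principal_axis(points):
    """Unit vector along the direction of largest variance (2-D PCA)."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    cxx = sum((p[0] - mx) ** 2 for p in points) / n
    cyy = sum((p[1] - my) ** 2 for p in points) / n
    cxy = sum((p[0] - mx) * (p[1] - my) for p in points) / n
    # Orientation of the major axis of a 2x2 symmetric (covariance) matrix.
    theta = 0.5 * math.atan2(2 * cxy, cxx - cyy)
    return math.cos(theta), math.sin(theta)

# Points along the direction (0.6, 0.8) with a small perpendicular wobble.
points = []
for t in range(-10, 11):
    off = 0.1 if t % 2 == 0 else -0.1
    points.append((0.6 * t - 0.8 * off, 0.8 * t + 0.6 * off))

dx, dy = principal_axis(points)
print(dx, dy)   # parallel to (0.6, 0.8), up to sign
```

Projecting each point onto this axis gives the one-dimensional representation with the smallest squared reconstruction error, which is the "minimum information loss" criterion in its linear form.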
{795} 
ref: work0
tags: machine learning reinforcement genetic algorithms
date: 10-26-2009 04:49 gmt
revision:1
[0] [head]


I just had dinner with Jesse, and we had a good/productive discussion/brainstorm about algorithms, learning, and neurobio. Two things worth repeating, one simpler than the other: 1. Gradient descent / Newton-Raphson-like techniques should be tried with genetic algorithms. As of my current understanding, genetic algorithms perform a semi-directed search, randomly exploring the space of solutions with natural selection exerting a pressure to improve. What if you took the partial derivative of the fitness with respect to each of the organism's genes, and used that to direct mutation, rather than random selection of the mutated element? What if you looked before mating and crossover? Seems like this would speed up the algorithm greatly (though it might get it stuck in local minima, too). Not sure if this has been done before; if it has, edit this to indicate where! 2. Most supervised machine learning algorithms seem to rely on one single, externally applied objective function which they then attempt to optimize. (Rather, this is what convex programming is. Unsupervised learning of course exists, like PCA, ICA, and other means of learning correlative structure.) There are a great many ways to do optimization, but all are exactly that: optimization, a search through a space for some set of weights / set of rules / decision tree that maximizes or minimizes an objective function. What Jesse and I have arrived at is that there is no real utility function in the world (Corollary #1: life is not an optimization problem (**)); we generate these utility functions, just as we generate our own behavior. What would happen if an algorithm iteratively estimated, checked, and cross-validated its utility function based on the small rewards actually found in the world / its synthetic environment? Would we get generative behavior greater than the complexity of the inputs? (Jesse and I also had an in-depth talk about information generation / destruction in nonlinear systems.) 
Put another way, perhaps part of learning is to structure internal valuation / utility functions to set up reinforcement learning problems where the reinforcement signal comes according to satisfaction of subgoals (= local utility functions). Or, the gradient signal comes by evaluating partial derivatives of actions wrt ... Creating these goals is natural but not always easy, which is one reason (of very many!) sports are so great: the utility function is clean, external, and immutable. The recursive, introspective creation of valuation / utility functions is what drives a lot of my internal monologues, mixed with a hefty dose of taking partial derivatives (see {780}) based on models of the world. (Stated this way, they seem so similar that perhaps they are the same thing?) To my limited knowledge, there has been some recent work on the creation of subgoals in reinforcement learning. One paper I read used a system to look for states that had a high ratio of ultimately rewarded paths to unrewarded paths, and selected these as subgoals (i.e. rewarded the agent when this state was reached). I'm not talking about these sorts of subgoals. In these systems, there is an ultimate goal that the researcher wants the agent to achieve, and it is the algorithm's (or algorithms') task to make a policy for generating/selecting behavior. Rather, I'm interested in even more unstructured tasks: make a utility function, and a behavioral policy, based on small continuous (possibly irrelevant?) rewards in the environment. Why would I want to do this? The pet project I have in mind is a 'cognitive' PCB part placement / layout / routing algorithm to add to my pet project, kicadocaml, to finally get some people to use it (the attention economy :) In the course of thinking about how to do this, I've realized that a substantial problem is simply determining what board layouts are good, and what are not. 
I have a rough aesthetic idea + some heuristics that I learned from my dad + some heuristics I've learned through practice of what is good layout and what is not; but how to code these up? And what if these aren't the best rules, anyway? If I just code up the rules I've internalized as utility functions, then the board layout will be pretty much as I do it: boring! Well, I've stated my subgoal in the form of a problem statement and some criteria to meet. Now, to go and search for a decent solution to it. (Have to keep this blog m8ta!) (Or, realistically, to go back and see if the problem statement is sensible). (**) Corollary #2: there is no god. nod, Dawkins.  
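Idea 1 from this post (gradient-directed mutation) is easy to try on a toy problem. The Python sketch below is only that: the fitness function, population size, step size `rate`, and noise level are all assumptions, and the gradient is estimated by finite differences so the fitness can stay a black box.

```python
import random

def fitness(genes):
    """Hypothetical objective: best possible value is 0, at all genes equal to 1."""
    return -sum((g - 1.0) ** 2 for g in genes)

def gradient(f, genes, h=1e-4):
    """Central-difference estimate of the partial derivative for each gene."""
    grad = []
    for i in range(len(genes)):
        up = genes[:i] + [genes[i] + h] + genes[i + 1:]
        dn = genes[:i] + [genes[i] - h] + genes[i + 1:]
        grad.append((f(up) - f(dn)) / (2 * h))
    return grad

def mutate(genes, rng, rate=0.1, noise=0.05):
    """Directed mutation: step up the fitness gradient, plus a little random noise."""
    grad = gradient(fitness, genes)
    return [g + rate * dg + rng.gauss(0, noise) for g, dg in zip(genes, grad)]

def evolve(population, generations, rng):
    pop = list(population)
    for _ in range(generations):
        ranked = sorted(pop, key=fitness, reverse=True)
        parents = ranked[:len(ranked) // 2]        # selection; parents survive as-is
        children = []
        for p in parents:
            mate = rng.choice(parents)
            cut = rng.randrange(1, len(p))         # single-point crossover
            children.append(mutate(p[:cut] + mate[cut:], rng))
        pop = parents + children
    return max(pop, key=fitness)

rng = random.Random(1)
population = [[rng.uniform(-5, 5) for _ in range(4)] for _ in range(20)]
start_best = max(fitness(g) for g in population)
best = evolve(population, 30, rng)
print(start_best, fitness(best))
```

Setting `rate=0` recovers an ordinary random-mutation GA for comparison. On a bowl-shaped fitness like this one the directed version should converge faster, but the caveat about local minima raised above still applies to anything less well-behaved.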
{780}  
A Self-learning Evolutionary Chess Program
 
{793}  
Andrew Ng's notes on learning theory
 
{792}  
http://www.cs.cmu.edu/~wcohen/slipper/
 
{787}  
My theory on the Flynn effect: human intelligence IS increasing, and this is NOT stopping. Look at it from an ML perspective: there is more free time to get data, the data (and world) has almost unlimited complexity, the data is much higher quality and much easier to get (the vast internet & world! (travel)), and there is (hopefully) more fuel to process that data (food!). Therefore, we are getting more complex, sophisticated, and intelligent. Also, the idea that less-intelligent people having more kids will somehow 'dilute' our genetic IQ is bullshit: intelligence is mostly a product of environment and education, and is tailored to the tasks we need to do; it is not (or only very weakly, except at the extremes) tied to the wetware. Besides, things are changing far too fast for genetics to follow. Regarding social media, like facebook and others, you could posit that social intelligence is increasing, along similar arguments to the above: social data is seemingly more prevalent, more available, and people spend more time examining it. Yet this feels like a weaker argument, as people have always been socializing, talking, etc., and I'm not sure if any of these social media have really increased it. Regardless, people enjoy it, and that's the important part. My utopia for today :)  
{695}  
Alopex: A Correlation-Based Learning Algorithm for Feed-Forward and Recurrent Neural Networks (1994)
 
{609} 
ref: 0
tags: differential dynamic programming machine learning
date: 09-24-2008 23:39 gmt
revision:2
[1] [0] [head]


 
{7} 
ref: bookmark0
tags: book information_theory machine_learning bayes probability neural_networks mackay
date: 002007 0:0
revision:0
[head]


http://www.inference.phy.cam.ac.uk/mackay/itila/book.html - free! (but I liked the book, so I bought it :)  
{29}  
Iterative Linear Quadratic regulator design for nonlinear biological movement systems
 
{37} 
ref: bookmark0
tags: Unscented sigma_pint kalman filter speech processing machine_learning SDRE control UKF
date: 002007 0:0
revision:0
[head]


 
{8} 
ref: bookmark0
tags: machine_learning algorithm meta_algorithm
date: 002006 0:0
revision:0
[head]


Boost learning or AdaBoost: the idea is to update the discrete distribution (the per-sample weights) used in training any algorithm, so as to emphasize those points that were misclassified in the previous fit of a classifier. Sensitive to outliers, but not prone to overfitting.  
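A minimal sketch of that reweighting in Python, using threshold "stumps" on a one-dimensional toy dataset. The dataset, the stump learner, and all names are assumptions chosen for illustration; the weight update is the standard discrete AdaBoost rule.

```python
import math

def stump_predict(x, thr, sign):
    """sign=+1 predicts +1 below the threshold and -1 above; sign=-1 flips this."""
    return sign if x < thr else -sign

def best_stump(xs, ys, w):
    """Exhaustively pick the stump with the lowest weighted error."""
    best = None
    thresholds = [x - 0.5 for x in xs] + [xs[-1] + 0.5]
    for thr in thresholds:
        for sign in (+1, -1):
            err = sum(wi for xi, yi, wi in zip(xs, ys, w)
                      if stump_predict(xi, thr, sign) != yi)
            if best is None or err < best[0]:
                best = (err, thr, sign)
    return best

def adaboost(xs, ys, rounds):
    n = len(xs)
    w = [1.0 / n] * n                          # start from the uniform distribution
    ensemble = []
    for _ in range(rounds):
        err, thr, sign = best_stump(xs, ys, w)
        err = min(max(err, 1e-12), 1 - 1e-12)  # guard against log(0)
        alpha = 0.5 * math.log((1 - err) / err)
        # Misclassified points (y * prediction = -1) are multiplied by exp(+alpha).
        w = [wi * math.exp(-alpha * yi * stump_predict(xi, thr, sign))
             for xi, yi, wi in zip(xs, ys, w)]
        total = sum(w)
        w = [wi / total for wi in w]
        ensemble.append((alpha, thr, sign))
    return ensemble, w

def classify(ensemble, x):
    score = sum(alpha * stump_predict(x, thr, sign) for alpha, thr, sign in ensemble)
    return 1 if score >= 0 else -1

xs = list(range(10))
ys = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]        # no single stump separates this
_, w_after_one = adaboost(xs, ys, 1)
ensemble, w_final = adaboost(xs, ys, 3)
errors = sum(classify(ensemble, x) != y for x, y in zip(xs, ys))
print(w_after_one, errors)
```

After the first round the three misclassified points each carry weight 1/6 while the seven correct ones carry 1/14, so the next stump is pushed toward the mistakes. The same mechanism explains the outlier sensitivity: a mislabeled point keeps gaining weight every round it is missed.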
{20} 
ref: bookmark0
tags: neural_networks machine_learning matlab toolbox supervised_learning PCA perceptron SOM EM
date: 002006 0:0
revision:0
[head]


http://www.ncrg.aston.ac.uk/netlab/index.php n.b. kinda old. (or does that just mean well established?)  
{43}  
http://www.iovs.org/cgi/reprint/46/4/1322.pdf A related machine learning classifier, the relevance vector machine (RVM), has recently been introduced, which, unlike SVM, incorporates probabilistic output (probability of membership) through Bayesian inference. Its decision function depends on fewer input variables than SVM, possibly allowing better classification for small data sets with high dimensionality.
 
{55}  
 
{61} 
ref: bookmark0
tags: smith predictor motor control wolpert cerebellum machine_learning prediction
date: 002006 0:0
revision:0
[head]


http://prism.bham.ac.uk/pdf_files/SmithPred_93.PDF
 
{66} 
ref: bookmark0
tags: machine_learning classification entropy information
date: 002006 0:0
revision:0
[head]


http://iridia.ulb.ac.be/~lazy/ - Lazy Learning. 