You are not authenticated, login.
text: sort by
tags: modified
type: chronology
hide / / print
ref: -1992 tags: Linsker infomax Hebbian anti-hebbian linear perceptron unsupervised learning date: 08-04-2021 00:20 gmt revision:2 [1] [0] [head]

Local synaptic learning rules suffice to maximize mutual information in a linear network

  • Ralph Linsker, 1992.
  • A development upon {1545} -- this time with lateral inhibition trained through noise-contrast and anti-Hebbian plasticity.
  • {1545} does not perfectly maximize the mutual information between the input and output -- this allegedly requires the inverse of the covariance matrix, QQ .
    • As before, infomax principles; maximize mutual information MIH(Z)H(Z|S)MI \propto H(Z) - H(Z | S) where Z is the network output and S is the signal input. (note: minimize the conditional entropy of output given the input).
    • For a gaussian variable, H=12lndetQH = \frac{ 1}{ 2} ln det Q where Q is the covariance matrix. In this case Q=E|ZZ T|Q = E|Z Z^T |
    • since Z=C(S,N)Z = C(S,N) where C are the weights, S is the signal, and N is the noise, Q=CqC T+rQ = C q C^T + r where q is the covariance matrix of input noise and r is the cov.mtx. of the output noise.
    • (somewhat confusing): δH/δC=Q 1Cq\delta H / \delta C = Q^{-1}Cq
      • because .. the derivative of the determinant is complicated.
      • Check the appendix for the derivation. lndetQ=TrlnQln det Q = Tr ln Q and dH=1/2d(TrlnQ)=1/2Tr(Q 1dQ) dH = 1/2 d(Tr ln Q) = 1/2 Tr( Q^-1 dQ ) -- this holds for positive semidefinite matrices like Q.

  • From this he comes up with a set of rules whereby feedforward weights are trained in a Hebbian fashion, but based on activity after lateral activation.
  • The lateral activation has a weight matrix F=IαQF = I - \alpha Q (again Q is the cov.mtx. of Z). If y(0)=Y;y(t+1)=Y+Fy(t)y(0) = Y; y(t+1) = Y + Fy(t) , where Y is the feed-forward activation, then αy(inf)=Q 1Y\alpha y(\inf) = Q^{-1}Y . This checks out:
x = randn(1000, 10);
Q = x' * x;
a = 0.001;
Y = randn(10, 1);
y = zeros(10, 1); 
for i = 1:1000
	y = Y + (eye(10) - a*Q)*y;

y - pinv(Q)*Y / a % should be zero. 
  • This recursive definition is from Jacobi. αy(inf)=αΣ t=0 infF tY=α(IF) 1Y=Q 1Y\alpha y(\inf) = \alpha \Sigma_{t=0}^{\inf}F^tY = \alpha(I - F)^{-1} Y = Q^{-1}Y .
  • Still, you need to estimate Q through a running-average, ΔQ=1M(Y nY m+r nmQ NM)\Delta Q = \frac{ 1}{M}( Y_n Y_m + r_{nm} - Q_{NM} ) and since F=IαQF = I - \alpha Q , F is formed via anti-hebbian terms.

To this is added a 'sensing' learning and 'noise' unlearning phase -- one optimizes H(Z)H(Z) , the other minimizes H(Z|S)H(Z|S) . Everything is then applied, similar to before, to a gaussian-filtered one-dimensional white-noise stimuli. He shows this results in bandpass filter behavior -- quite weak sauce in an era where ML papers are expected to test on five or so datasets. Even if this was 1992 (nearly forty years ago!), it would have been nice to see this applied to a more realistic dataset; perhaps some of the following papers? Olshausen & Field came out in 1996 -- but they applied their algorithm to real images.

In both Olshausen & this work, no affordances are made for multiple layers. There have to be solutions out there...

hide / / print
ref: -1988 tags: Linsker infomax linear neural network hebbian learning unsupervised date: 08-03-2021 06:12 gmt revision:2 [1] [0] [head]

Self-organizaton in a perceptual network

  • Ralph Linsker, 1988.
  • One of the first (verbose, slightly diffuse) investigations of the properties of linear projection neurons (e.g. dot-product; no non-linearity) to express useful tuning functions.
  • ''Useful' is here information-preserving, in the face of noise or dimensional bottlenecks (like PCA).
  • Starts with Hebbian learning functions, and shows that this + white-noise sensory input + some local topology, you can get simple and complex visual cell responses.
    • Ralph notes that neurons in primate visual cortex are tuned in utero -- prior real-world visual experience! Wow. (Who did these studies?)
    • This is a very minimalistic starting point; there isn't even structured stimuli (!)
    • Single neuron (and later, multiple neurons) are purely feed-forward; author cautions that a lack of feedback is not biologically realistic.
      • Also note that this was back in the Motorola 680x0 days ... computers were not that powerful (but certainly could handle more than 1-2 neurons!)
  • Linear algebra shows that Hebbian synapses cause a linear layer to learn the covariance function of their inputs, QQ , with no dependence on the actual layer activity.
  • When looked at in terms of an energy function, this is equivalent to gradient descent to maximize the layer-output variance.
  • He also hits on:
    • Hopfield networks,
    • PCA,
    • Oja's constrained Hebbian rule δw i<L 2(L 1L 2w i)> \delta w_i \propto &lt; L_2(L_1 - L_2 w_i) &gt; (that is, a quadratic constraint on the weight to make Σw 21\Sigma w^2 \sim 1 )
    • Optimal linear reconstruction in the presence of noise
    • Mutual information between layer input and output (I found this to be a bit hand-wavey)
      • Yet he notes critically: "but it is not true that maximum information rate and maximum activity variance coincide when the probability distribution of signals is arbitrary".
        • Indeed. The world is characterized by very non-Gaussian structured sensory stimuli.
    • Redundancy and diversity in 2-neuron coding model.
    • Role of infomax in maximizing the determinant of the weight matrix, sorta.

One may critically challenge the infomax idea: we very much need to (and do) throw away spurious or irrelevant information in our sensory streams; what upper layers 'care about' when making decisions is certainly relevant to the lower layers. This credit-assignment is neatly solved by backprop, and there are a number 'biologically plausible' means of performing it, but both this and infomax are maybe avoiding the problem. What might the upper layers really care about? Likely 'care about' is an emergent property of the interacting local learning rules and network structure. Can you search directly in these domains, within biological limits, and motivated by statistical reality, to find unsupervised-learning networks?

You'll still need a way to rank the networks, hence an objective 'care about' function. Sigh. Either way, I don't per se put a lot of weight in the infomax principle. It could be useful, but is only part of the story. Otherwise Linsker's discussion is accessible, lucid, and prescient.


hide / / print
ref: -0 tags: nonlinear hebbian synaptic learning rules projection pursuit date: 12-12-2019 00:21 gmt revision:4 [3] [2] [1] [0] [head]

PMID-27690349 Nonlinear Hebbian Learning as a Unifying Principle in Receptive Field Formation

  • Here we show that the principle of nonlinear Hebbian learning is sufficient for receptive field development under rather general conditions.
  • The nonlinearity is defined by the neuron’s f-I curve combined with the nonlinearity of the plasticity function. The outcome of such nonlinear learning is equivalent to projection pursuit [18, 19, 20], which focuses on features with non-trivial statistical structure, and therefore links receptive field development to optimality principles.
  • Δwxh(g(w Tx))\Delta w \propto x h(g(w^T x)) where h is the hebbian plasticity term, and g is the neurons f-I curve (input-output relation), and x is the (sensory) input.
  • The relevant property of natural image statistics is that the distribution of features derived from typical localized oriented patterns has high kurtosis [5,6, 39]
  • Model is a generalized leaky integrate and fire neuron, with triplet STDP

hide / / print
ref: -0 tags: LDA myopen linear discriminant analysis classification date: 01-03-2012 02:36 gmt revision:2 [1] [0] [head]

How does LDA (Linear discriminant analysis) work?

It works by projecting data points onto a series of planes, one per class of output, and then deciding based which projection plane is the largest.

Below, to the left is a top-view of this projection with 9 different classes of 2D data each in a different color. Right is a size 3D view of the projection - note the surfaces seem to form a parabola.

Here is the matlab code that computes the LDA (from myopen's ceven

% TrainData and TrainClass are inputs, column major here.
% (observations on columns)
N = size(TrainData,1);
Ptrain = size(TrainData,2);
Ptest = size(TestData,2);

% add a bit of interpolating noise to the data.
sc = std(TrainData(:)); 
TrainData =  TrainData + sc./1000.*randn(size(TrainData));

K = max(TrainClass); % number of classes.

%%-- Compute the means and the pooled covariance matrix --%%
C = zeros(N,N);
for l = 1:K;
	idx = find(TrainClass==l);
		% measure the mean per class
	Mi(:,l) = mean(TrainData(:,idx)')';
		% sum all covariance matrices per class
	C = C + cov((TrainData(:,idx)-Mi(:,l)*ones(1,length(idx)))');

C = C./K; % turn sum into average covariance matrix
Pphi = 1/K;
Cinv = inv(C);

%%-- Compute the LDA weights --%%
for i = 1:K
	Wg(:,i) = Cinv*Mi(:,i);
		% this is the slope of the plane
	Cg(:,i) = -1/2*Mi(:,i)'*Cinv*Mi(:,i) + log(Pphi)';
		% and this, the origin-intersect.

%%-- Compute the decision functions --%%
Atr = TrainData'*Wg + ones(Ptrain,1)*Cg;
	% see - just a simple linear function! 
Ate = TestData'*Wg + ones(Ptest,1)*Cg;

errtr = 0;
AAtr = compet(Atr');
	% this compet function returns a sparse matrix with a 1
	% in the position of the largest element per row. 
	% convert to indices with vec2ind, below. 
TrainPredict = vec2ind(AAtr);
errtr = errtr + sum(sum(abs(AAtr-ind2vec(TrainClass))))/2;
netr = errtr/Ptrain;
PeTrain = 1-netr;

hide / / print
ref: work-0 tags: machine learning manifold detection subspace segregation linearization spectral clustering date: 10-29-2009 05:16 gmt revision:5 [4] [3] [2] [1] [0] [head]

An interesting field in ML is nonlinear dimensionality reduction - data may appear to be in a high-dimensional space, but mostly lies along a nonlinear lower-dimensional subspace or manifold. (Linear subspaces are easily discovered with PCA or SVD(*)). Dimensionality reduction projects high-dimensional data into a low-dimensional space with minimum information loss -> maximal reconstruction accuracy; nonlinear dim reduction does this (surprise!) using nonlinear mappings. These techniques set out to find the manifold(s):

  • Spectral Clustering
  • Locally Linear Embedding
    • related: The manifold ways of perception
      • Would be interesting to run nonlinear dimensionality reduction algorithms on our data! What sort of space does the motor system inhabit? Would it help with prediction? Am quite sure people have looked at Kohonen maps for this purpose.
    • Random irrelevant thought: I haven't been watching TV lately, but when I do, I find it difficult to recognize otherwise recognizable actors. In real life, I find no difficulty recognizing people, even some whom I don't know personally - is this a data thing (little training data), or mapping thing (not enough time training my TV-not-eyes facial recognition).
  • A Global Geometric Framework for Nonlinear Dimensionality Reduction method:
    • map the points into a graph by connecting each point with a certain number of its neighbors or all neighbors within a certain radius.
    • estimate geodesic distances between all points in the graph by finding the shortest graph connection distance
    • use MDS (multidimensional scaling) to embed the original data into a smaller-dimensional euclidean space while preserving as much of the original geometry.
      • Doesn't look like a terribly fast algorithm!

(*) SVD maps into 'concept space', an interesting interpretation as per Leskovec's lecture presentation.

hide / / print
ref: notes-0 tags: linear discriminant analysis LDA EMG date: 07-30-2008 20:56 gmt revision:2 [1] [0] [head]

images/588_1.pdf -- Good lecture on LDA. Below, simple LDA implementation in matlab based on the same:

% data matrix in this case is 36 x 16, 
% with 4 examples of each of 9 classes along the rows, 
% and the axes of the measurement (here the AR coef) 
% along the columns. 
Sw = zeros(16, 16); % within-class scatter covariance matrix. 
means = zeros(9,16); 
for k = 0:8
	m = data(1+k*4:4+k*4, :); % change for different counts / class
	Sw = Sw + cov( m ); % sum the 
	means(k+1, :) = mean( m ); %means of the individual classes
% compute the class-independent transform, 
% e.g. one transform applied to all points
% to project them into one plane. 
Sw = Sw ./ 9; % 9 classes
criterion = inv(Sw) * cov(means); 
[eigvec2, eigval2] = eig(criterion);

See {587} for results on EMG data.

hide / / print
ref: bookmark-0 tags: optimization function search matlab linear nonlinear programming date: 08-09-2007 02:21 gmt revision:0 [head]


very nice collection of links!!

hide / / print
ref: math notes-0 tags: linear_algebra BLAS FFT library programming C++ matrix date: 02-21-2007 15:48 gmt revision:1 [0] [head]

Newmat11 -- nice, elegant BLAS / FFT and matrix library, with plenty of syntactic sugar.

hide / / print
ref: learning-0 tags: motor control primitives nonlinear feedback systems optimization date: 0-0-2007 0:0 revision:0 [head]

http://hardm.ath.cx:88/pdf/Schaal2003_LearningMotor.pdf not in pubmed.

hide / / print
ref: bookmark-0 tags: linear_algebra solution simultaneous_equations GPGPU GPU LUdecomposition clever date: 0-0-2006 0:0 revision:0 [head]