m8ta fun


{1511}


ref: -2020 tags: evolution neutral drift networks random walk entropy population date: 04-08-2020 00:48 gmt revision:0 [head]

Localization of neutral evolution: selection for mutational robustness and the maximal entropy random walk

The take-away of the paper is that, with larger populations, random mutation and recombination make areas of the graph that take several steps to reach less likely to be visited. (The graph in the figure is Maynard Smith's four-letter mutation word game.)
This is because recombination makes the population adhere more closely to the 'giant' component. In Maynard Smith's game, this is 2268 of the 2405 meaningful words, which can be reached from one another by successive letter changes.
The author extends the analysis to the RNA genotype / secondary-structure map from van Nimwegen's 1999 paper. The effect is not as strong as in Maynard Smith's game, but the graph still has much lower graph-theoretic entropy than the actual population.

He suggests that if the entropic size of the giant component is much smaller than its dictionary size, then populations are likely to be trapped there.
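
To make the comparison between "entropic size" and dictionary size concrete, here is a minimal sketch. It relies on one standard fact: the stationary distribution of the maximal entropy random walk on an undirected graph is the squared dominant eigenvector of the adjacency matrix. Everything else is my assumption, not taken from the paper: I take "entropic size" to mean exp(Shannon entropy) of that distribution, and the toy graph (a dense core with a sparse tail, built by the hypothetical helper `core_with_tail`) stands in for the word-game graph.

```python
import numpy as np

def merw_stationary(adjacency):
    """Stationary distribution of the maximal entropy random walk:
    the squared dominant eigenvector of a symmetric adjacency matrix."""
    A = np.asarray(adjacency, dtype=float)
    _, eigvecs = np.linalg.eigh(A)      # eigenvalues in ascending order
    psi = eigvecs[:, -1]                # eigenvector of the largest eigenvalue
    pi = psi ** 2
    return pi / pi.sum()

def degree_stationary(adjacency):
    """Stationary distribution of the ordinary random walk (proportional to degree)."""
    deg = np.asarray(adjacency, dtype=float).sum(axis=1)
    return deg / deg.sum()

def effective_size(p):
    """ASSUMPTION: 'entropic size' read as exp(Shannon entropy) of p."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(np.exp(-np.sum(p * np.log(p))))

def core_with_tail(n_core=6, n_tail=6):
    """Hypothetical toy graph: a complete graph with a path hanging off one node."""
    n = n_core + n_tail
    A = np.zeros((n, n))
    A[:n_core, :n_core] = 1 - np.eye(n_core)
    for i in range(n_core - 1, n - 1):
        A[i, i + 1] = A[i + 1, i] = 1
    return A

if __name__ == "__main__":
    A = core_with_tail()
    print("nodes (dictionary size):", A.shape[0])
    print("effective size, ordinary walk:", effective_size(degree_stationary(A)))
    print("effective size, max-entropy walk:", effective_size(merw_stationary(A)))
```

On a regular graph the two walks coincide and the effective size equals the node count; the gap only opens up when connectivity is uneven, which is the situation the paper is concerned with.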

Interesting, but I'd prefer to have an expert peer-review it first :)

{1423}


ref: -2014 tags: Lillicrap Random feedback alignment weights synaptic learning backprop MNIST date: 02-14-2019 01:02 gmt revision:5 [4] [3] [2] [1] [0] [head]

PMID-27824044 Random synaptic feedback weights support error backpropagation for deep learning.

"Here we present a surprisingly simple algorithm for deep learning, which assigns blame by multiplying error signals by random synaptic weights."
Backprop multiplies error signals e by the weight matrix $W^T$ , the transpose of the forward synaptic weights.
But the feedback weights do not need to be exactly $W^T$ ; any matrix B will suffice, so long as on average:
$e^T W B e > 0$

This means that the teaching signal $B e$ lies within 90° of the signal used by backprop, $W^T e$.

Feedback alignment actually seems to work better than backprop in some cases. This relies on starting the weights very small (can't be zero -- no output)

Our proof says that weights $W_0$ and $W$ evolve to equilibrium manifolds, but simulations (Fig. 4) and analytic results (Supplementary Proof 2) hint at something more specific: that when the weights begin near 0, feedback alignment encourages $W$ to act like a local pseudoinverse of $B$ around the error manifold. This fact is important because if $B$ were exactly $W^+$ (the Moore-Penrose pseudoinverse of $W$), then the network would be performing Gauss-Newton optimization (Supplementary Proof 3). We call this update rule for the hidden units pseudobackprop and denote it by $\Delta h_{PBP} = W^+ e$. Experiments with the linear network show that the angle $\Delta h_{FA} \angle \Delta h_{PBP}$ quickly becomes smaller than $\Delta h_{FA} \angle \Delta h_{BP}$ (Fig. 4b, c; see Methods). In other words feedback alignment, despite its simplicity, displays elements of second-order learning.

{806}


ref: work-0 tags: gaussian random variables mutual information SNR date: 01-16-2012 03:54 gmt revision:26 [25] [24] [23] [22] [21] [20] [head]

I've recently tried to determine the bit-rate conveyed by one gaussian random process about another in terms of the signal-to-noise ratio between the two. Assume $x$ is the known signal to be predicted, and $y$ is the prediction.

Let's define $SNR(y) = \frac{Var(x)}{Var(err)}$ where $err = x-y$ . Note this is a ratio of powers; the conventional SNR in decibels is $SNR_{dB} = 10 \log_{10} \frac{Var(x)}{Var(err)}$ . $Var(err)$ is also known as the mean-squared-error (mse).

Now, $Var(err) = \frac{1}{N} \sum (x - y - \bar{err})^2 = Var(x) + Var(y) - 2 Cov(x,y)$ ; assume x and y have unit variance (or scale them so that they do), so that $Var(err) = SNR(y)^{-1} = 2 - 2 Cov(x,y)$ , then

$\frac{2 - SNR(y)^{-1}}{2 } = Cov(x,y)$

We need the covariance because the mutual information between two jointly Gaussian zero-mean variables can be defined in terms of their covariance matrix: (see http://www.springerlink.com/content/v026617150753x6q/ ). Here Q is the covariance matrix,

$Q = \left[ \array{Var(x) & Cov(x,y) \\ Cov(x,y) & Var(y)} \right]$

$MI = \frac{1}{2} log_2 \frac{Var(x) Var(y)}{det(Q)}$

With unit variances, $det(Q) = 1 - Cov(x,y)^2$

Then $MI = - \frac{1 }{2 } log_2 \left[ 1 - Cov(x,y)^2 \right]$

or $MI = - \frac{1 }{2 } log_2 \left[ SNR(y)^{-1} - \frac{1 }{4 } SNR(y)^{-2} \right]$
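
The second form follows by substituting the covariance expression into $1 - Cov(x,y)^2$ (unit variances assumed, as above):

$1 - Cov(x,y)^2 = 1 - \left( 1 - \frac{1}{2} SNR(y)^{-1} \right)^2 = SNR(y)^{-1} - \frac{1}{4} SNR(y)^{-2}$

For large SNR the quadratic term is negligible, so $MI \approx \frac{1}{2} log_2 SNR(y)$ , i.e. roughly one bit per 6 dB.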

This agrees with intuition. If we have an SNR of 10 dB, or 10 (power ratio), then we would expect to be able to break a random variable into about 10 different categories or bins (recall stdev is the sqrt of the variance), with the probability of the variable being in the estimated bin being 1/2. (This, at least in my mind, is where the 1/2 constant comes from - if there is gaussian noise, you won't be able to determine exactly which bin the random variable is in, hence log_2 is an overestimator.)

Here is a table with the respective values, including the amplitude (not power) ratio representations of SNR.

SNR (dB)	Amp. ratio	MI (bits)
10	3.1	1.6
20	10	3.3
30	31	5.0
40	100	6.6
90	31e3	15

Note that at 90dB, you get about 15 bits of resolution. This makes sense, as 16-bit DACs and ADCs have (typically) 96dB SNR. good.

Now, to get the bitrate, you take the SNR, calculate the mutual information, and multiply it by the bandwidth (not the sampling rate in a discrete time system) of the signals. In our particular application, I think the bandwidth is between 1 and 2 Hz, hence we're getting 1.6-3.2 bits/second/axis, hence 3.2-6.4 bits/second for our normal 2D tasks. If you read this blog regularly, you'll notice that others have achieved 4bits/sec with one neuron and 6.5 bits/sec with dozens {271}.
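
The table and the bit-rate estimate can be re-derived with a few lines of Python. This is only a sketch of the formula above under the same unit-variance assumption; the function names are mine, and the bit-rate follows this post's convention of multiplying by the bandwidth.

```python
import math

def mi_bits_from_snr_db(snr_db):
    """Mutual information (bits) from SNR in dB, using
    MI = -1/2 log2(SNR^-1 - SNR^-2 / 4) with unit-variance x and y."""
    snr = 10.0 ** (snr_db / 10.0)          # dB -> power ratio
    if snr <= 0.25:
        raise ValueError("formula needs a power SNR above 1/4")
    return -0.5 * math.log2(1.0 / snr - 0.25 / snr ** 2)

def amplitude_ratio(snr_db):
    """Amplitude (not power) ratio corresponding to an SNR in dB."""
    return 10.0 ** (snr_db / 20.0)

def bitrate_bits_per_s(snr_db, bandwidth_hz):
    """Bit-rate as used in this post: mutual information times bandwidth."""
    return mi_bits_from_snr_db(snr_db) * bandwidth_hz

if __name__ == "__main__":
    print("SNR (dB)\tAmp. ratio\tMI (bits)")
    for snr_db in (10, 20, 30, 40, 90):
        print(f"{snr_db}\t{amplitude_ratio(snr_db):.3g}\t{mi_bits_from_snr_db(snr_db):.2f}")
    for bw in (1.0, 2.0):
        print(f"10 dB at {bw} Hz: {bitrate_bits_per_s(10, bw):.2f} bits/s per axis")
```

Evaluated at 10 dB the formula gives about 1.68 bits, which the table lists as 1.6; the other rows match to the precision shown.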

{984}


ref: ODoherty-2011 tags: Odoherty Nicolelis ICMS stimulation randomly patterned gamma distribution date: 01-03-2012 06:55 gmt revision:1 [0] [head]

IEEE-6114258 (pdf) Towards a Brain-Machine-Brain Interface: Virtual Active Touch Using Randomly Patterned Intracortical Microstimulation.

Key result: monkeys can discriminate between constant-frequency ICMS and aperiodic pulses, hence can discriminate some fine temporal aspects of ICMS.
The paper also discusses blanking methods for stimulating and recording at the same time (on different electrodes, using the randomized stimulation patterns).

____References____

O'Doherty, J. and Lebedev, M. and Li, Z. and Nicolelis, M. Towards a Brain–Machine–Brain Interface: Virtual Active Touch Using Randomly Patterned Intracortical Microstimulation. Neural Systems and Rehabilitation Engineering, IEEE Transactions on PP 99 1 (2011)