Efficient event-based delay learning in spiking neural networks

Theory

Learning weights in networks with delay

We start by defining our two differential equations in the implicit form for the membrane potentials and input currents, respectively.

$${{{{\bf{f}}}}}_{V}\equiv {\tau }_{{{{\rm{m}}}}}\dot{{{{\bf{V}}}}}+{{{\bf{V}}}}-{{{\bf{I}}}}=0$$

(7)

$${{{{\bf{f}}}}}_{I}\equiv {\tau }_{{{{\rm{s}}}}}\dot{{{{\bf{I}}}}}+{{{\bf{I}}}}=0$$

(8)
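Between events, both differential equations can be integrated in closed form, which is what makes event-based simulation efficient. The following Python sketch (our illustration; the function and variable names are not from the paper's code) advances V and I of one neuron exactly over an event-free interval, assuming current-based leaky integrate-and-fire dynamics with τm ≠ τs:

```python
import math

def advance(v, i_syn, dt, tau_m, tau_s):
    """Exact solution of tau_m*dV/dt = -V + I and tau_s*dI/dt = -I
    over an event-free interval of length dt (requires tau_m != tau_s)."""
    a = tau_s / (tau_s - tau_m)           # coefficient of the particular solution
    i_new = i_syn * math.exp(-dt / tau_s)
    v_new = (v - a * i_syn) * math.exp(-dt / tau_m) + a * i_new
    return v_new, i_new
```

A cross-check against a fine-grained Euler integration of the same ODEs confirms the closed form.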

In the following, we will assume that all event times \({{{\mathscr{E}}}}\) are distinct, both in terms of spikes occurring and of spikes arriving. In continuous time, coincident events are unlikely and, as argued in ref. 18, the equations do not break down even if spikes occur or arrive at the same time. Then,

$$\frac{{{{\rm{d}}}}{{{\mathscr{L}}}}}{{{{\rm{d}}}}{w}_{ji}}=\frac{{{{\rm{d}}}}}{{{{\rm{d}}}}{w}_{ji}}\left[{l}_{p}({{{\mathscr{S}}}})+{\sum}_{{t}_{k}^{{{{\rm{event}}}}}\in {{{\mathscr{E}}}}}\int_{{t}_{k}^{{{{\rm{event}}}}}}^{{t}_{k+1}^{{{{\rm{event}}}}}}\left[{l}_{V}({{{\bf{V}}}},t)+{{{{\boldsymbol{\lambda }}}}}_{V}\cdot {{{{\bf{f}}}}}_{V}+{{{{\boldsymbol{\lambda }}}}}_{I}\cdot {{{{\bf{f}}}}}_{I}\right]{{{\rm{d}}}}t\right]$$

(9)

where we have added the product of adjoint variables and dynamics functions to the loss function as the adjoint method dictates. This is possible because for solutions of the forward dynamics, fV and fI are identically zero at all times. Using

$$\frac{\partial {{{{\bf{f}}}}}_{V}}{\partial {w}_{ji}}={\tau }_{{{{\rm{m}}}}}\frac{{{{\rm{d}}}}}{{{{\rm{d}}}}t}\frac{\partial {{{\bf{V}}}}}{\partial {w}_{ji}}+\frac{\partial {{{\bf{V}}}}}{\partial {w}_{ji}}-\frac{\partial {{{\bf{I}}}}}{\partial {w}_{ji}}$$

(10)

$$\frac{\partial {{{{\bf{f}}}}}_{I}}{\partial {w}_{ji}}={\tau }_{{{{\rm{s}}}}}\frac{{{{\rm{d}}}}}{{{{\rm{d}}}}t}\frac{\partial {{{\bf{I}}}}}{\partial {w}_{ji}}+\frac{\partial {{{\bf{I}}}}}{\partial {w}_{ji}},$$

(11)

we can apply the derivative on the right-hand side of (9) to obtain

$$\frac{{{{\rm{d}}}}{{{\mathscr{L}}}}}{{{{\rm{d}}}}{w}_{ji}}= {\sum}_{{t}_{k}^{{{{\rm{spike}}}}}\in {{{\mathscr{S}}}}}\frac{\partial {l}_{p}}{\partial {t}_{k}^{{{{\rm{spike}}}}}}\frac{{{{\rm{d}}}}{t}_{k}^{{{{\rm{spike}}}}}}{{{{\rm{d}}}}{w}_{ji}}\\ +{\sum}_{{t}_{k}^{{{{\rm{event}}}}}\in {{{\mathscr{E}}}}}\int_{{t}_{k}^{{{{\rm{event}}}}}}^{{t}_{k+1}^{{{{\rm{event}}}}}}\left[\frac{\partial {l}_{V}}{\partial {{{\bf{V}}}}}\cdot \frac{\partial {{{\bf{V}}}}}{\partial {w}_{ji}}+{{{{\boldsymbol{\lambda }}}}}_{V}\cdot \left({\tau }_{{{{\rm{m}}}}}\frac{{{{\rm{d}}}}}{{{{\rm{d}}}}t}\frac{\partial {{{\bf{V}}}}}{\partial {w}_{ji}}+\frac{\partial {{{\bf{V}}}}}{\partial {w}_{ji}}-\frac{\partial {{{\bf{I}}}}}{\partial {w}_{ji}}\right)\right.\\ \left.+{{{{\boldsymbol{\lambda }}}}}_{I}\cdot \left({\tau }_{{{{\rm{s}}}}}\frac{{{{\rm{d}}}}}{{{{\rm{d}}}}t}\frac{\partial {{{\bf{I}}}}}{\partial {w}_{ji}}+\frac{\partial {{{\bf{I}}}}}{\partial {w}_{ji}}\right)\right]{{{\rm{d}}}}t\\ +{l}_{V,k+1}^{-}\frac{{{{\rm{d}}}}{t}_{k+1}^{{{{\rm{event}}}}}}{{{{\rm{d}}}}{w}_{ji}}-{l}_{V,k}^{+}\frac{{{{\rm{d}}}}{t}_{k}^{{{{\rm{event}}}}}}{{{{\rm{d}}}}{w}_{ji}}$$

(12)

Using partial integration, we can rewrite

$$\int_{{t}_{k}^{{{{\rm{event}}}}}}^{{t}_{k+1}^{{{{\rm{event}}}}}}{{{{\boldsymbol{\lambda }}}}}_{V}\cdot \frac{{{{\rm{d}}}}}{{{{\rm{d}}}}t}\frac{\partial {{{\bf{V}}}}}{\partial {w}_{ji}}{{{\rm{d}}}}t=-\int_{{t}_{k}^{{{{\rm{event}}}}}}^{{t}_{k+1}^{{{{\rm{event}}}}}}{\dot{{{{\boldsymbol{\lambda }}}}}}_{V}\cdot \frac{\partial {{{\bf{V}}}}}{\partial {w}_{ji}}{{{\rm{d}}}}t+{\left[{{{{\boldsymbol{\lambda }}}}}_{V}\cdot \frac{\partial {{{\bf{V}}}}}{\partial {w}_{ji}}\right]}_{{t}_{k}^{{{{\rm{event}}}}}}^{{t}_{k+1}^{{{{\rm{event}}}}}}$$

(13)

and

$$\int_{{t}_{k}^{{{{\rm{event}}}}}}^{{t}_{k+1}^{{{{\rm{event}}}}}}{{{{\boldsymbol{\lambda }}}}}_{I}\cdot \frac{{{{\rm{d}}}}}{{{{\rm{d}}}}t}\frac{\partial {{{\bf{I}}}}}{\partial {w}_{ji}}{{{\rm{d}}}}t=-\int_{{t}_{k}^{{{{\rm{event}}}}}}^{{t}_{k+1}^{{{{\rm{event}}}}}}{\dot{{{{\boldsymbol{\lambda }}}}}}_{I}\cdot \frac{\partial {{{\bf{I}}}}}{\partial {w}_{ji}}{{{\rm{d}}}}t+{\left[{{{{\boldsymbol{\lambda }}}}}_{I}\cdot \frac{\partial {{{\bf{I}}}}}{\partial {w}_{ji}}\right]}_{{t}_{k}^{{{{\rm{event}}}}}}^{{t}_{k+1}^{{{{\rm{event}}}}}}.$$

(14)

Inserting this into (12), we get

$$\frac{{{{\rm{d}}}}{{{\mathscr{L}}}}}{{{{\rm{d}}}}{w}_{ji}}= {\sum}_{{t}_{k}^{{{{\rm{spike}}}}}\in {{{\mathscr{S}}}}}\frac{\partial {l}_{p}}{\partial {t}_{k}^{{{{\rm{spike}}}}}}\frac{{{{\rm{d}}}}{t}_{k}^{{{{\rm{spike}}}}}}{{{{\rm{d}}}}{w}_{ji}}\\ +{\sum}_{{t}_{k}^{{{{\rm{event}}}}}\in {{{\mathscr{E}}}}}\int_{{t}_{k}^{{{{\rm{event}}}}}}^{{t}_{k+1}^{{{{\rm{event}}}}}}\left[\left(\frac{\partial {l}_{V}}{\partial {{{\bf{V}}}}}-{\tau }_{{{{\rm{m}}}}}{\dot{{{{\boldsymbol{\lambda }}}}}}_{V}+{{{{\boldsymbol{\lambda }}}}}_{V}\right)\cdot \frac{\partial {{{\bf{V}}}}}{\partial {w}_{ji}}+(-{\tau }_{{{{\rm{s}}}}}{\dot{{{{\boldsymbol{\lambda }}}}}}_{I}+{{{{\boldsymbol{\lambda }}}}}_{I}-{{{{\boldsymbol{\lambda }}}}}_{V})\cdot \frac{\partial {{{\bf{I}}}}}{\partial {w}_{ji}}\right]{{{\rm{d}}}}t\\ +{\tau }_{{{{\rm{m}}}}}{\left[{{{{\boldsymbol{\lambda }}}}}_{V}\cdot \frac{\partial {{{\bf{V}}}}}{\partial {w}_{ji}}\right]}_{{t}_{k}^{{{{\rm{event}}}}}}^{{t}_{k+1}^{{{{\rm{event}}}}}}+{\tau }_{{{{\rm{s}}}}}{\left[{{{{\boldsymbol{\lambda }}}}}_{I}\cdot \frac{\partial {{{\bf{I}}}}}{\partial {w}_{ji}}\right]}_{{t}_{k}^{{{{\rm{event}}}}}}^{{t}_{k+1}^{{{{\rm{event}}}}}}\\ +{l}_{V,k+1}^{-}\frac{{{{\rm{d}}}}{t}_{k+1}^{{{{\rm{event}}}}}}{{{{\rm{d}}}}{w}_{ji}}-{l}_{V,k}^{+}\frac{{{{\rm{d}}}}{t}_{k}^{{{{\rm{event}}}}}}{{{{\rm{d}}}}{w}_{ji}}$$

(15)

where the last two terms arise from the derivative of the bounds of the integral in the Leibniz rule. We now define the backwards dynamics of the adjoint variables as usual18,

$${\tau }_{{{{\rm{m}}}}}{{{{\boldsymbol{\lambda }}}}}_{V}^{{\prime} }=-{{{{\boldsymbol{\lambda }}}}}_{V}-\frac{\partial {l}_{V}}{\partial {{{\bf{V}}}}}$$

(16)

$${\tau }_{{{{\rm{s}}}}}{{{{\boldsymbol{\lambda }}}}}_{I}^{{\prime} }=-{{{{\boldsymbol{\lambda }}}}}_{I}+{{{{\boldsymbol{\lambda }}}}}_{V}$$

(17)
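Between events (and with lV vanishing there, as for purely spike-time-dependent losses), these adjoint dynamics also admit a closed-form solution in backward time, mirroring the forward dynamics with λV now driving λI. A possible Python sketch (our own construction, again assuming τm ≠ τs):

```python
import math

def advance_adjoint(lam_v, lam_i, dt, tau_m, tau_s):
    """Exact solution of the free adjoint dynamics (16)-(17) with l_V = 0,
    over an event-free backward-time interval dt (requires tau_m != tau_s)."""
    b = tau_m / (tau_m - tau_s)           # coefficient of the particular solution
    lv_new = lam_v * math.exp(-dt / tau_m)
    li_new = (lam_i - b * lam_v) * math.exp(-dt / tau_s) + b * lv_new
    return lv_new, li_new
```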

which cancels the terms containing \(\frac{\partial {{{\bf{V}}}}}{\partial {w}_{ji}}\) and \(\frac{\partial {{{\bf{I}}}}}{\partial {w}_{ji}}\), so that we get

$$\frac{{{{\rm{d}}}}{{{\mathscr{L}}}}}{{{{\rm{d}}}}{w}_{ji}} = {\sum}_{{t}_{k}^{{{{\rm{spike}}}}}\in {{{\mathscr{S}}}}}\frac{\partial {l}_{p}}{\partial {t}_{k}^{{{{\rm{spike}}}}}}\frac{{{{\rm{d}}}}{t}_{k}^{{{{\rm{spike}}}}}}{{{{\rm{d}}}}{w}_{ji}}+{\sum}_{{t}_{k}^{{{{\rm{event}}}}}\in {{{\mathscr{E}}}}}\left({l}_{V,k}^{-}\frac{{{{\rm{d}}}}{t}_{k}^{{{{\rm{event}}}}}}{{{{\rm{d}}}}{w}_{ji}}-{l}_{V,k}^{+}\frac{{{{\rm{d}}}}{t}_{k}^{{{{\rm{event}}}}}}{{{{\rm{d}}}}{w}_{ji}}\right.\\ +\left.{\left.\left[{\tau }_{{{{\rm{m}}}}}\left({{{{\boldsymbol{\lambda }}}}}_{V}^{-}\cdot \frac{\partial {{{{\bf{V}}}}}^{-}}{\partial {w}_{ji}}-{{{{\boldsymbol{\lambda }}}}}_{V}^{+}\cdot \frac{\partial {{{{\bf{V}}}}}^{+}}{\partial {w}_{ji}}\right)+{\tau }_{{{{\rm{s}}}}}\left({{{{\boldsymbol{\lambda }}}}}_{I}^{-}\cdot \frac{\partial {{{{\bf{I}}}}}^{-}}{\partial {w}_{ji}}-{{{{\boldsymbol{\lambda }}}}}_{I}^{+}\cdot \frac{\partial {{{{\bf{I}}}}}^{+}}{\partial {w}_{ji}}\right)\right]\right\vert }_{{t}_{k}^{{{{\rm{event}}}}}}\right)$$

(18)

The sum over events in \({{{\mathscr{E}}}}\) extends over spike emission times \({t}_{k}^{{{{\rm{spike}}}}}\) and spike arrival times. We first focus on the spike emission times \({t}_{k}^{{{{\rm{spike}}}}}\). Before the jump at \({t}_{k}^{{{{\rm{spike}}}}}\) we have,

$${V}_{n(k)}^{-}-\vartheta=0,$$

(19)

where n(k) denotes the spiking neuron at event k. If we take the derivative of this equation, we get, using the chain rule,

$$\frac{\partial {V}_{n(k)}^{-}}{\partial {w}_{ji}}+{\dot{V}}_{n(k)}^{-}\frac{{{{\rm{d}}}}{t}_{k}^{{{{\rm{spike}}}}}}{{{{\rm{d}}}}{w}_{ji}}=0$$

(20)

$$\Rightarrow \quad \frac{{{{\rm{d}}}}{t}_{k}^{{{{\rm{spike}}}}}}{{{{\rm{d}}}}{w}_{ji}}=-\frac{1}{{\dot{V}}_{n(k)}^{-}}\frac{\partial {V}_{n(k)}^{-}}{\partial {w}_{ji}},$$

(21)
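Equation (21) can be verified numerically on a toy example: a single input spike of weight w at t = 0 drives a LIF neuron from rest, so V is known in closed form and the threshold-crossing time can be bisected. The sketch below (our own; all names and parameter values are illustrative) compares the implicit-function-theorem expression with a finite difference of the spike time:

```python
import math

# Toy setup: V(t) = a*w*(exp(-t/tau_s) - exp(-t/tau_m)), a = tau_s/(tau_s - tau_m)
TAU_M, TAU_S, THETA = 1.0, 0.5, 0.2

def v(t, w):
    a = TAU_S / (TAU_S - TAU_M)
    return a * w * (math.exp(-t / TAU_S) - math.exp(-t / TAU_M))

def v_dot(t, w):
    a = TAU_S / (TAU_S - TAU_M)
    return a * w * (math.exp(-t / TAU_M) / TAU_M - math.exp(-t / TAU_S) / TAU_S)

def spike_time(w, lo=1e-9, hi=0.6):
    # bisect for the first threshold crossing (V rises monotonically here)
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if v(mid, w) < THETA else (lo, mid)
    return 0.5 * (lo + hi)

# eq. (21): dt_spike/dw = -(dV/dw)/V_dot, with dV/dw = V/w since V is linear in w
t0 = spike_time(1.0)
grad = -(v(t0, 1.0) / 1.0) / v_dot(t0, 1.0)
```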

and after the jump,

$${V}_{n(k)}^{+}=0$$

(22)

$$\Rightarrow \quad \frac{\partial {V}_{n(k)}^{+}}{\partial {w}_{ji}}+{\dot{V}}_{n(k)}^{+}\frac{{{{\rm{d}}}}{t}_{k}^{{{{\rm{spike}}}}}}{{{{\rm{d}}}}{w}_{ji}}=0\,.$$

(23)

Inserting (21) into (23) we obtain as usual18

$$\frac{\partial {V}_{n(k)}^{+}}{\partial {w}_{ji}}=\frac{{\dot{V}}_{n(k)}^{+}}{{\dot{V}}_{n(k)}^{-}}\frac{\partial {V}_{n(k)}^{-}}{\partial {w}_{ji}}.$$

(24)

For the current In(k), there is no jump at \({t}_{k}^{{{{\rm{spike}}}}}\), nor in its derivative: \({I}_{n(k)}^{+}={I}_{n(k)}^{-}\) and \({\dot{I}}_{n(k)}^{+}={\dot{I}}_{n(k)}^{-}\) imply

$$\frac{\partial {I}_{n(k)}^{+}}{\partial {w}_{ji}}=\frac{\partial {I}_{n(k)}^{-}}{\partial {w}_{ji}}.$$

(25)

Let us now consider what happens at the spike arrival times, when the spike k at \({t}_{k}^{{{{\rm{spike}}}}}\) is received at all the postsynaptic neurons m at times \({t}_{k}^{{{{\rm{spike}}}}}+{d}_{mn(k)}\) (i.e. we look at \({{{\mathscr{E}}}}\setminus {{{\mathscr{S}}}}\)). Note that this is where EventProp with delays becomes substantially different from standard EventProp, where spike emission and arrival times are the same. At spike arrival, the input current of the receiving neurons jumps,

$${I}_{m}^{+}={I}_{m}^{-}+{w}_{mn(k)}.$$

(26)

By taking the derivative with respect to wji, we get

$$\frac{\partial {I}_{m}^{+}}{\partial {w}_{ji}}+{\dot{I}}_{m}^{+}\frac{{{{\rm{d}}}}{t}_{k}^{{{{\rm{spike}}}}}}{{{{\rm{d}}}}{w}_{ji}}=\frac{\partial {I}_{m}^{-}}{\partial {w}_{ji}}+{\dot{I}}_{m}^{-}\frac{{{{\rm{d}}}}{t}_{k}^{{{{\rm{spike}}}}}}{{{{\rm{d}}}}{w}_{ji}}+{\delta }_{in(k)}{\delta }_{jm},$$

(27)

where we have used that \(\frac{{{{\rm{d}}}}({t}_{k}^{{{{\rm{spike}}}}}+{d}_{mn(k)})}{{{{\rm{d}}}}{w}_{ji}}=\frac{{{{\rm{d}}}}{t}_{k}^{{{{\rm{spike}}}}}}{{{{\rm{d}}}}{w}_{ji}}\). Now, using the dynamics equations for I, we also have

$${\tau }_{{{{\rm{s}}}}}{\dot{I}}_{m}^{+}={\tau }_{{{{\rm{s}}}}}{\dot{I}}_{m}^{-}-{w}_{mn(k)},$$

(28)

and hence,

$$\frac{\partial {I}_{m}^{+}}{\partial {w}_{ji}}= \frac{\partial {I}_{m}^{-}}{\partial {w}_{ji}}+{\tau }_{{{{\rm{s}}}}}^{-1}{w}_{mn(k)}\frac{{{{\rm{d}}}}{t}_{k}^{{{{\rm{spike}}}}}}{{{{\rm{d}}}}{w}_{ji}}+{\delta }_{in(k)}{\delta }_{jm}\\ = \frac{\partial {I}_{m}^{-}}{\partial {w}_{ji}}+{\left.\left[\frac{1}{{\tau }_{{{{\rm{s}}}}}{\dot{V}}_{n(k)}^{-}}{w}_{mn(k)}\frac{\partial {V}_{n(k)}^{-}}{\partial {w}_{ji}}\right]\right\vert }_{{t}_{k}^{{{{\rm{spike}}}}}+{d}_{mn(k)}}+{\delta }_{in(k)}{\delta }_{jm}$$

(29)

where we have used (21) to replace \(\frac{{{{\rm{d}}}}{t}_{k}^{{{{\rm{spike}}}}}}{{{{\rm{d}}}}{w}_{ji}}\). Since we have \({V}_{m}^{+}={V}_{m}^{-}\) for non-spiking neurons,

$$\frac{\partial {V}_{m}^{+}}{\partial {w}_{ji}}+{\dot{V}}_{m}^{+}\frac{{{{\rm{d}}}}{t}_{k}^{{{{\rm{spike}}}}}}{{{{\rm{d}}}}{w}_{ji}}=\frac{\partial {V}_{m}^{-}}{\partial {w}_{ji}}+{\dot{V}}_{m}^{-}\frac{{{{\rm{d}}}}{t}_{k}^{{{{\rm{spike}}}}}}{{{{\rm{d}}}}{w}_{ji}}.$$

(30)

From Eq. (26) and the dynamics equations for V we know

$${\tau }_{{{{\rm{m}}}}}{\dot{V}}_{m}^{+}={\tau }_{{{{\rm{m}}}}}{\dot{V}}_{m}^{-}+{w}_{mn(k)}.$$

(31)

Putting this together, we get

$$\frac{\partial {V}_{m}^{+}}{\partial {w}_{ji}}=\frac{\partial {V}_{m}^{-}}{\partial {w}_{ji}}-{\tau }_{{{{\rm{m}}}}}^{-1}{w}_{mn(k)}\frac{{{{\rm{d}}}}{t}_{k}^{{{{\rm{event}}}}}}{{{{\rm{d}}}}{w}_{ji}}$$

(32)

$$=\frac{\partial {V}_{m}^{-}}{\partial {w}_{ji}}+{\left.\left[\frac{1}{{\tau }_{{{{\rm{m}}}}}{\dot{V}}_{n(k)}^{-}}{w}_{mn(k)}\frac{\partial {V}_{n(k)}^{-}}{\partial {w}_{ji}}\right]\right\vert }_{{t}_{k}^{{{{\rm{spike}}}}}+{d}_{mn(k)}}$$

(33)

We can now insert the expressions (21), (24), (25) and (33) into (18) and reorder the terms according to the spike from which each jump originates, obtaining

$$\frac{{{{\rm{d}}}}{{{\mathscr{L}}}}}{{{{\rm{d}}}}{w}_{ji}} = {\sum}_{{t}_{k}^{{{{\rm{spike}}}}}\in {{{\mathscr{S}}}}}\left[\frac{\partial {V}_{n(k)}^{-}}{\partial {w}_{ji}}\left[{\tau }_{{{{\rm{m}}}}}\left({\lambda }_{V,n(k)}^{-}-\frac{{\dot{V}}_{n(k)}^{+}}{{\dot{V}}_{n(k)}^{-}}{\lambda }_{V,n(k)}^{+}\right)+\frac{1}{{\dot{V}}_{n(k)}^{-}}\left(-\frac{\partial {l}_{p}}{\partial {t}_{k}^{{{{\rm{spike}}}}}}+{l}_{V}^{+}-{l}_{V}^{-}\right)\right]\right.\\ {\left.\left.+{\tau }_{{{{\rm{s}}}}}({\lambda }_{I,n(k)}^{-}-{\lambda }_{I,n(k)}^{+})\frac{\partial {I}_{n(k)}^{-}}{\partial {w}_{ji}}\right]\right\vert }_{{t}_{k}^{{{{\rm{spike}}}}}}\\ +{\sum}_{m}\left[{\tau }_{{{{\rm{m}}}}}({\lambda }_{V,m}^{-}-{\lambda }_{V,m}^{+})\frac{\partial {V}_{m}^{-}}{\partial {w}_{ji}}+{\tau }_{{{{\rm{s}}}}}({\lambda }_{I,m}^{-}-{\lambda }_{I,m}^{+})\frac{\partial {I}_{m}^{-}}{\partial {w}_{ji}}\right]{| }_{{t}_{k}^{{{{\rm{spike}}}}}+{d}_{mn(k)}}\\ +{\left.\left[\frac{\partial {V}_{n(k)}^{-}}{\partial {w}_{ji}}\frac{1}{{\dot{V}}_{n(k)}^{-}}\right]\right\vert }_{{t}_{k}^{{{{\rm{spike}}}}}}{\left.\left[{w}_{mn(k)}({\lambda }_{I,m}^{+}-{\lambda }_{V,m}^{+})\right]\right\vert }_{{t}_{k}^{{{{\rm{spike}}}}}+{d}_{mn(k)}}-{\left.\left[{\tau }_{{{{\rm{s}}}}}{\delta }_{in(k)}{\delta }_{jm}{\lambda }_{I,m}^{+}\right]\right\vert }_{{t}_{k}^{{{{\rm{spike}}}}}+{d}_{mn(k)}}.$$

(34)

Interestingly, after this detailed work, we find that the update of λV of the spiking neuron is the same as without delays, apart from taking the receiving neurons’ corresponding λV and λI at the delayed times:

$${\lambda }_{V,n(k)}^{-} = {\left.\left[\frac{{\dot{V}}_{n(k)}^{+}}{{\dot{V}}_{n(k)}^{-}}{\lambda }_{V,n(k)}^{+}+\frac{1}{{\tau }_{{{{\rm{m}}}}}{\dot{V}}_{n(k)}^{-}}\left(\frac{\partial {l}_{p}}{\partial {t}_{k}^{{{{\rm{spike}}}}}}+{l}_{V}^{-}-{l}_{V}^{+}\right)\right]\right\vert }_{{t}_{k}^{{{{\rm{spike}}}}}}\\ +{\left.\left[\frac{1}{{\tau }_{{{{\rm{m}}}}}{\dot{V}}_{n(k)}^{-}}\right]\right\vert }_{{t}_{k}^{{{{\rm{spike}}}}}}{\sum}_{m}{w}_{mn(k)}{\left.\left[({\lambda }_{V,m}^{+}-{\lambda }_{I,m}^{+})\right]\right\vert }_{{t}_{k}^{{{{\rm{spike}}}}}+{d}_{mn(k)}}$$

(35)

$${\lambda }_{V,m}^{-}={\lambda }_{V,m}^{+},\,{{\mbox{if}}}\,\,m\ne n(k)$$

(36)

$${{{{\boldsymbol{\lambda }}}}}_{I}^{-}={{{{\boldsymbol{\lambda }}}}}_{I}^{+}.$$

(37)

The gradient is then given by

$$\frac{{{{\rm{d}}}}{{{\mathscr{L}}}}}{{{{\rm{d}}}}{w}_{ji}}=-{\tau }_{{{{\rm{s}}}}}{\sum}_{{t}_{k}^{{{{\rm{spike}}}}}\in {{{\mathscr{S}}}}}{\delta }_{in(k)}{\lambda }_{I,j}{| }_{{t}_{k}^{{{{\rm{spike}}}}}+{d}_{jn(k)}}=-{\tau }_{{{{\rm{s}}}}}{\sum}_{\left\{{t}_{k}^{{{{\rm{spike}}}}}\,| n(k)=i\right\}}{\lambda }_{I,j}{| }_{{t}_{k}^{{{{\rm{spike}}}}}+{d}_{ji}}.$$

(38)
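In an implementation, eq. (38) amounts to a simple accumulation over the spikes of the presynaptic neuron. A minimal sketch with hypothetical data layout (`spikes` holds (time, neuron) pairs, `d` maps (post, pre) pairs to delays, and `lam_i_of` stands in for whatever mechanism supplies λI,j at the requested time):

```python
def weight_gradient(i, j, spikes, d, lam_i_of, tau_s):
    """dL/dw_ji per eq. (38): accumulate -tau_s * lambda_I of postsynaptic
    neuron j, sampled at spike time + delay d_ji, over all spikes emitted
    by presynaptic neuron i."""
    return -tau_s * sum(lam_i_of(j, t + d[(j, i)])
                        for t, n in spikes if n == i)
```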

Learning delays

In the following, we will derive the gradients for delays dji similarly to our weight gradient derivations. We start again with the standard approach for the adjoint method,

$$\frac{{{{\rm{d}}}}{{{\mathscr{L}}}}}{{{{\rm{d}}}}{d}_{ji}}=\frac{{{{\rm{d}}}}}{{{{\rm{d}}}}{d}_{ji}}\left[{l}_{p}({{{\mathscr{S}}}})+{\sum}_{{t}_{k}^{{{{\rm{event}}}}}\in {{{\mathscr{E}}}}}\int_{{t}_{k}^{{{{\rm{event}}}}}}^{{t}_{k+1}^{{{{\rm{event}}}}}}\left[{l}_{V}({{{\bf{V}}}},t)+{{{{\boldsymbol{\lambda }}}}}_{V}\cdot {{{{\bf{f}}}}}_{V}+{{{{\boldsymbol{\lambda }}}}}_{I}\cdot {{{{\bf{f}}}}}_{I}\right]{{{\rm{d}}}}t\right]$$

(39)

$$\frac{\partial {{{{\bf{f}}}}}_{V}}{\partial {d}_{ji}}={\tau }_{{{{\rm{m}}}}}\frac{{{{\rm{d}}}}}{{{{\rm{d}}}}t}\frac{\partial {{{\bf{V}}}}}{\partial {d}_{ji}}+\frac{\partial {{{\bf{V}}}}}{\partial {d}_{ji}}-\frac{\partial {{{\bf{I}}}}}{\partial {d}_{ji}}$$

(40)

$$\frac{\partial {{{{\bf{f}}}}}_{I}}{\partial {d}_{ji}}={\tau }_{{{{\rm{s}}}}}\frac{{{{\rm{d}}}}}{{{{\rm{d}}}}t}\frac{\partial {{{\bf{I}}}}}{\partial {d}_{ji}}+\frac{\partial {{{\bf{I}}}}}{\partial {d}_{ji}}.$$

(41)

Therefore,

$$\frac{{{{\rm{d}}}}{{{\mathscr{L}}}}}{{{{\rm{d}}}}{d}_{ji}}= {\sum}_{{t}_{k}^{{{{\rm{spike}}}}}\in {{{\mathscr{S}}}}}\frac{\partial {l}_{p}}{\partial {t}_{k}^{{{{\rm{spike}}}}}}\frac{{{{\rm{d}}}}{t}_{k}^{{{{\rm{spike}}}}}}{{{{\rm{d}}}}{d}_{ji}}\\ +{\sum}_{{t}_{k}^{{{{\rm{event}}}}}\in {{{\mathscr{E}}}}}\int_{{t}_{k}^{{{{\rm{event}}}}}}^{{t}_{k+1}^{{{{\rm{event}}}}}}\left[\frac{\partial {l}_{V}}{\partial {{{\bf{V}}}}}\cdot \frac{\partial {{{\bf{V}}}}}{\partial {d}_{ji}}+{{{{\boldsymbol{\lambda }}}}}_{V}\cdot \left({\tau }_{{{{\rm{m}}}}}\frac{{{{\rm{d}}}}}{{{{\rm{d}}}}t}\frac{\partial {{{\bf{V}}}}}{\partial {d}_{ji}}+\frac{\partial {{{\bf{V}}}}}{\partial {d}_{ji}}-\frac{\partial {{{\bf{I}}}}}{\partial {d}_{ji}}\right)\right.\\ \left.+{{{{\boldsymbol{\lambda }}}}}_{I}\cdot \left({\tau }_{{{{\rm{s}}}}}\frac{{{{\rm{d}}}}}{{{{\rm{d}}}}t}\frac{\partial {{{\bf{I}}}}}{\partial {d}_{ji}}+\frac{\partial {{{\bf{I}}}}}{\partial {d}_{ji}}\right)\right]{{{\rm{d}}}}t\\ +{l}_{V,k+1}^{-}\frac{{{{\rm{d}}}}{t}_{k+1}^{{{{\rm{event}}}}}}{{{{\rm{d}}}}{d}_{ji}}-{l}_{V,k}^{+}\frac{{{{\rm{d}}}}{t}_{k}^{{{{\rm{event}}}}}}{{{{\rm{d}}}}{d}_{ji}}.$$

(42)

Then, using partial integration,

$$\int_{{t}_{k}^{{{{\rm{event}}}}}}^{{t}_{k+1}^{{{{\rm{event}}}}}}{{{{\boldsymbol{\lambda }}}}}_{V}\cdot \frac{{{{\rm{d}}}}}{{{{\rm{d}}}}t}\frac{\partial {{{\bf{V}}}}}{\partial {d}_{ji}}{{{\rm{d}}}}t=-\int_{{t}_{k}^{{{{\rm{event}}}}}}^{{t}_{k+1}^{{{{\rm{event}}}}}}{\dot{{{{\boldsymbol{\lambda }}}}}}_{{{{\bf{V}}}}}\cdot \frac{\partial {{{\bf{V}}}}}{\partial {d}_{ji}}{{{\rm{d}}}}t+{\left[{{{{\boldsymbol{\lambda }}}}}_{V}\cdot \frac{\partial {{{\bf{V}}}}}{\partial {d}_{ji}}\right]}_{{t}_{k}^{{{{\rm{event}}}}}}^{{t}_{k+1}^{{{{\rm{event}}}}}}$$

(43)

$$\int_{{t}_{k}^{{{{\rm{event}}}}}}^{{t}_{k+1}^{{{{\rm{event}}}}}}{{{{\boldsymbol{\lambda }}}}}_{I}\cdot \frac{{{{\rm{d}}}}}{{{{\rm{d}}}}t}\frac{\partial {{{\bf{I}}}}}{\partial {d}_{ji}}{{{\rm{d}}}}t=-\int_{{t}_{k}^{{{{\rm{event}}}}}}^{{t}_{k+1}^{{{{\rm{event}}}}}}{\dot{{{{\boldsymbol{\lambda }}}}}}_{I}\cdot \frac{\partial {{{\bf{I}}}}}{\partial {d}_{ji}}{{{\rm{d}}}}t+{\left[{{{{\boldsymbol{\lambda }}}}}_{I}\cdot \frac{\partial {{{\bf{I}}}}}{\partial {d}_{ji}}\right]}_{{t}_{k}^{{{{\rm{event}}}}}}^{{t}_{k+1}^{{{{\rm{event}}}}}}$$

(44)

and hence,

$$\frac{{{{\rm{d}}}}{{{\mathscr{L}}}}}{{{{\rm{d}}}}{d}_{ji}}= {\sum}_{{t}_{k}^{{{{\rm{spike}}}}}\in {{{\mathscr{S}}}}}\frac{\partial {l}_{p}}{\partial {t}_{k}^{{{{\rm{spike}}}}}}\frac{{{{\rm{d}}}}{t}_{k}^{{{{\rm{spike}}}}}}{{{{\rm{d}}}}{d}_{ji}}\\ +{\sum}_{{t}_{k}^{{{{\rm{event}}}}}\in {{{\mathscr{E}}}}}\left[\int_{{t}_{k}^{{{{\rm{event}}}}}}^{{t}_{k+1}^{{{{\rm{event}}}}}}\left(\frac{\partial {l}_{V}}{\partial {{{\bf{V}}}}}-{\tau }_{{{{\rm{m}}}}}{\dot{{{{\boldsymbol{\lambda }}}}}}_{V}+{{{{\boldsymbol{\lambda }}}}}_{V}\right)\cdot \frac{\partial {{{\bf{V}}}}}{\partial {d}_{ji}}+(-{\tau }_{{{{\rm{s}}}}}{\dot{{{{\boldsymbol{\lambda }}}}}}_{I}+{{{{\boldsymbol{\lambda }}}}}_{I}-{{{{\boldsymbol{\lambda }}}}}_{V})\cdot \frac{\partial {{{\bf{I}}}}}{\partial {d}_{ji}}\right]{{{\rm{d}}}}t\\ +{\tau }_{{{{\rm{m}}}}}{\left[{{{{\boldsymbol{\lambda }}}}}_{V}\cdot \frac{\partial {{{\bf{V}}}}}{\partial {d}_{ji}}\right]}_{{t}_{k}^{{{{\rm{event}}}}}}^{{t}_{k+1}^{{{{\rm{event}}}}}}+{\tau }_{{{{\rm{s}}}}}{\left[{{{{\boldsymbol{\lambda }}}}}_{I}\cdot \frac{\partial {{{\bf{I}}}}}{\partial {d}_{ji}}\right]}_{{t}_{k}^{{{{\rm{event}}}}}}^{{t}_{k+1}^{{{{\rm{event}}}}}}+{l}_{V,k+1}^{-}\frac{{{{\rm{d}}}}{t}_{k+1}^{{{{\rm{event}}}}}}{{{{\rm{d}}}}{d}_{ji}}-{l}_{V,k}^{+}\frac{{{{\rm{d}}}}{t}_{k}^{{{{\rm{event}}}}}}{{{{\rm{d}}}}{d}_{ji}}.$$

(45)

If we now define the adjoint dynamics as usual, the terms in the integral disappear, and we are left with

$$\frac{{{{\rm{d}}}}{{{\mathscr{L}}}}}{{{{\rm{d}}}}{d}_{ji}}= {\sum}_{{t}_{k}^{{{{\rm{spike}}}}}\in {{{\mathscr{S}}}}}\frac{\partial {l}_{p}}{\partial {t}_{k}^{{{{\rm{spike}}}}}}\frac{{{{\rm{d}}}}{t}_{k}^{{{{\rm{spike}}}}}}{{{{\rm{d}}}}{d}_{ji}}\\ +{\sum}_{{t}_{k}^{{{{\rm{event}}}}}\in {{{\mathscr{E}}}}}{l}_{V,k}^{-}\frac{{{{\rm{d}}}}{t}_{k}^{{{{\rm{event}}}}}}{{{{\rm{d}}}}{d}_{ji}}-{l}_{V,k}^{+}\frac{{{{\rm{d}}}}{t}_{k}^{{{{\rm{event}}}}}}{{{{\rm{d}}}}{d}_{ji}} \\ +{\left.\left[{\tau }_{{{{\rm{m}}}}}\left({{{{\boldsymbol{\lambda }}}}}_{V}^{-}\cdot \frac{\partial {{{{\bf{V}}}}}^{-}}{\partial {d}_{ji}}-{{{{\boldsymbol{\lambda }}}}}_{V}^{+}\cdot \frac{\partial {{{{\bf{V}}}}}^{+}}{\partial {d}_{ji}}\right)+{\tau }_{{{{\rm{s}}}}}\left({{{{\boldsymbol{\lambda }}}}}_{I}^{-}\cdot \frac{\partial {{{{\bf{I}}}}}^{-}}{\partial {d}_{ji}}-{{{{\boldsymbol{\lambda }}}}}_{I}^{+}\cdot \frac{\partial {{{{\bf{I}}}}}^{+}}{\partial {d}_{ji}}\right)\right]\right\vert }_{{t}_{k}^{{{{\rm{event}}}}}}.$$

(46)

Let us now first consider the spike emission times \({t}_{k}^{{{{\rm{spike}}}}}\) and the spiking neuron n(k) again. Before the jump:

$$\frac{\partial {V}_{n(k)}^{-}}{\partial {d}_{ji}}+{\dot{V}}_{n(k)}^{-}\frac{{{{\rm{d}}}}{t}_{k}^{{{{\rm{spike}}}}}}{{{{\rm{d}}}}{d}_{ji}}=0$$

(47)

$$\Rightarrow \quad \frac{{{{\rm{d}}}}{t}_{k}^{{{{\rm{spike}}}}}}{{{{\rm{d}}}}{d}_{ji}}=-\frac{1}{{\dot{V}}_{n(k)}^{-}}\frac{\partial {V}_{n(k)}^{-}}{\partial {d}_{ji}},$$

(48)

and after the jump:

$$\frac{\partial {V}_{n(k)}^{+}}{\partial {d}_{ji}}+{\dot{V}}_{n(k)}^{+}\frac{{{{\rm{d}}}}{t}_{k}^{{{{\rm{spike}}}}}}{{{{\rm{d}}}}{d}_{ji}}=0$$

(49)

$$\Rightarrow \quad \frac{\partial {V}_{n(k)}^{+}}{\partial {d}_{ji}}=\frac{{\dot{V}}_{n(k)}^{+}}{{\dot{V}}_{n(k)}^{-}}\frac{\partial {V}_{n(k)}^{-}}{\partial {d}_{ji}}.$$

(50)

There is no jump in In(k) or its time derivative at \({t}_{k}^{{{{\rm{spike}}}}}\), which, analogously to the above, implies

$$\frac{\partial {I}_{n(k)}^{+}}{\partial {d}_{ji}}=\frac{\partial {I}_{n(k)}^{-}}{\partial {d}_{ji}}.$$

(51)

Turning to spike arrival times \({t}_{k}^{{{{\rm{event}}}}}\in {{{\mathscr{E}}}}\backslash {{{\mathscr{S}}}}\), when the spike at \({t}_{k}^{{{{\rm{spike}}}}}\) arrives at the post-synaptic neurons m, we get

$${I}_{m}^{+}={I}_{m}^{-}+{w}_{mn(k)},$$

(52)

and hence,

$$\frac{\partial {I}_{m}^{+}}{\partial {d}_{ji}}+{\dot{I}}_{m}^{+}\frac{{{{\rm{d}}}}{t}_{k}^{{{{\rm{event}}}}}}{{{{\rm{d}}}}{d}_{ji}}=\frac{\partial {I}_{m}^{-}}{\partial {d}_{ji}}+{\dot{I}}_{m}^{-}\frac{{{{\rm{d}}}}{t}_{k}^{{{{\rm{event}}}}}}{{{{\rm{d}}}}{d}_{ji}}.$$

(53)

Using the dynamics of I, (52) implies

$${\tau }_{{{{\rm{s}}}}}{\dot{I}}_{m}^{+}={\tau }_{{{{\rm{s}}}}}{\dot{I}}_{m}^{-}-{w}_{mn(k)},$$

(54)

and hence

$$\frac{\partial {I}_{m}^{+}}{\partial {d}_{ji}}=\frac{\partial {I}_{m}^{-}}{\partial {d}_{ji}}+{\tau }_{{{{\rm{s}}}}}^{-1}{w}_{mn(k)}\frac{{{{\rm{d}}}}{t}_{k}^{{{{\rm{event}}}}}}{{{{\rm{d}}}}{d}_{ji}}$$

(55)

$$=\frac{\partial {I}_{m}^{-}}{\partial {d}_{ji}}-\frac{1}{{\tau }_{{{{\rm{s}}}}}{\dot{V}}_{n(k)}^{-}}{w}_{mn(k)}\frac{\partial {V}_{n(k)}^{-}}{\partial {d}_{ji}}+{\delta }_{in(k)}{\delta }_{jm}\frac{{w}_{mn(k)}}{{\tau }_{{{{\rm{s}}}}}},$$

(56)

where the term involving the spiking neuron n(k) stems from the derivative of the spike time \({t}_{k}^{{{{\rm{event}}}}}\) with respect to dji using (48), and the last term stems from the derivative of the delay itself (since \(\frac{\partial {t}_{k}^{{{{\rm{event}}}}}}{\partial {d}_{ji}}=\frac{\partial ({t}_{k}^{{{{\rm{spike}}}}}+{d}_{ji})}{\partial {d}_{ji}}=\frac{\partial {t}_{k}^{{{{\rm{spike}}}}}}{\partial {d}_{ji}}+1\)). Note that this is where the derivations begin to differ from those for the derivative with respect to wji. For the voltages,

$$\frac{\partial {V}_{m}^{+}}{\partial {d}_{ji}}+{\dot{V}}_{m}^{+}\frac{{{{\rm{d}}}}{t}_{k}^{{{{\rm{event}}}}}}{{{{\rm{d}}}}{d}_{ji}}=\frac{\partial {V}_{m}^{-}}{\partial {d}_{ji}}+{\dot{V}}_{m}^{-}\frac{{{{\rm{d}}}}{t}_{k}^{{{{\rm{event}}}}}}{{{{\rm{d}}}}{d}_{ji}},$$

(57)

and using the dynamics of V and (52),

$${\tau }_{{{{\rm{m}}}}}{\dot{V}}_{m}^{+}={\tau }_{{{{\rm{m}}}}}{\dot{V}}_{m}^{-}+{w}_{mn(k)},$$

(58)

which put together gives

$$\frac{\partial {V}_{m}^{+}}{\partial {d}_{ji}}=\frac{\partial {V}_{m}^{-}}{\partial {d}_{ji}}-{\tau }_{{{{\rm{m}}}}}^{-1}{w}_{mn(k)}\frac{{{{\rm{d}}}}{t}_{k}^{{{{\rm{event}}}}}}{{{{\rm{d}}}}{d}_{ji}}$$

(59)

$$=\frac{\partial {V}_{m}^{-}}{\partial {d}_{ji}}+\frac{1}{{\tau }_{{{{\rm{m}}}}}{\dot{V}}_{n(k)}^{-}}{w}_{mn(k)}\frac{\partial {V}_{n(k)}^{-}}{\partial {d}_{ji}}-{\delta }_{in(k)}{\delta }_{jm}\frac{{w}_{mn(k)}}{{\tau }_{{{{\rm{m}}}}}},$$

(60)

where the last term again arises from the derivative of the delay dmn(k) in \({t}_{k}^{{{{\rm{event}}}}}\) with respect to dji. Taking everything together, we get

$$\frac{d{{{\mathscr{L}}}}}{d{d}_{ji}}={\sum}_{{t}_{k}^{{{{\rm{spike}}}}}\in {{{\mathscr{S}}}}}\left[\frac{\partial {V}_{n(k)}^{-}}{\partial {d}_{ji}}\left[{\tau }_{{{{\rm{m}}}}}\left({\lambda }_{V,n(k)}^{-}-\frac{{\dot{V}}_{n(k)}^{+}}{{\dot{V}}_{n(k)}^{-}}{\lambda }_{V,n(k)}^{+}\right)+\frac{1}{{\dot{V}}_{n(k)}^{-}}\left(-\frac{\partial {l}_{p}}{\partial {t}_{k}^{{{{\rm{spike}}}}}}+{l}_{V}^{+}-{l}_{V}^{-}\right)\right]\right.$$

(61)

$${\left.\left.+{\tau }_{{{{\rm{s}}}}}({\lambda }_{I,n(k)}^{-}-{\lambda }_{I,n(k)}^{+})\frac{\partial {I}_{n(k)}^{-}}{\partial {d}_{ji}}\right]\right\vert }_{{t}_{k}^{{{{\rm{spike}}}}}}$$

(62)

$$+{\sum}_{m}{\left.\left[{\tau }_{{{{\rm{m}}}}}({\lambda }_{V,m}^{-}-{\lambda }_{V,m}^{+})\frac{\partial {V}_{m}^{-}}{\partial {d}_{ji}}+{\tau }_{{{{\rm{s}}}}}({\lambda }_{I,m}^{-}-{\lambda }_{I,m}^{+})\frac{\partial {I}_{m}^{-}}{\partial {d}_{ji}}\right]\right\vert }_{{t}_{k}^{{{{\rm{spike}}}}}+{d}_{mn(k)}}$$

(63)

$$+{\left.\left[\frac{\partial {V}_{n(k)}^{-}}{\partial {d}_{ji}}\frac{1}{{\dot{V}}_{n(k)}^{-}}\right]\right\vert }_{{t}_{k}^{{{{\rm{spike}}}}}}{\left.\left[{w}_{mn(k)}({\lambda }_{I,m}^{+}-{\lambda }_{V,m}^{+})\right]\right\vert }_{{t}_{k}^{{{{\rm{spike}}}}}+{d}_{mn(k)}}$$

(64)

$$-{\left.\left[{w}_{mn(k)}{\delta }_{in(k)}{\delta }_{jm}({\lambda }_{I,m}^{+}-{\lambda }_{V,m}^{+})\right]\right\vert }_{{t}_{k}^{{{{\rm{spike}}}}}+{d}_{mn(k)}}.$$

(65)

So, using the usual identity (which follows from the reset: at the spike, \({V}_{n(k)}\) jumps from \(\vartheta\) to 0 while \({I}_{n(k)}\) is continuous, so \({\tau }_{{{{\rm{m}}}}}({\dot{V}}_{n(k)}^{+}-{\dot{V}}_{n(k)}^{-})=\vartheta\)),

$$\frac{{\dot{V}}_{n(k)}^{+}}{{\dot{V}}_{n(k)}^{-}}=\frac{\vartheta }{{\tau }_{{{{\rm{m}}}}}{\dot{V}}_{n(k)}^{-}}+1,$$

(66)

we again arrive at the same jump conditions as usual,

$$\begin{array}{rcl}{\lambda }_{V,n(k)}^{-}&=&{\left.\left[{\lambda }_{V,n(k)}^{+}+\frac{1}{{\tau }_{{{{\rm{m}}}}}{\dot{V}}_{n(k)}^{-}}\left[\vartheta \cdot {\lambda }_{V,n(k)}^{+}+\frac{\partial {l}_{p}}{\partial {t}_{k}^{{{{\rm{spike}}}}}}+{l}_{V}^{-}-{l}_{V}^{+}\right]\right]\right\vert }_{{t}_{k}^{{{{\rm{spike}}}}}}\\ &&+{\left.\left[\frac{1}{{\tau }_{{{{\rm{m}}}}}{\dot{V}}_{n(k)}^{-}}\right]\right\vert }_{{t}_{k}^{{{{\rm{spike}}}}}}{\sum}_{m}{w}_{mn(k)}{\left.\left[{\lambda }_{V,m}^{+}-{\lambda }_{I,m}^{+}\right]\right\vert }_{{t}_{k}^{{{{\rm{spike}}}}}+{d}_{mn(k)}}\end{array}$$

(67)

$${\lambda }_{V,m}^{-}={\lambda }_{V,m}^{+},\,{{\mbox{if}}}\,\,m\ne n(k)$$

(68)

$${{{{\boldsymbol{\lambda }}}}}_{I}^{-}={{{{\boldsymbol{\lambda }}}}}_{I}^{+},$$

(69)

but the gradient updates take the form

$$\frac{{{{\rm{d}}}}{{{\mathscr{L}}}}}{{{{\rm{d}}}}{d}_{ji}}=-{\sum}_{{t}_{k}^{{{{\rm{spike}}}}}\in {{{\mathscr{S}}}}}{w}_{ji}{\delta }_{in(k)}{\left.({\lambda }_{I,j}-{\lambda }_{V,j})\right\vert }_{{t}_{k}^{{{{\rm{spike}}}}}+{d}_{jn(k)}} \\=-{w}_{ji}{\sum}_{\left\{{t}_{k}^{{{{\rm{spike}}}}}\,| n(k)=i\right\}}{\left.({\lambda }_{I,j}-{\lambda }_{V,j})\right\vert }_{{t}_{k}^{{{{\rm{spike}}}}}+{d}_{ji}}.$$

(70)
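Analogously to the weight case, eq. (70) can be transcribed directly (same hypothetical conventions as for the weight gradient: `spikes` holds (time, neuron) pairs, `w` and `d` map (post, pre) pairs to weights and delays, and the `lam_*_of` callables supply the adjoint variables of neuron j at the requested time):

```python
def delay_gradient(i, j, spikes, w, d, lam_i_of, lam_v_of):
    """dL/dd_ji per eq. (70): accumulate -w_ji * (lambda_I,j - lambda_V,j),
    sampled at spike time + d_ji, over all spikes of presynaptic neuron i."""
    return -w[(j, i)] * sum(
        lam_i_of(j, t + d[(j, i)]) - lam_v_of(j, t + d[(j, i)])
        for t, n in spikes if n == i)
```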

Time-invariant mean squared error loss

Following Göltz et al.39, we use the time-invariant mean squared error loss of output spike times for the Yin-Yang benchmark

$${{{{\mathscr{L}}}}}_{\Delta {{{\rm{MSE}}}}}=\frac{1}{2}{\sum}_{i\ne c}{\left({t}_{i}-{t}_{c}-{\Delta }_{t}\right)}^{2},$$

(71)

where c is the true class of the current input and ti, tc denote the first spike time in the respective output neurons. In the EventProp formalism, this is a spike-time dependent loss lp and, therefore, drives jumps in λV,i in output neuron i at spike times \({t}_{k}^{{{{\rm{spike}}}}}\) in the backward pass (see Table 1) by

$$\frac{\partial {l}_{p}}{\partial {t}_{k}^{{{{\rm{spike}}}}}}=\left\{\begin{array}{ll}({t}_{i}-{t}_{c}-{\Delta }_{t})&{{{\rm{if}}}}\,n(k)=i,{t}_{k}^{{{{\rm{spike}}}}}={t}_{i},i\ne c\\ {\sum}_{i\ne c}-({t}_{i}-{t}_{c}-{\Delta }_{t})&{{{\rm{if}}}}\,n(k)=c,{t}_{k}^{{{{\rm{spike}}}}}={t}_{c}\\ 0&\,{\mbox{otherwise}}\,\end{array}\right.$$

(72)
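Equation (72) translates into a small helper; in this sketch, `t_first` (a hypothetical name) maps each output neuron to its first spike time:

```python
def dlp_dtspike(n_k, t_k, t_first, c, delta_t):
    """Eq. (72): derivative of the Delta-MSE loss w.r.t. an output spike at
    time t_k emitted by output neuron n_k; c is the true class. Only first
    spikes of output neurons contribute."""
    if n_k != c and t_k == t_first[n_k]:
        return t_first[n_k] - t_first[c] - delta_t
    if n_k == c and t_k == t_first[c]:
        return sum(-(t_first[i] - t_first[c] - delta_t)
                   for i in t_first if i != c)
    return 0.0
```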

Implementation

We implemented all of our work in the mlGeNN framework40,41 to exploit the efficiency of event-based learning. In all of our experiments, we used the parameters from previous EventProp work21, apart from the spike regularisation strengths, the number of hidden layers and the recurrent connections. We did not implement heterogeneous or trainable time constants, so that the effect of delays on their own would be clearer. For our experiments on the SHD and SSC datasets, we adopted the data augmentation approaches described by Nowotny, Turner, and Knight21, which were designed to improve generalisation. Specifically, we implemented the following augmentations:

  • Input Shifting: We randomly shifted all inputs by a value within the range of (−40, 40).

  • Input Blending: We blended two inputs from the same class by aligning their centres of mass and randomly selecting spikes from each input with a probability of 0.5.

For SSC we only used the shift augmentation. For the Yin-Yang dataset we decreased the learning rate on both weights and delays at the end of each epoch. On SHD and SSC, we implemented an “ease-in” scheduler on the weight learning rate, starting at 0.001 times the final learning rate and increasing it at the end of each batch until it reached its final value. For our chosen hyperparameters, see Tables 2–5. GeNN already provided an efficient implementation of spike transmission with per-synapse delays43, allowing the EventProp forward pass to be implemented efficiently. However, the λV transitions in the backward pass require access to postsynaptic λ values with a per-synapse delay (\(\left[{\lambda }_{V,m}^{+}-{\lambda }_{I,m}^{+}\right]{| }_{{t}_{k}^{{{{\rm{spike}}}}}+{d}_{mn(k)}}\) in equation (35)). This required a small extension to GeNN’s existing system for providing delayed access to postsynaptic variables from a synapse model42, enabling it to use the per-synapse delays that serve spike transmission in the forward pass.
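The delayed access required in the backward pass can be pictured with a ring buffer (our illustration only, not GeNN's actual data structure): because the backward sweep runs from the end of the trial towards its start, λ at forward time t + d was computed d steps earlier in the sweep and can simply be read back by its delay.

```python
import numpy as np

class DelayedLambdaBuffer:
    """Ring buffer of recent adjoint values in a time-stepped backward pass.
    push() is called once per backward step; read() retrieves the value a
    given number of steps back, i.e. at a later forward time."""
    def __init__(self, n_neurons, max_delay_steps):
        self.buf = np.zeros((max_delay_steps + 1, n_neurons))
        self.head = 0

    def push(self, lam):
        self.head = (self.head + 1) % self.buf.shape[0]
        self.buf[self.head] = lam

    def read(self, neuron, delay_steps):
        return self.buf[(self.head - delay_steps) % self.buf.shape[0], neuron]
```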

Table 2 Yin-Yang parameters
Table 5 Braille reading parameters
