A blueprint for precise and fault-tolerant analog neural networks

Handling negative numbers with RNS

An RNS with a dynamic range of M allows representing values within the range of [0, M). This range can be shifted to [ − ψ, ψ], where ψ = ⌊(M − 1)/2⌋, to represent negative values. This is achieved by reassigning the values in between (0, ψ] to be positive, 0 to be zero, and the numbers in between (ψ, 2ψ] to be negative (i.e., [ − ψ, 0)). Then, the values can be recovered uniquely by using CRT with a slight modification:

$$A=\left\{\begin{array}{ll}\left|\sum_{i=1}^{n} a_i M_i T_i\right|_M, & {\rm if}\ \left|\sum_{i=1}^{n} a_i M_i T_i\right|_M \le \psi \\ \left|\sum_{i=1}^{n} a_i M_i T_i\right|_M - M, & {\rm otherwise}.\end{array}\right.$$

(9)
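The signed reconstruction in Eq. (9) can be illustrated with a minimal Python sketch. It assumes the standard CRT definitions M_i = M/m_i and T_i as the multiplicative inverse of M_i modulo m_i; the function names `to_residues` and `crt_signed` are illustrative and not taken from the original implementation.

```python
from math import prod

def to_residues(value, moduli):
    """Forward conversion; Python's % maps negative integers into [0, m)."""
    return [value % m for m in moduli]

def crt_signed(residues, moduli):
    """Recover a signed integer in [-psi, psi] from its residues (Eq. 9)."""
    M = prod(moduli)
    psi = (M - 1) // 2
    total = 0
    for a_i, m_i in zip(residues, moduli):
        M_i = M // m_i
        T_i = pow(M_i, -1, m_i)  # multiplicative inverse of M_i modulo m_i (Python 3.8+)
        total += a_i * M_i * T_i
    total %= M
    # Values above psi are reinterpreted as negative numbers.
    return total if total <= psi else total - M
```

For the pairwise co-prime set (3, 5, 7), M = 105 and ψ = 52, so every integer in [−52, 52] survives a round trip through `to_residues` and `crt_signed`.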

Data converter energy estimation

The DAC and ADC energy numbers in Fig. 6a, b are estimated by using equations formulated by Murmann^19,42. The energy consumption of a DAC per b-bit conversion is

$$E_{\rm DAC}=b^2 C_u V_{\rm DD}^2,$$

(10)

where C_u = 0.5 fF is a typical unit capacitance and V_DD = 1 V is the supply voltage¹⁹. The energy consumption of an ADC per b-bit conversion can be estimated as

$$E_{\rm ADC}=k_1 b+k_2 4^b.$$

(11)

For calculating the coefficients k₁ and k₂, we used the data from the ADC survey collected by Murmann⁴². The dataset includes all ADC literature published in the two main venues of the field, the International Solid-State Circuits Conference (ISSCC) and the VLSI Circuit Symposium, between the years 1997 and 2023. We removed the data points with a sampling frequency lower than 1 GHz as our design requires high-speed data converters. k₁ is calculated as the average of the three samples with the smallest E_ADC/b and k₂ as the average of the three samples with the smallest E_ADC/4^b among the available data points⁴².
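The estimation procedure above can be sketched in Python as follows. This is only an illustration of Eqs. (10) and (11) and of the coefficient selection rule; the sample format `(bits, energy_joules, sampling_rate_hz)` and the function names are assumptions, and no survey data are included.

```python
def dac_energy(bits, c_u=0.5e-15, v_dd=1.0):
    """Eq. (10): DAC energy per b-bit conversion, in joules."""
    return bits ** 2 * c_u * v_dd ** 2

def adc_energy(bits, k1, k2):
    """Eq. (11): ADC energy per b-bit conversion, in joules."""
    return k1 * bits + k2 * 4 ** bits

def fit_adc_coefficients(samples, min_rate_hz=1e9, n_best=3):
    """Estimate k1 and k2 from survey samples.

    samples: iterable of (bits, energy_joules, sampling_rate_hz) tuples
    (hypothetical format). Samples slower than min_rate_hz are dropped.
    """
    kept = [(b, e) for b, e, rate in samples if rate >= min_rate_hz]
    per_bit = sorted(e / b for b, e in kept)[:n_best]
    per_level = sorted(e / 4 ** b for b, e in kept)[:n_best]
    k1 = sum(per_bit) / len(per_bit)      # mean of the smallest E_ADC / b values
    k2 = sum(per_level) / len(per_level)  # mean of the smallest E_ADC / 4^b values
    return k1, k2
```

With the default parameters, `dac_energy(8)` evaluates 64 × 0.5 fF × (1 V)², i.e., 32 fJ.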

Accuracy modeling

Both RNS-based and regular fixed-point analog cores are modeled using PyTorch for estimating inference and training accuracy. Convolution, linear, and batched matrix multiplication (BMM) layers are performed as GEMM operations, which are computed tile-by-tile as a set of tiled-MVM operations, given the tile size of the analog core. Each input, weight, and output tile is quantized according to the desired bit precision.

Before quantization, the input vectors and weight tiles are first dynamically scaled at runtime, to mitigate the quantization effects as follows: For an h × h weight tile $\mathcal{W}_t$, we denote each row vector as $\mathcal{W}_{rt}$ where the subscript r stands for the row and t for the tile. Similarly, an input vector of length h is denoted as $\mathcal{X}_t$ where t indicates the tile. Each weight row $\mathcal{W}_{rt}$ shares a single FP32 scale $s_{rt}^w=\max(|\mathcal{W}_{rt}|)$ and each input vector $\mathcal{X}_t$ shares a single FP32 scale $s_t^x=\max(|\mathcal{X}_t|)$. h scales per h × h weight tile and one scale per input vector, in total h + 1 scales, are stored for each tiled-MVM operation. The tiled MVM is performed between the scaled weight and input vectors, $\widehat{\mathcal{W}}_{rt}=\mathcal{W}_{rt}/s_{rt}^w$ and $\widehat{\mathcal{X}}_t=\mathcal{X}_t/s_t^x$, respectively, to produce $\widehat{Y}_{rt}=\widehat{\mathcal{W}}_{rt}\widehat{\mathcal{X}}_t$. The output $\widehat{Y}_{rt}$ is then quantized (if required) to resemble the output ADCs and multiplied back with the appropriate scales so that the actual output elements $Y_{rt}=\widehat{Y}_{rt}\cdot s_{rt}^w\cdot s_t^x$ are obtained.
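The scaling scheme can be sketched for a single tile in plain Python. The symmetric rounding grid in `quantize` is an assumption made for illustration (the text does not specify the quantizer), the optional output (ADC) quantization is omitted, and the function names are hypothetical.

```python
def quantize(v, bits):
    """Round v in [-1, 1] onto a symmetric grid (assumed quantizer)."""
    levels = 2 ** (bits - 1) - 1
    return round(v * levels) / levels

def tiled_mvm(W_tile, x_tile, bits):
    """One tiled MVM with a per-row weight scale and a per-vector input scale."""
    s_x = max(abs(v) for v in x_tile) or 1.0      # s_t^x, guarded against all-zero input
    x_hat = [quantize(v / s_x, bits) for v in x_tile]
    y = []
    for row in W_tile:
        s_w = max(abs(v) for v in row) or 1.0     # s_rt^w, one scale per weight row
        w_hat = [quantize(v / s_w, bits) for v in row]
        y_hat = sum(w * x for w, x in zip(w_hat, x_hat))
        y.append(y_hat * s_w * s_x)               # multiply the scales back
    return y
```

For an h × h tile, the loop computes h row scales plus one input scale, matching the h + 1 scales mentioned above.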

Here, the methodology is the same for RNS-based and regular fixed-point cores. For the RNS-based case, in addition to the description above, the quantized input and weight integers are converted into the RNS space before the tiled-MVM operations. MVMs are performed separately for each set of residues and are followed by a modulo operation before the quantization step. The output residues for each tiled MVM are converted back to the standard representation using the CRT.
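The RNS path can be sketched as follows for already-quantized integer tiles. This is a minimal illustration, assuming pairwise co-prime moduli whose product covers the signed output range; it uses exact integer arithmetic in place of the analog core and omits the output quantization step, and the function names are hypothetical.

```python
from math import prod

def crt_signed(residues, moduli):
    """CRT reconstruction mapped to the signed range [-psi, psi]."""
    M = prod(moduli)
    total = 0
    for a_i, m_i in zip(residues, moduli):
        M_i = M // m_i
        total += a_i * M_i * pow(M_i, -1, m_i)
    total %= M
    return total if total <= (M - 1) // 2 else total - M

def rns_mvm(W_int, x_int, moduli):
    """Tiled MVM on integers: one independent modular MVM per modulus, then CRT."""
    per_modulus = []
    for m in moduli:
        W_res = [[w % m for w in row] for row in W_int]   # weights in RNS space
        x_res = [x % m for x in x_int]                    # inputs in RNS space
        # Modular dot product per row; the modulo follows the accumulation.
        per_modulus.append(
            [sum(w * x for w, x in zip(row, x_res)) % m for row in W_res]
        )
    # Gather the residues of each output element and convert back.
    return [crt_signed([res[r] for res in per_modulus], moduli)
            for r in range(len(W_int))]
```

Each output residue never exceeds its modulus, so the per-modulus channels only need to resolve m levels even though the reconstructed result spans the full range M.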

To accurately model the quantization during forward and backward passes, all GEMM operations (i.e., convolution, linear, and BMM layers) are sandwiched between an input operation O_in and an output operation O_out. This makes the operation order O_in-GEMM-O_out during the forward pass, and O_out-GEMM-O_in in the backward pass. O_in quantizes the input and weight tensors in the forward pass and is a null operation in the backward pass. In contrast, O_out is a null operation in the forward pass and quantizes the activation gradients in the backward pass. In this way, the quantization is always performed before the GEMM operation. The optimizer (i.e., SGD or Adam) is modified to keep a copy of the FP32 weights to use during the weight updates. Before each forward pass, the FP32 weights are copied and stored. After the forward pass, the quantized model weights are replaced by the previously stored FP32 weights before the step function so that the weight updates are performed in FP32. After the weight update, the model parameters are quantized again for the next forward pass. This high-precision weight update step is crucial for achieving high accuracy in training.

We trained ResNet-50 from scratch using the SGD optimizer for 90 epochs with a momentum of 0.9 and a learning rate starting from 0.1. The learning rate was scaled down by a factor of 10 at epochs 30, 60, and 80. We fine-tuned BERT-Large and OPT-125M from the implementations available in the Huggingface transformers repository⁴³. We used the Adam optimizer for both models with the default settings. The script uses a linear learning rate scheduler. The learning rate starts at 3e-05 for BERT-Large and 5e-05 for OPT-125M, and the models are trained for 2 and 3 epochs, respectively.

Error distribution in the RRNS code space

For an RRNS(n + k, n) with n non-redundant moduli, i.e., $(m_1, m_2, \ldots, m_n)$, and k redundant moduli, i.e., $m_{n+1}, m_{n+2}, \ldots, m_{n+k}$, the probability distributions, i.e., p_c, p_d, and p_u, of different types of errors, i.e., Case 1, Case 2, and Case 3 that were mentioned in the RRNS for Fault Tolerance subsection, are related to the Hamming distance distribution of the RRNS code space. In an RRNS(n + k, n), every integer is represented as n + k residues (r_i where i ∈ {1, …, n + k}) and this vector of n + k residues is considered as an RRNS codeword. A Hamming distance of η ∈ {0, 1, …, n + k} between the original codeword and the erroneous codeword indicates that η out of n + k residues are erroneous. The erroneous codewords create a new vector space of (n + k)-long vectors where at least one r_i is replaced with $r_i^\prime \ne r_i$ with i ∈ {1, …, n + k} and $r_i^\prime < m_i$. This vector space includes all the RRNS(n + k, n) codewords as well as other possible (n + k)-long vectors that do not overlap with any codeword in the RRNS code space. A vector represents a codeword and is in the RRNS code space if and only if it can be converted into a value within the legitimate range $[0, M)$ of the RRNS(n + k, n) by using the CRT. The number of all vectors that have a Hamming distance η from a codeword in RRNS(n + k, n) can be expressed as

$$V_\eta =\sum_{Q\binom{n+k}{\eta}}\ \prod_{i=1}^{\eta}(m_i-1),$$

(12)

where $Q\binom{n+k}{\eta}$ represents one selection of η moduli from n + k moduli while $\sum_{Q\binom{n+k}{\eta}}$ represents the summation over all distinct $\binom{n+k}{\eta}$ selections. The number of codewords that are in the RRNS code space with a Hamming distance of η ∈ {0, 1, …, n + k} can be expressed as

$$D_\eta =\sum_{h=0}^{\eta-1-k}(-1)^h\binom{n+k-\eta+h}{n+k-\eta}\zeta(n+k,\eta-h),$$

(13)

for k + 1 ≤ η ≤ n + k. For 1 ≤ η ≤ k, D_η = 0 and D₀ = 1. ζ(n + k, η) represents the total number of non-zero common multiples, within the legitimate range [0, M), of any n + k − η moduli out of the n + k moduli of the RRNS(n + k, n) code and can be denoted as

$$\zeta(n+k,\eta)=\sum_{Q\binom{n+k}{n+k-\eta}}\left\lfloor \frac{M-1}{m_{i_1}m_{i_2}\ldots m_{i_{(n+k-\eta)}}}\right\rfloor,$$

(14)

where $(m_{i_1},m_{i_2},\ldots,m_{i_\lambda})$ with 1 ≤ λ ≤ n + k is a subset of the RRNS(n + k, n) moduli set.

An undetectable error occurs only if a codeword with errors overlaps with another codeword in the same RRNS space. Given the distance distributions for the vector space V and the codespace D (Eqs. (12), (13), respectively), the probability of observing an undetectable error (p_u) for RRNS(n + k, n) can be computed as

$$p_u=\sum_{\eta=k+1}^{n+k}\frac{D_\eta}{V_\eta}\,p_E(\eta),$$

(15)

where p_E(η) is the probability of having η erroneous residues in a codeword which can be calculated as

$$p_E(\eta)=\sum_{Q\binom{n+k}{\eta}}p^\eta(1-p)^{(n+k-\eta)},$$

(16)

for a given error probability in a single residue, p.

Eq. (13) indicates that for up to η = k erroneous residues D_η = 0, and so an erroneous codeword cannot overlap with another codeword in the RRNS code space. This guarantees the successful detection of the observed error. If the Hamming distance of the erroneous codeword is $\eta \le \lfloor \frac{k}{2}\rfloor$, the error can be corrected by the majority logic decoding mechanism. In other words, the probability of observing a correctable error is equal to the probability of observing at most $\lfloor \frac{k}{2}\rfloor$ errors in the residues and can be calculated as

$$p_c=\sum_{\eta=0}^{\lfloor \frac{k}{2}\rfloor}p_E(\eta)=\sum_{\eta=0}^{\lfloor \frac{k}{2}\rfloor}\left(\sum_{Q\binom{n+k}{\eta}}p^\eta(1-p)^{(n+k-\eta)}\right).$$

(17)

All the errors that do not fall under the undetectable or correctable categories are referred to as detectable but not correctable errors with a probability p_d, where p_d = 1 − (p_c + p_u). The equations in this section were collected from the work conducted by Yang²⁷.
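Eqs. (12) to (17) can be evaluated directly for small codes. The Python sketch below is an illustration under two assumptions: the first n entries of `moduli` are the non-redundant moduli, and M is their product (the legitimate range). The function names are hypothetical.

```python
from itertools import combinations
from math import comb, prod

def v_eta(moduli, eta):
    """Eq. (12): number of vectors at Hamming distance eta from a codeword."""
    return sum(prod(m - 1 for m in sel) for sel in combinations(moduli, eta))

def zeta(moduli, n, eta):
    """Eq. (14): sum over all selections of (n + k - eta) moduli."""
    M = prod(moduli[:n])  # legitimate range (assumed: first n moduli)
    return sum((M - 1) // prod(sel)
               for sel in combinations(moduli, len(moduli) - eta))

def d_eta(moduli, n, eta):
    """Eq. (13): number of codewords at Hamming distance eta."""
    total, k = len(moduli), len(moduli) - n
    if eta == 0:
        return 1
    if eta <= k:
        return 0
    return sum((-1) ** h * comb(total - eta + h, total - eta)
               * zeta(moduli, n, eta - h) for h in range(eta - k))

def p_e(total, eta, p):
    """Eq. (16): probability of exactly eta erroneous residues."""
    return comb(total, eta) * p ** eta * (1 - p) ** (total - eta)

def error_probabilities(moduli, n, p):
    """Return (p_c, p_d, p_u) following Eqs. (15) and (17)."""
    total, k = len(moduli), len(moduli) - n
    p_u = sum(d_eta(moduli, n, e) / v_eta(moduli, e) * p_e(total, e, p)
              for e in range(k + 1, total + 1))
    p_c = sum(p_e(total, e, p) for e in range(k // 2 + 1))
    return p_c, 1 - (p_c + p_u), p_u
```

As a sanity check, for the toy code with moduli (3, 4, 5) and n = 2, enumerating the 12 legitimate values by hand gives 7 codewords at distance 2 and 4 at distance 3 from the all-zero codeword, which is what `d_eta` returns.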

To model the error in the RNS core for the analysis shown in Fig. 5, p_c, p_d, and p_u are computed for a given RRNS(n + k, n) and p value using Eqs. (15) and (17). Given the number of error correction attempts, p_err is calculated according to Eq. (8). Random noise is injected at the output of every tiled-MVM operation using a Bernoulli distribution with a probability of p_err.

Noise analysis

In analog hardware, both shot noise and thermal noise can be modeled as Gaussian distributions, i.e., $I_{\rm shot} \sim \sqrt{2q_e\Delta f I_{\rm out}}\,\mathcal{N}(0,1)$ where q_e is the elementary charge, Δf is the bandwidth, and I_out is the output current of the analog dot product, and $I_{\rm thermal} \sim \sqrt{\frac{4k_B\Delta fT}{R_{\rm TIA}}}\,\mathcal{N}(0,1)$ where k_B is the Boltzmann constant, T is the temperature, and R_TIA is the feedback resistor of the transimpedance circuitry.

For a modulus m, the consecutive output residues represented in the analog output current should be at least I_out/m apart from each other to differentiate m distinct levels. An error occurs in the output residue when $\sqrt{I_{\rm shot}^2+I_{\rm thermal}^2}\ge I_{\rm out}/(2m)$, as the residue is then rounded to a neighboring integer. Therefore, the error probability in a single residue can be calculated as $p=P\left(\sqrt{2q_e\Delta fI_{\rm out}+\frac{4k_B\Delta fT}{R_{\rm TIA}}}\,\mathcal{N}(0,1)\ge I_{\rm out}/(2m)\right)$. We used Δf = 5 GHz, T = 300 K, and R_TIA = 200 Ω as typical values in the experiments shown in Fig. 5g–i. For a calculated p, p_err = 1 − (1−p)ⁿ for an n-moduli RNS (k = 0). For RRNS (k > 0), p_err can be obtained using Fig. 4 or Eq. (8).
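The single-residue error probability can be evaluated with the Gaussian tail function. The sketch below follows the one-sided expression given above, uses the Δf, T, and R_TIA values quoted in the text as defaults, and treats the output current `i_out` as a free parameter because no value is stated here. The function names are hypothetical.

```python
from math import erfc, sqrt

Q_E = 1.602176634e-19  # elementary charge, C
K_B = 1.380649e-23     # Boltzmann constant, J/K

def residue_error_probability(i_out, m, delta_f=5e9, temp=300.0, r_tia=200.0):
    """p = P(sigma * N(0,1) >= I_out / (2m)) with combined shot and thermal noise."""
    sigma = sqrt(2 * Q_E * delta_f * i_out + 4 * K_B * delta_f * temp / r_tia)
    threshold = i_out / (2 * m)
    # Upper tail of a standard normal: Q(z) = 0.5 * erfc(z / sqrt(2)).
    return 0.5 * erfc(threshold / (sigma * sqrt(2)))

def rns_error_probability(p, n):
    """p_err = 1 - (1 - p)^n for an n-moduli RNS without redundancy (k = 0)."""
    return 1 - (1 - p) ** n
```

Larger moduli pack more levels into the same output current, so `residue_error_probability` increases with m for a fixed `i_out`.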

RNS operations

The proposed analog RNS-based approach requires modular arithmetic, unlike conventional analog hardware. In this section, we discuss two ways of performing modular arithmetic in the analog domain in detail. We dive into one electrical solution using ring oscillators and one optical solution using phase shifters.

First, let us consider a ring oscillator with N inverters. In a ring oscillator, where each inverter has a propagation delay of t_prop > 0, there is always one inverter that has the same input and output—either 1 − 1 or 0 − 0—at any given time when the ring oscillator is on. The location of this inverter with the same input and output propagates in the oscillator, along with the signal, every t_prop time and rotates due to the ring structure. This rotation forms a modular behavior in the ring when the location of this inverter is tracked.

Let S_RO(t) be the state of a ring oscillator where S_RO(t) ∈ {0, …, N − 1} and S_RO(t) = s means that the (s + 1)-th inverter's input and output have the same value at time t. S_RO(t) keeps rotating between 0 and N − 1 as long as the oscillator is on. Fig. 7a shows a simple example where N = 3. In the first t_prop time interval, the input and output of the first inverter are both 0; therefore, the state S_RO(t < t_prop) = 0. Similarly, when t_prop < t < 2t_prop, the input and output of the second inverter are 1, so S_RO(t_prop < t < 2t_prop) = 1. Here, the time between two states following one another (i.e., t_prop) is fixed and S_RO(t) rotates (0, 1, 2, 0, 1, …). Assume the state of the ring oscillator is sampled periodically with a sampling period of T_s = A ⋅ t_prop. Then, the observed change in the state of the ring oscillator between two samples (S_RO(t = T_s) − S_RO(t = 0)) is equivalent to ∣A∣_N, where A is a positive integer value. Therefore, to perform modulo with a modulus value m, the number of inverters N should be equal to m. The dividend number A and the sampling period can be adjusted by changing the analog input voltage to a voltage-to-time converter (VTC).
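A behavioral sketch of this mechanism in Python is given below. It models only the rotating state index, one step per t_prop, and not the circuit itself; the function names are hypothetical.

```python
def ring_oscillator_state(n_inverters, elapsed_steps, start_state=0):
    """State index after elapsed_steps propagation delays (one step per t_prop)."""
    state = start_state
    for _ in range(elapsed_steps):
        state = (state + 1) % n_inverters  # the marked inverter moves one position
    return state

def analog_modulo(a, m, start_state=0):
    """Observed state change over a sampling period T_s = a * t_prop, with N = m."""
    end_state = ring_oscillator_state(m, a, start_state)
    return (end_state - start_state) % m
```

For N = 3, stepping the state seven times from 0 visits 1, 2, 0, 1, 2, 0, 1, so the observed change is 1, which equals ∣7∣₃.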

**Fig. 7: Analog modulo implementations.**

Here, the dot products can be performed using traditional methods with no change and with any desired analog technology where the output can be represented as an analog electrical signal (e.g., current or voltage) before the analog modulo. The ring oscillator is added to the hardware where the dividend A is the output of the dot product. The total energy consumption of the analog modulo operation depends on A, and the area footprint depends on m. The ring oscillator typically has considerably lower energy consumption and a smaller area footprint than the other components in the system, such as ADCs.

Second, let us consider a typical dual-rail phase shifter. The amount of phase shift introduced by the phase shifter when v and − v voltages are applied on the upper and the bottom arms, respectively, is

$$\Delta \Phi=\frac{vL}{V_{\pi \cdot {\rm cm}}},$$

(18)

where V_π⋅cm is the modulation efficiency of the phase shifter and is a constant value. ΔΦ is then proportional to both the length of the shifter L and the amount of applied voltage v. Figure 7b shows an example modular dot product operation between two vectors, x and w, using cascaded dual-rail phase shifters. This idea is similar to multi-operand MZIs⁴⁴, in which there are multiple active phase shifters controlled by independent signals on each modulation arm. In contrast, here, w is encoded digit-by-digit using phase shifters with lengths proportional to 2^j, where j represents the binary digit number. In the example, each element (i.e., w₀ and w₁) of the 2-element vector w consists of 3 digits and uses 3 phase shifters with lengths L, 2L, and 4L, respectively. If the j-th digit of the i-th element of w is $w_i^j=1$, a voltage v_i is applied to the phase shifter pair (top and bottom) with the length 2^jL. If the digit $w_i^j=0$, then no voltage is applied, and therefore, no phase shift is introduced to the input signal. To encode the second operand x, a voltage v_i that is proportional to x_i is applied to all non-zero digits of w_i. The multiplication result is then stored in the phase of the signal propagating through the phase shifters, which is modular with 2π. To perform modulo with an arbitrary modulus m instead of 2π, the applied voltage v should be multiplied by the constant 2π/m. For encoding an input integer x_i,

$$v_i=x_i\cdot \frac{V_{\pi \cdot {\rm cm}}}{\pi L}\cdot \frac{2\pi}{m},$$

(19)

should be applied so that the total phase shift at the end of the optical path is

$$\Delta \Phi_{\rm total}=\left|\frac{2\pi}{m}\sum_i\left(\sum_j(2^jw_i^j)x_i\right)\right|_{2\pi}=\frac{2\pi}{m}\left|\sum_i(w_ix_i)\right|_m.$$

(20)

The resulting output values in the optical phase are collected at the end of the optical path. These outputs are then re-multiplied by m/2π to obtain the outputs of the modular dot products for each modulus.
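Eq. (20) can be checked numerically with a short Python sketch. It accumulates an idealized, noise-free phase per active binary digit of w and wraps it at 2π; the final rounding stands in for the readout, which is an assumption of this illustration. The function name is hypothetical.

```python
from math import pi

def phase_dot_product_mod(w, x, m, n_digits=3):
    """Modular dot product |sum_i w_i * x_i|_m accumulated as an optical phase."""
    unit = 2 * pi / m                     # phase contributed by a product of 1
    phase = 0.0
    for w_i, x_i in zip(w, x):
        for j in range(n_digits):
            if (w_i >> j) & 1:            # digit w_i^j = 1: shifter of length 2^j * L is driven
                phase += unit * (2 ** j) * x_i
    phase %= 2 * pi                       # phase is inherently modular with 2*pi
    # Rescale by m / (2*pi) and round to the nearest residue (assumed readout).
    return round(phase * m / (2 * pi)) % m
```

For w = (5, 3), x = (4, 6), and m = 7, the accumulated phase corresponds to 38, and the wrapped result is ∣38∣₇ = 3.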

In the example in Fig. 7b, w is a digital number encoded digit-by-digit to control the phase shifters separately, while x is encoded via an analog voltage v. Ideally, the pre-trained w (for inference) can be programmed onto the photonic devices once and kept fixed for multiple inferences. However, today's DNNs, with millions to billions of parameters, cannot be mapped in their entirety onto a single accelerator. Therefore, although DNN parameters are not calculated during runtime, w has to be tiled into smaller pieces and loaded into the photonic devices tile by tile. Additionally, modern neural networks that use attention modules require multiplications between matrices that cannot be pre-computed. As a result, both x and w are stored as digital values in the memory before the operations. To this end, the roles of these variables can be easily exchanged, i.e., x can be programmed digit-by-digit and w can be used as an analog value, or vice versa.

In this approach, the total length of the phase shifter on each arm depends on m and the vector size h. Therefore, achieving a feasible design requires a careful selection of the moduli set and the devices used in the design. During an RNS multiplication with modulus m where both x and w are smaller than m, the maximum multiplication result is (m−1)², which can be mapped around zero as $\left[-\left\lfloor \frac{(m-1)^2}{2}\right\rfloor,\left\lceil \frac{(m-1)^2}{2}\right\rceil \right]$. For a modular dot product unit with h elements, the range of the phase shift that the unit can introduce must be within $[-\Delta \Phi_{\max},\Delta \Phi_{\max}]=\left[-\left\lceil \frac{(m-1)^2}{2}\right\rceil \frac{2\pi}{m}h,\left\lceil \frac{(m-1)^2}{2}\right\rceil \frac{2\pi}{m}h\right]$, when the maximum bias voltage $v_{\max}$ is applied. This requires a total phase shifter length that grows with O(mh) in the dot product unit.

Here, the unit phase shifter length L that creates a $\frac{2\pi}{m}$ phase shift is determined by the V_π⋅cm of the phase shifter and the maximum bias voltage ($v_{\max}$). Essentially, a low V_π⋅cm and high $v_{\max}$ result in a short device length for the required phase shift. For high-speed phase shifters with modulation bandwidths ≥1 GHz, the most commonly used actuation mechanisms rely on plasma dispersion. For such phase shifters, prior work demonstrated V_π⋅cm values lower than 0.5 V ⋅ cm (refs. 45–49) and optical losses less than 1 dB/cm (refs. 50,51).

To determine the total length in this RNS-based approach, the required RNS range (depending on the input precision and vector size) and the corresponding moduli choice are also critical. A moduli set with fewer but larger values requires fewer but longer dot product units, while a moduli set with more but smaller moduli results in many but shorter dot product units. To quantify, an example moduli set {5, 7, 8, 9, 11, 13} can achieve a dynamic range of more than 17 bits, which allows 6-bit arithmetic up to h = 90. When a phase shifter with 0.032 V ⋅ cm modulation efficiency at 2.8 V is used⁴⁸, the phase shifter length varies between 0.3–1.2 mm (per multiplier) for different moduli. With a typical device width of 25 μm, an array size of 64 × 64 (six arrays in total, one 64 × 64 array for each modulus in the abovementioned moduli set) can fit in a typical chip size of 500 mm². This approach is less area efficient and results in higher optical loss per MAC operation compared to a traditional MZI array due to the relatively long phase shifter lengths and the utilization of multiple MVM arrays. However, this approach is feasible and it allows us to use lower-precision optical channels (2-to-4-bit for the example above), which can tolerate higher optical loss than even typical 8-bit photonic hardware while achieving a much higher precision at the output (~17-bit). An equivalent precision requires 2¹⁷ differentiable analog levels at the output of the optical MAC operations and 17-bit ADCs, which is impractical in traditional photonic cores with today's technology (see Fig. 6a).
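The dynamic range of the example moduli set can be checked with a few lines of Python. The bound used in `max_vector_length`, h · (2^b − 1)² < M for unsigned b-bit operands, is an assumption of this sketch; it happens to reproduce the h = 90 figure quoted above. The function names are hypothetical.

```python
from itertools import combinations
from math import gcd, log2, prod

def dynamic_range_bits(moduli):
    """log2 of the RNS range M for a pairwise co-prime moduli set."""
    if any(gcd(a, b) != 1 for a, b in combinations(moduli, 2)):
        raise ValueError("moduli must be pairwise co-prime")
    return log2(prod(moduli))

def max_vector_length(moduli, bits):
    """Largest h such that h * (2^bits - 1)^2 stays below M (assumed bound)."""
    M = prod(moduli)
    return (M - 1) // (2 ** bits - 1) ** 2

moduli = [5, 7, 8, 9, 11, 13]
print(prod(moduli))                   # 360360
print(max_vector_length(moduli, 6))   # 90
```

The product of the six moduli is 360360, which corresponds to roughly 18.5 bits, while the largest residue (12) needs only 4 bits per channel.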

The scalability of the RNS-based approach can further improve with developments in photonics technology. Developing high-bandwidth phase shifters with low V_π⋅cm and low optical loss is still an active research area. The integration of new materials, e.g., (silicon-)germanium⁵², ferroelectrics^50,53, III-V semiconductors⁴⁹, 2D materials⁵⁴, and organic materials⁴⁸, has provided promising results despite still being in very early stages. As these integration technologies mature, more performant silicon photonics phase shifters can enable better area efficiency. In addition, using 3D integration to stack photonic chiplets (e.g., photonic arrays for different moduli can be implemented on different layers) can further reduce the area footprint in such designs.

Extended RNS

By combining RNS and PNS, an integer value Z can be represented as D separate digits, z_d where d ∈ 0, 1, . . . , D − 1 and 0 ≤ z_d < M:

$$Z=\sum_{d=0}^{D-1}z_dM^d,$$

(21)

and can provide up to $D\log _2M$ bit precision. This hybrid scheme requires carry propagation from lower digits to higher digits, unlike the RNS-only scheme. For this purpose, one can use two sets of moduli, primary and secondary, where every operation is performed for both sets of residues. After every operation, overflow is detected for each digit and carried over to the next higher-order digit.

Let us define and pick n_p primary moduli m_i where i ∈ {1, …, n_p} and n_s secondary moduli m_j where j ∈ {1, …, n_s}, and m_i ≠ m_j ∀ i, j. Here $M=M_p\cdot M_s=\prod_{i=1}^{n_p}m_i\cdot \prod_{j=1}^{n_s}m_j$ is large enough to represent the largest possible output of the operations performed in this numeral representation and M_p and M_s are co-prime.

In this hybrid number system, operations for each digit are independent of one another and can be parallelized except for the overflow detection and carry propagation. Assume z_d = z_d∣_p;s consists of primary and secondary residues and is a calculated output digit of an operation before overflow detection. z_d can be decomposed as z_d∣_p = Q_d∣_pM_p + R_d∣_p where Q_d∣_p and R_d∣_p are the quotient and the remainder of the digit, with respect to the primary RNS. To detect a potential overflow in the digit z_d, a base extension from primary to secondary RNS is performed on z_d∣_p and the base extended residues are compared with the original secondary residues of the digit, z_d∣_s. If the residues are the same, this indicates that there is no overflow, i.e., Q_d∣_p;s = 0, and both primary and secondary residues are kept without any carry moved to the next higher digit. In contrast, if the base-extended secondary residues and the original secondary residues are not the same, there exists an overflow (i.e., Q_d∣_p;s ≠ 0). In the case of overflow, the remainder of the secondary RNS, R_d∣_s, is calculated through a base extension from primary to secondary RNS on R_d∣_p where R_d∣_p = z_d∣_p. Q_d∣_s can then be computed as $Q_d|_s=(z_d|_s-R_d|_s)M_p^{-1}$ where $|M_p\cdot M_p^{-1}|_{M_s}\equiv 1$. Q_d∣_p is calculated through base extension from the secondary to primary RNS on the computed Q_d∣_s. The full quotient Q_d∣_p;s is then propagated to the higher-order digit.

Algorithm 1 shows the pseudo-code for handling an operation □ using the extended RNS representation. The operation can be replaced by any operation that is closed under RNS. It should be noted that z_d∣_p;s cannot always be computed as x_d∣_p;s □ y_d∣_p;s. For operations such as addition, each digit before carry propagation is computed by simply adding the same digits of the operands, i.e., z_d∣_p;s = x_d∣_p;s + y_d∣_p;s. However, for multiplication, each digit of z_d∣_p;s should be constructed as in long multiplication. The multiplication of two numbers in the hybrid number system with D_x and D_y digits requires D_xD_y digit-wise multiplications and the output will have D_z = D_x + D_y digits in total. Similarly, a dot product is a combination of multiply and add operations. For a dot product of two vectors with h elements, where the elements of the two vectors have D_x and D_y digits, respectively, the output will require $D_z=D_x+D_y+\log_2 h$ digits.

Algorithm 1

Pseudocode for performing a □ operation using the hybrid number system. Here, x and y are the input operands of □. z_d represents the digits of the output where d ∈ {1, …, D_z}, z_d∣_p are the primary residues, and z_d∣_s are the secondary residues. Primary and secondary residues together are referred to as $z_d^\prime|_{p;s}$. Q is the quotient and R is the remainder where z_d = Q_dM_p + R_d. p2s() and s2p() refer to base extension algorithms from primary to secondary residues and from secondary to primary residues, respectively.

Q₋₁∣_p;s = 0

for d in (0, D_z) do

$z_d^\prime|_{p;s}=(x|_{p;s}\,\square\, y|_{p;s})_d$

end for

for d in (0, D_z) do

$z_d|_{p;s}=z_d^\prime|_{p;s}+Q_{d-1}|_{p;s}$

R_d∣_p = z_d∣_p

R_d∣_s = p2s(R_d∣_p)

if $R_d|_s=z_d^\prime|_s$ then

Q_d∣_p;s = 0

else

$Q_d|_s=(z_d^\prime|_s-R_d|_s)M_p^{-1}$

Q_d∣_p = s2p(Q_d∣_s)

end if

end for
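The carry-propagation loop can be illustrated for addition with the Python sketch below. It is one possible reading of Algorithm 1 rather than the authors' implementation: digits are taken in base M_p, the overflow check compares the base-extended remainder against the secondary residues after the incoming carry has been added, and `base_extend` uses a full CRT reconstruction as a stand-in for the p2s() and s2p() routines. All function names are hypothetical.

```python
from math import prod

def to_res(value, moduli):
    return [value % m for m in moduli]

def crt(residues, moduli):
    """Unsigned CRT reconstruction in [0, prod(moduli))."""
    M = prod(moduli)
    return sum(r * (M // m) * pow(M // m, -1, m)
               for r, m in zip(residues, moduli)) % M

def base_extend(residues, src, dst):
    """Stand-in for p2s()/s2p(): reconstruct in src, reduce in dst."""
    return to_res(crt(residues, src), dst)

def to_digits(value, base, n):
    return [(value // base ** d) % base for d in range(n)]

def from_digits(digits, base):
    return sum(z * base ** d for d, z in enumerate(digits))

def hybrid_add(x_digits, y_digits, primary, secondary):
    """Add two numbers given as base-M_p digits, least significant digit first."""
    Mp = prod(primary)
    n = max(len(x_digits), len(y_digits)) + 1          # room for the final carry
    xs = x_digits + [0] * (n - len(x_digits))
    ys = y_digits + [0] * (n - len(y_digits))
    q_p, q_s = [0] * len(primary), [0] * len(secondary)  # Q_{-1} = 0
    out = []
    for xd, yd in zip(xs, ys):
        # Residue-wise addition, including the carry from the lower digit.
        z_p = [(a + b + c) % m for a, b, c, m in
               zip(to_res(xd, primary), to_res(yd, primary), q_p, primary)]
        z_s = [(a + b + c) % m for a, b, c, m in
               zip(to_res(xd, secondary), to_res(yd, secondary), q_s, secondary)]
        r_s = base_extend(z_p, primary, secondary)     # R_d|_s = p2s(R_d|_p)
        if r_s == z_s:
            q_s = [0] * len(secondary)                 # no overflow
        else:
            q_s = [((z - r) * pow(Mp, -1, m)) % m
                   for z, r, m in zip(z_s, r_s, secondary)]
        q_p = base_extend(q_s, secondary, primary)     # Q_d|_p = s2p(Q_d|_s)
        out.append(crt(z_p, primary))                  # keep the remainder R_d
    return out
```

With primary moduli (3, 5), so that M_p = 15, and the single secondary modulus 7, adding 200 and 187 yields the digits [12, 10, 1], i.e., 12 + 10 · 15 + 1 · 225 = 387.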

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
