If you find an error in what I’ve made, then fork, fix lectures/l04_mac.md, commit, push and create a pull request. That way, we use the global brain power most efficiently, and avoid multiple humans spending time on discovering the same error.
Slides
Attention Is All You Need

Neural Nets 3blue1brown
\[a_{l+1} = \sigma(W_l a_l + b_l)\]
An NN consists of additions, multiplications, and a non-linear function

\[\mathbf{y} = \sigma\left(\begin{bmatrix}
w_{11} & w_{12} & \ldots & w_{1n} \\
w_{21} & w_{22} & \ldots & w_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
w_{m1} & w_{m2} & \ldots & w_{mn}
\end{bmatrix}
\begin{bmatrix}
x_1 \\
x_2 \\
\vdots \\
x_n
\end{bmatrix} +
\begin{bmatrix}
b_1 \\
b_2 \\
\vdots \\
b_m
\end{bmatrix}\right)\]
\[{\mathrm{ OA}}_{(x,y,k)} = f \left ({\sum _{i=0}^{R-1} \sum _{j=0}^{S-1} \sum _{c=0}^{C-1} {\mathrm{ IA}}_{(x+i,y+j,c)} \times W_{(i,j,c,k)} }\right)\]
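The convolution sum can be evaluated directly as three nested loops, one per index in the equation (a reference implementation, not an efficient one; the activation f is left out, and the tensor layouts IA[H, W, C] and W[R, S, C, K] are assumptions):

```python
import numpy as np

def conv_pixel(IA, Wt, x, y, k):
    # OA_(x,y,k) before the activation f: MAC over the R x S x C window
    R, S, C, K = Wt.shape
    acc = 0.0
    for i in range(R):
        for j in range(S):
            for c in range(C):
                acc += IA[x + i, y + j, c] * Wt[i, j, c, k]
    return acc

rng = np.random.default_rng(0)
IA = rng.standard_normal((8, 8, 3))     # input activations
Wt = rng.standard_normal((3, 3, 3, 4))  # R x S x C x K weights
oa = conv_pixel(IA, Wt, 2, 2, 1)
```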
Assume a fully connected layer of N neurons, so each neuron has N inputs
- N multiplications per neuron
- N + 1 additions per neuron
- 1 sigmoid per neuron
For efficient inference, additions and multiplications should be low power!

Kirchhoff’s voltage law
The directed sum of the potential differences around any closed loop is zero
\(V_1 + V_2 + V_3 + V_4 = 0\)

Kirchhoff’s current law
The algebraic sum of currents in a network of conductors meeting at a point is
zero
\(i_1 + i_2 + i_3 + i_4 = 0\)

Charge conservation
See Charge conservation on Wikipedia

\[Q_4 = Q_1 + Q_2 + Q_3\]
\[V_4 = \frac{ C_1 V_1 + C_2 V_2 + C_3 V_3}{C_1 + C_2 + C_3}\]
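A quick numeric sanity check of the charge-sharing result, with made-up capacitor and voltage values:

```python
# Three charged capacitors merged onto one node: the total charge
# Q1 + Q2 + Q3 is conserved, so the shared voltage V4 is a
# capacitance-weighted average of the initial voltages.
C = [1e-12, 2e-12, 4e-12]   # farads (made-up values)
V = [0.3, 0.6, 0.9]         # volts (made-up values)
Q_total = sum(c * v for c, v in zip(C, V))
V4 = Q_total / sum(C)       # = (C1 V1 + C2 V2 + C3 V3) / (C1 + C2 + C3)
```

Note that V4 always lands between the smallest and largest initial voltage, as a weighted average must.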
Multiplication
Digital capacitance
\[V_4 = \frac{ C_1 V_1 + C_2 V_2 + C_3 V_3}{C_1 + C_2 + C_3}\]
\[V_O = \frac{C_1}{C_{TOT}} V_1 + \dots + \frac{C_N}{C_{TOT}} V_N\]
Make capacitors digitally controlled, then
\[w_1 = \frac{C_1}{C_{TOT}}\]
One catch: the gain varies with the total capacitance, since changing any C_i also changes C_TOT and thereby all the other weights
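One common workaround (a sketch, not from the slides) is a binary-weighted capacitor bank where deselected capacitors switch to ground instead of disconnecting, so C_TOT stays constant and the weight is just the selected fraction:

```python
def cap_weight(code, n_bits, C_unit=1.0):
    # Weight w = C_sel / C_TOT for a binary-weighted capacitor bank.
    # Capacitors not selected by `code` are switched to ground rather
    # than disconnected, so C_TOT = C_unit * (2**n_bits - 1) is constant
    # and the gain does not vary with the code.
    assert 0 <= code < 2 ** n_bits
    C_tot = C_unit * (2 ** n_bits - 1)
    C_sel = C_unit * code
    return C_sel / C_tot
```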
Mixing
\[I_{M1} = G_{m} V_{GS}\]
\[I_o = I_{M1} t_{input}\]

Translinear principle
MOSFET in sub-threshold

\[I = I_{D0} \frac{W}{L} e^{(V_{GS} - V_{th})/n U_{T}}\text{ , } U_T = \frac{k T}{q}\]
\[I = \ell e^{V_{GS}/n U_{T}}\text{ , } \ell = I_{D0}\frac{W}{L} e^{-V_{th}/n U_{T}}\]
\[V_{GS} = n U_{T} \ln\left(\frac{I}{\ell}\right)\]

\[V_1 + V_2 = V_3 + V_4\]
\[n U_{T}\left[\ln\left(\frac{I_1}{\ell_1}\right) +
\ln\left(\frac{I_2}{\ell_2}\right)\right] = n
U_{T}\left[\ln\left(\frac{I_3}{\ell_3}\right) +
\ln\left(\frac{I_4}{\ell_4}\right)\right]\]
\[\ln\left(\frac{I_1 I_2}{\ell_1 \ell_2}\right) = \ln\left(\frac{I_3
I_4}{\ell_3 \ell_4}\right)\]
\[\frac{I_1 I_2}{\ell_1 \ell_2}= \frac{I_3
I_4}{\ell_3 \ell_4}\]
\[I_1 I_2 = I_3 I_4\text{ , if } \ell_1 \ell_2= \ell_3 \ell_4\]
\[I_1 I_2 = I_3 I_4\]
\[I_1 = I_a\text{ , } I_2 = I_b + i_b\text{ , } I_3 = I_b\text{ , } I_4 = I_a + i_a\]
\[I_a (I_b + i_b) = I_b (I_a + i_a)\]
\[I_a I_b + I_a i_b = I_b I_a + I_b i_a\]
\[i_b = \frac{I_b}{I_a} i_a\]
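Plugging made-up bias and signal currents into this result confirms that it satisfies the loop equation:

```python
# Translinear loop with I1 = Ia, I2 = Ib + ib, I3 = Ib, I4 = Ia + ia.
# Solving I1*I2 = I3*I4 gives ib = (Ib/Ia) * ia: a programmable
# current gain set by the ratio of the two bias currents.
Ia, Ib = 1e-6, 4e-6   # bias currents (made-up values)
ia = 10e-9            # small-signal input current
ib = (Ib / Ia) * ia   # gain of Ib/Ia = 4

lhs = Ia * (Ib + ib)  # I1 * I2
rhs = Ib * (Ia + ia)  # I3 * I4
```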
\[\ell_1 \ell_2= \ell_3 \ell_4\]
\[\ell_1 = I_{D0}\frac{W}{L} e^{-V_{th}/n U_{T}}\]
\[\ell_2 = I_{D0}\frac{W}{L} e^{-(V_{th} \pm \sigma_{th})/n U_{T}} = \ell_1 e^{\pm \sigma_{th}/n U_{T}}\]
\[\sigma_{th} = \frac{a_{vt}}{\sqrt{W L}}\]
\[\frac{\ell_2}{\ell_1} = e^{\pm \frac{a_{vt}}{\sqrt{W L}}/n U_{T}}\]
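To get a feel for the magnitude, assume a Pelgrom coefficient a_vt of 3.5 mV·µm (an assumed, but plausible, value) and a 1 µm × 1 µm device:

```python
import math

a_vt = 3.5e-3 * 1e-6   # V*m: assumed Pelgrom coefficient of 3.5 mV*um
W, L = 1e-6, 1e-6      # device dimensions in meters (1 um x 1 um)
n, U_T = 1.3, 0.0258   # assumed slope factor; thermal voltage near 300 K

sigma_th = a_vt / math.sqrt(W * L)      # one-sigma threshold mismatch
ratio = math.exp(sigma_th / (n * U_T))  # one-sigma ell_2 / ell_1
```

With these numbers the threshold mismatch is 3.5 mV, which exponentiates to roughly a 10 % current error: subthreshold circuits trade power for sensitivity to device matching.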
Demo
JNW_SV_SKY130A
Want to learn more?
An Always-On 3.8 µJ/86% CIFAR-10 Mixed-Signal Binary CNN Processor With All Memory on Chip in 28-nm CMOS
CAP-RAM: A Charge-Domain In-Memory Computing 6T-SRAM for Accurate and Precision-Programmable CNN Inference
ARCHON: A 332.7TOPS/W 5b Variation-Tolerant Analog CNN Processor Featuring Analog Neuronal Computation Unit and Analog Memory
IMPACT: A 1-to-4b 813-TOPS/W 22-nm FD-SOI Compute-in-Memory CNN Accelerator Featuring a 4.2-POPS/W 146-TOPS/mm2 CIM-SRAM With Multi-Bit Analog Batch-Normalization