Saturday, April 26, 2014
Poisson Process
As a discretization of the Bernoulli process
\(\quad\)Probability of \(k\) arrivals in an interval of duration \(\tau\). For the discretized process, \(N_\tau \approx \) Binomial(\(n, p\)).
  |
\[
\begin{aligned}
\mathbb{P}(N_\tau = k) &= \mathbb{P}(k,\tau) & \text{where } \displaystyle\sum_{k=0}^\infty\mathbb{P}(k,\tau) = 1 \\
&= \frac{(np)^k}{k!} e^{-np} & \text{Poisson approximation: } n \text{ large, } p \text{ small} \\
&= \frac{\left(\displaystyle\color{blue}{\frac{\tau}{\delta}}\cdot\color{red}{\lambda\delta}\right)^k}{k!}\, e^{-\displaystyle\color{blue}{\frac{\tau}{\delta}}\cdot\color{red}{\lambda\delta}}
& \text{where }n = \displaystyle\color{blue}{\frac{\tau}{\delta}}\text{ and }p = \color{red}{\lambda \delta} + O(\delta^2) \\
\mathbb{P}(k, \tau) &= \frac{(\lambda\tau)^k}{k!} e^{-\lambda\tau} & \text{which is the Poisson PMF with parameter } \lambda\tau \\
\end{aligned}
\] |
\(\quad\)\(\mathbb{E}[N_\tau] \approx np \approx \lambda \tau \qquad var(N_\tau) \approx np(1-p) \approx \lambda\tau \qquad \)Note \(\mathbb{E}[N_\tau] = \displaystyle\sum_{k=0}^\infty k \frac{(\lambda\tau)^k e^{-\lambda\tau}}{k!}\)
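A quick numerical check in R (my own, not from the lecture; \(\lambda\), \(\tau\) and \(\delta\) below are arbitrary choices): the Binomial(\(n, p\)) PMF of the discretized process should approach the Poisson PMF as \(\delta\) shrinks.
lambda <- 2; tau <- 3; delta <- 0.001   # arbitrary rate, interval length, slot width
n <- round(tau / delta)                 # number of slots
p <- lambda * delta                     # per-slot arrival probability
k <- 0:15
round(cbind(binomial = dbinom(k, n, p),
            poisson  = dpois(k, lambda * tau)), 5)   # the two columns agree closely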
Small interval probabilities
  |
\[
\begin{aligned}
\mathbb{P}(k,\delta) \approx \begin{cases}1 - \lambda\delta & k=0 \\ \color{red}{\lambda\delta} & k = 1 \\ 0 & k > 1 \end{cases} \\
\end{aligned}
\] |
These approximations hold for very small \(\delta\), so that the probability of more than one arrival in the slot is negligible. The arrival rate \(\lambda\) can be interpreted as the probability of an arrival per unit time.
Time \(T_1\) until the \(1^{st}\) arrival
  |
\[
\begin{aligned}
\mathbb{P}(T_1 \le t) &= 1 - \mathbb{P}(T_1 > t) = 1 - \mathbb{P}(0, t) = 1 - e^{-\lambda t} \\
f_{T_1}(t) &= \frac{d}{dt} \mathbb{P}(T_1 \le t) = \lambda e^{-\lambda t} = \text{Exponential}(\lambda)
\end{aligned}
\] |
Memorylessness: given \(T_1 > t\), the remaining time \(T_1 - t\) is again Exponential(\(\lambda\)).
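A small simulation sketch of memorylessness (my own; \(\lambda\) and the elapsed time \(t\) are arbitrary):
set.seed(1)
lambda <- 0.5; t <- 2
T1 <- rexp(1e6, rate = lambda)
remaining <- T1[T1 > t] - t            # condition on T1 > t, look at the excess time
c(mean(remaining), 1 / lambda)         # both close to the unconditional mean 1/lambda
rbind(conditional   = quantile(remaining, c(.25, .5, .75)),
      unconditional = qexp(c(.25, .5, .75), rate = lambda))   # matching quantiles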
Time \(Y_k\) of the \(k^{th}\) arrival
  |
\[
\begin{aligned}
\mathbb{P}(Y_k \le y) &= \sum_{n=k}^\infty \mathbb{P}(n, y) & \text{CDF argument} \\
f_{Y_k}(y) \delta &\approx \mathbb{P}(y \le Y_k \le y + \delta) = \mathbb{P}(k-1, y) \mathbb{P}(1, \delta) = \mathbb{P}(k-1, y) \cdot \color{red}{\lambda\delta} & \text{more intuitive argument}\\
\therefore f_{Y_k}(y) &\approx \mathbb{P}(k-1, y) \cdot \lambda = \frac{\lambda^k y^{k-1} e^{-\lambda y}}{(k-1)!} \qquad \text{for }y \ge 0 & \color{blue}{\text{Erlang distribution }} \text{of order } k \\
\end{aligned}
\] |
  |
\[
\begin{aligned}
Y_k &= T_1 + \cdots + T_k & \text{sum of i.i.d. exponentials} \\
\mathbb{E}[Y_k] &= \frac{k}{\lambda} \qquad var(Y_k) = \frac{k}{\lambda^2} \\
\end{aligned}
\] |
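Since the Erlang of order \(k\) is the Gamma(\(k, \lambda\)) distribution, a simulation of \(T_1 + \cdots + T_k\) can be checked against R's gamma functions (a sketch with arbitrary \(k\) and \(\lambda\)):
set.seed(2)
lambda <- 1.5; k <- 4
Yk <- colSums(matrix(rexp(k * 1e5, rate = lambda), nrow = k))   # k i.i.d. exponentials per column
c(mean(Yk), k / lambda)      # E[Y_k] = k/lambda
c(var(Yk),  k / lambda^2)    # var(Y_k) = k/lambda^2
ks.test(Yk, "pgamma", shape = k, rate = lambda)   # Erlang(k) = Gamma(shape = k, rate = lambda)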
Sum of independent Poisson r.v's with means/parameters \(\mu\) and \(\nu\)
\(\quad\)Poisson(\(\mu\)) + Poisson(\(\nu\)) = Poisson(\(\mu + \nu\)) \(\qquad\) with mean/parameter \(\mu + \nu\)
In general, the sum of two independent Poisson random variables is again a Poisson random variable: think of the two r.v.'s as the numbers of arrivals in disjoint time intervals of a single Poisson process. This closure under addition is a special property, shared by Normal r.v.'s but not by most other distributions.
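A quick check in R (arbitrary \(\mu\) and \(\nu\)):
set.seed(3)
mu <- 1.3; nu <- 2.7
S <- rpois(1e6, mu) + rpois(1e6, nu)    # sum of two independent Poisson samples
c(mean(S), var(S))                      # both approximately mu + nu = 4
round(rbind(empirical = as.numeric(table(factor(S, levels = 0:10))) / 1e6,
            poisson   = dpois(0:10, mu + nu)), 4)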
Splitting a Poisson process
Splitting a Bernoulli process into two yields two Bernoulli processes that are dependent, since an arrival in a given time slot of one process implies a non-arrival in the other process in that slot. Similarly, splitting a Poisson process into two yields two Poisson processes. Surprisingly, however, they are independent, because time is continuous.
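A simulation sketch of splitting, i.e. thinning (my own; the rate \(\lambda\) and the splitting probability \(q\) are arbitrary): each arrival is routed to stream A with probability \(q\), otherwise to stream B.
set.seed(4)
lambda <- 5; q <- 0.3; trials <- 1e5
N <- rpois(trials, lambda)                    # arrivals in a unit interval
NA_cnt <- rbinom(trials, size = N, prob = q)  # arrivals routed to stream A
NB_cnt <- N - NA_cnt                          # the rest go to stream B
c(mean(NA_cnt), var(NA_cnt), lambda * q)         # stream A looks like Poisson(lambda q)
c(mean(NB_cnt), var(NB_cnt), lambda * (1 - q))   # stream B looks like Poisson(lambda (1-q))
cor(NA_cnt, NB_cnt)                              # approximately 0: the streams are independent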
Source: MITx 6.041x, Lecture 22, 23.
Friday, April 25, 2014
Bernoulli Process
Time of the \(k^{th}\) success/arrival
  |
\[
\begin{aligned}
Y_k &= \text{ time of }k^{th} \text{ arrival} & \color{blue}{\text{Pascal random variable}} \\
&= T_1 + \cdots + T_k & T_i \text{ are i.i.d, Geometric}(p) \\
T_k &= k^{th}\text{ inter-arrival time } = Y_k - Y_{k-1} \\
\mathbb{E}[Y_k] &= \frac{k}{p} \qquad var(Y_k) = \frac{k(1-p)}{p^2} \\
\mathbb{P}(Y_k = t) &= \mathbb{P}(k-1 \text{ arrivals in }t-1 \text{ time}) \cdot \mathbb{P}(\text{arrival at time } t) \\
p_{Y_k}(t) &= {{t-1} \choose {k-1}} p^k (1-p)^{t-k} \qquad \text{for }t = k, k+1, \cdots & \color{blue}{\text{Pascal PMF}} \text{ of order }k \text{ and parameter }p \\
\end{aligned}
\] |
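The Pascal distribution of order \(k\) is a shifted negative binomial, so \(Y_k\) can be simulated with R's rnbinom and compared with the PMF above (arbitrary \(k\) and \(p\)):
set.seed(5)
p <- 0.25; k <- 3; trials <- 2e5
Yk <- k + rnbinom(trials, size = k, prob = p)   # k successes + failures before the k-th success
c(mean(Yk), k / p)                  # E[Y_k] = k/p
c(var(Yk),  k * (1 - p) / p^2)      # var(Y_k) = k(1-p)/p^2
tt <- k:15
round(rbind(empirical = as.numeric(table(factor(Yk, levels = tt))) / trials,
            pascal    = choose(tt - 1, k - 1) * p^k * (1 - p)^(tt - k)), 4)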
Merging of Bernoulli(\(p\)) and Bernoulli(\(q\))
\(\quad\) Bernoulli(\(p + q - pq\)) \(\qquad\qquad\) Collisions are counted as one arrival
  |
\[
\begin{aligned}
\mathbb{P}(\text{arrival in first process | arrival}) &= \frac{p}{p+q-pq} \\
\end{aligned}
\] |
Source: MITx 6.041x, Lecture 21.
Wednesday, April 23, 2014
Linear map
A function \(L: \mathbb{R}^n \to \mathbb{R}^m\) is called a linear map if it respects addition and scalar multiplication. Symbolically, for a map to be linear, we must have that
  |
\[
\begin{aligned}
L(\vec{v}+\vec{w}) &= L(\vec{v})+L(\vec{w}) & \text{for all }\vec{v},\vec{w} \in \mathbb{R}^n \\
\end{aligned}
\] |
and also
  |
\[
\begin{aligned}
L(a\vec{v}) &= a L(\vec{v}) \quad & \text{for all }a \in \mathbb{R} \text{ and } \vec{v} \in \mathbb{R}^n \\
\end{aligned}
\] |
To each linear map \(L: \mathbb{R}^n \to \mathbb{R}^m\) we associate an \(m \times n\) matrix \(A_L\), called the matrix of the linear map with respect to the standard coordinates. It is defined by setting \(a_{i,j}\) to be the \(i^{th}\) component of \(L(e_j)\). In other words, the \(j^{th}\) column of the matrix \(A_L\) is the vector \(L(e_j)\).
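A minimal R sketch of reading off \(A_L\) column by column; the particular map (rotate \(\mathbb{R}^2\) by 90 degrees, then scale by 2) is an arbitrary choice of mine:
L <- function(v) 2 * c(-v[2], v[1])   # rotate by 90 degrees, then scale by 2 (a linear map)
e1 <- c(1, 0); e2 <- c(0, 1)          # standard basis of R^2
A_L <- cbind(L(e1), L(e2))            # j-th column of A_L is L(e_j)
A_L
v <- c(3, -1)
cbind(direct = L(v), via_matrix = as.numeric(A_L %*% v))   # the two agree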
Source: m2o2c2.
Saturday, April 19, 2014
AES performance
Test
# java version "1.7.0_51"
# Java(TM) SE Runtime Environment (build 1.7.0_51-b13)
# Java HotSpot(TM) 64-Bit Server VM (build 24.51-b03, mixed mode)
# CBC: AES/CBC/PKCS5Padding; SunJCE
# BC.CBC: AES/CBC/PKCS5Padding; BC (Bouncy Castle v1.5)
# GCM: AES/GCM/NoPadding; BC (Bouncy Castle v1.5)
# CTR: AES/CTR/NoPadding; SunJCE
# BC.CTR: AES/CTR/NoPadding; BC (Bouncy Castle v1.5)
# key length: 256 bits
# (CBC/CTR) IV length: 128 bits
# (GCM) tag length: 128 bits
# (GCM) IV length: 96 bits
# plaintext size: ~5M bytes
# number of encrypt/decrypt per Cipher: 160
# time unit: milliseconds
Statistics
vars n mean sd median trimmed mad min max range skew kurtosis se
CBC 1 160 62.65 29.32 59.0 59.42 5.93 51 419 368 11.25 133.50 2.32
BC.CBC 2 160 170.31 69.71 162.0 163.75 10.38 145 1031 886 11.76 142.23 5.51
GCM 3 160 324.79 74.30 316.5 316.79 12.60 297 1233 936 11.44 136.54 5.87
CTR 4 160 67.68 26.47 64.0 65.05 5.93 57 391 334 11.39 136.10 2.09
BC.CTR 5 160 166.16 68.23 158.0 160.00 13.34 139 1009 870 11.78 142.57 5.39
CBC BC.CBC GCM CTR BC.CTR
Min. : 51.00 Min. : 145.0 Min. : 297.0 Min. : 57.00 Min. : 139.0
1st Qu.: 55.00 1st Qu.: 156.0 1st Qu.: 308.8 1st Qu.: 60.00 1st Qu.: 151.0
Median : 59.00 Median : 162.0 Median : 316.5 Median : 64.00 Median : 158.0
Mean : 62.65 Mean : 170.3 Mean : 324.8 Mean : 67.68 Mean : 166.2
3rd Qu.: 63.00 3rd Qu.: 171.0 3rd Qu.: 325.0 3rd Qu.: 70.00 3rd Qu.: 169.2
Max. :419.00 Max. :1031.0 Max. :1233.0 Max. :391.00 Max. :1009.0
R
all <- read.table("/tmp/cipher_compare.log", header=T)   # one column of latencies per cipher
library(psych)
describe(all)    # detailed statistics (mean, sd, trimmed mean, mad, skew, kurtosis, ...)
summary(all)     # five-number summaries
# Overlay the latency density estimates of all five ciphers on one plot
plot(density(all$CBC),col="black",main="Cipher latency density plots",xlab="Latency (ms)",ylab="Frequency density")
lines(density(all$BC.CBC),col="red")
lines(density(all$CTR),col="blue")
lines(density(all$BC.CTR),col="orange")
lines(density(all$GCM),col="green")
legend("topright", legend=c(
"AES/CBC/PKCS5Padding; SunJCE 1.7.0_51",
"AES/CBC/PKCS5Padding; BC 1.5",
"AES/CTR/NoPadding; SunJCE 1.7.0_51",
"AES/CTR/NoPadding; BC 1.5",
"AES/GCM/NoPadding; BC 1.5"),
text.col=c("black","red","blue","orange","green"))
Discussion at
Bouncy Castle dev forum.
Saturday, April 12, 2014
Inequalities
Markov inequality
\(\quad\) If \(X \ge 0\) and \(a > 0\), then
  |
\[
\begin{aligned}
\mathbb{P}(X \ge a) &\le \frac{\mathbb{E}[X]}{a} \\
\end{aligned}
\] |
Chebyshev inequality
  |
\[
\begin{aligned}
\mathbb{P}(|X-\mu| \ge c) &\le \frac{\sigma^2}{c^2} & c > 0, \text{ where } \mu = \mathbb{E}[X] \text{ and } \sigma^2 = var(X) \\
\end{aligned}
\] |
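Both bounds are easy to check numerically; a sketch for \(X \sim\) Exponential(1), so \(\mathbb{E}[X] = \sigma^2 = 1\) (the threshold is an arbitrary choice):
set.seed(6)
X <- rexp(1e6, rate = 1)   # E[X] = 1, var(X) = 1
a <- 3                     # threshold used for both bounds
c(markov_actual = mean(X >= a),          markov_bound = 1 / a)     # E[X]/a
c(cheby_actual  = mean(abs(X - 1) >= a), cheby_bound  = 1 / a^2)   # sigma^2/c^2
# both bounds hold, but can be quite loose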
Weak law of large numbers (WLLN)
\(\quad X_1, X_2, \cdots \) i.i.d. with finite mean \(\mu\) and variance \(\sigma^2\)
  |
\[
\begin{aligned}
M_n &= \frac{X_1 + \cdots + X_n}{n} & \text{sample mean} \\
\mathbb{E}[M_n] &= \mu \\
var(M_n) &= \frac{\sigma^2}{n} \\
\mathbb{P}(|M_n - \mu| \ge \epsilon) &\le \frac{\sigma^2}{n\epsilon^2} \rightarrow 0 \quad \text{as } n \rightarrow \infty &\text{ for } \epsilon > 0 \\
\end{aligned}
\] |
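A simulation sketch of the WLLN with Uniform(0,1) samples (my own; \(\epsilon\) and the sample sizes are arbitrary):
set.seed(7)
eps <- 0.05; mu <- 0.5; sigma2 <- 1/12      # Uniform(0,1): mean 1/2, variance 1/12
for (n in c(10, 100, 1000, 10000)) {
  Mn <- rowMeans(matrix(runif(n * 2000), ncol = n))   # 2000 sample means of size n
  cat(sprintf("n = %5d   P(|Mn - mu| >= eps) = %.4f   Chebyshev bound = %.4f\n",
              n, mean(abs(Mn - mu) >= eps), sigma2 / (n * eps^2)))
}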
Convergence in probability
\(\quad\) A sequence \(Y_n\), not necessarily independent, converges in probability to a number \(a\) if:
  |
\[
\begin{aligned}
\lim_{n\rightarrow\infty}\mathbb{P}(|Y_n - a| \ge \epsilon) = 0 \\
\end{aligned}
\] |
\(\quad\) Caveat: if \(X_n \rightarrow a\) in probability, \(\mathbb{E}[X_n]\) need not converge to \(a\)!
\(\quad\) How to check for convergence? Guess \(a\), express \(\mathbb{P}(|Y_n - a| \ge \epsilon)\) as above, and check whether the limit is zero.
Central Limit Theorem
\(\quad X_1,\cdots,X_n\) i.i.d. with finite mean \(\mu\) and variance \(\sigma^2\).
  |
\[
\begin{aligned}
S_n &= X_1 + \cdots + X_n & \text{variance: } n\sigma^2 \\
\frac{S_n}{\sqrt{n}} &= \frac{X_1 + \cdots + X_n}{\sqrt{n}} & \text{variance: } \sigma^2 \\
\color{red}{Z_n} &\color{red}{=} \color{red}{\frac{S_n - n\mu}{\sqrt{n}\sigma}}
& S_n \thicksim N(n\mu, n\sigma^2) \\
M_n &= \frac{S_n}{n} & \mathbb{E}[M_n] = \mu \quad var(M_n) = \frac{\sigma^2}{n} \\
\color{red}{Z_n} &\color{red}{=} \color{red}{\frac{M_n - \mu}{\sigma/\sqrt{n}}}
& M_n \thicksim N(\mu, \frac{\sigma^2}{n}) \\
\color{red}{\lim_{n \rightarrow \infty} \mathbb{P}(Z_n \le z)} &\color{red}{=} \color{red}{\mathbb{P}(Z \le z)} = \Phi(z) & \text{for every }z, \text{ where } Z \thicksim N(0,1) \\
\end{aligned}
\] |
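A quick R illustration with Exponential(1) summands (an arbitrary, clearly non-normal choice): the CDF of the standardized sum approaches \(\Phi\).
set.seed(8)
n <- 50; mu <- 1; sigma <- 1                     # Exponential(1): mean 1, sd 1
Sn <- colSums(matrix(rexp(n * 1e5), nrow = n))   # 1e5 independent sums of n terms
Zn <- (Sn - n * mu) / (sqrt(n) * sigma)          # standardized sums
z <- c(-2, -1, 0, 1, 2)
round(rbind(empirical = sapply(z, function(zz) mean(Zn <= zz)),
            Phi       = pnorm(z)), 4)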
De Moivre-Laplace 1/2 correction
\(\quad\)When \(S_n\) is integer-valued, the events \(\{S_n \le a\}\), \(\{S_n \lt a+1\}\) and \(\{S_n \lt a+0.5\}\) coincide, but their CLT approximations differ:
  |
\[
\begin{aligned}
\mathbb{P}(S_n \le a) &\approx \Phi\left(\frac{a - n\mu}{\sqrt{n}\,\sigma}\right) & \text{underestimate} \\
\mathbb{P}(S_n \lt a+1) &\approx \Phi\left(\frac{a + 1 - n\mu}{\sqrt{n}\,\sigma}\right) & \text{overestimate} \\
\mathbb{P}(S_n \lt a+0.5) &\approx \Phi\left(\frac{a + 0.5 - n\mu}{\sqrt{n}\,\sigma}\right) & \text{1/2 correction} \\
\end{aligned}
\] |
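A concrete check in R with \(S_n \sim\) Binomial(\(n, p\)) (my own; \(n\), \(p\), \(a\) are arbitrary): the exact CDF versus the three normal approximations.
n <- 36; p <- 0.5; a <- 21
mu <- n * p; sigma <- sqrt(n * p * (1 - p))
c(exact      = pbinom(a, n, p),                 # P(S_n <= a)
  at_a       = pnorm((a       - mu) / sigma),   # underestimate
  at_a_plus1 = pnorm((a + 1   - mu) / sigma),   # overestimate
  half_corr  = pnorm((a + 0.5 - mu) / sigma))   # 1/2 correction, typically the closest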
Jensen's inequality
\(\quad\)If \(g\) is a convex function (i.e., shaped like a smiley face), so that \(g''(x) \ge 0\):
  |
\[
\begin{aligned}
\color{blue}{g(x)} &\color{blue}{\ge g(c) + g'(c) (x-c)} & \text{1st order Taylor expansion} \\
g(X) &\ge g(\mathbb{E}[X]) + g'(\mathbb{E}[X]) (X-\mathbb{E}[X]) & \text{let } c = \mathbb{E}[X] \\
\mathbb{E}[g(X)] &\ge \mathbb{E}[g(\mathbb{E}[X])] + \mathbb{E}\left[g'\big(\mathbb{E}[X]\big) \big(X-\mathbb{E}[X]\big)\right] & \text{note }\mathbb{E}[X] \text{ is a constant} \\
\mathbb{E}[g(X)] &\ge g(\mathbb{E}[X]) + g'(\mathbb{E}[X]) (\mathbb{E}[X]-\mathbb{E}[X]) \\
\color{blue}{\mathbb{E}[g(X)]} &\color{blue}{\ge g(\mathbb{E}[X])} \\
\end{aligned}
\] |
Hoeffding's inequality
\(\quad X_i\) are i.i.d. random variables, each equally likely to be \(-1\) or \(1\). Let \(a \gt 0\).
  |
\[
\begin{aligned}
\mathbb{P}(X_1 + \cdots + X_n \ge na) &\le e^{-na^2/2} & \text{a much stronger bound than Chebyshev inequality} \\
\end{aligned}
\] |
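A sketch comparing the empirical tail with Hoeffding's bound and the Chebyshev bound for \(\pm 1\) coin flips (arbitrary \(n\) and \(a\)):
set.seed(9)
n <- 100; a <- 0.3; trials <- 1e5
S <- colSums(matrix(sample(c(-1, 1), n * trials, replace = TRUE), nrow = n))
c(empirical = mean(S >= n * a),
  hoeffding = exp(-n * a^2 / 2),   # e^{-n a^2 / 2}
  chebyshev = 1 / (n * a^2))       # var(S) = n, so P(|S| >= na) <= n/(na)^2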
Source: MITx 6.041x, Lecture 18.
Friday, April 11, 2014
Probability density function
Joint pdf
  |
\[
\begin{aligned}
\int \int f_{X,Y}(x,y) \, dx \, dy &= 1 \\
F_{X,Y}(x,y) &= \mathbb{P}(X \le x, Y \le y) = \int_{-\infty}^y \int_{-\infty}^x f_{X,Y}(s,t) \,ds\, dt & \text{cumulative distribution function (CDF) } \\
f_{X,Y}(x,y) &= \frac{\partial^2}{\partial x\, \partial y} F_{X,Y}(x,y) \\
\end{aligned}
\] |
From joint to marginal
  |
\[
\begin{aligned}
f_X(x) &= \int f_{X,Y}(x,y)\,dy \\
F_X(x) &= P(X \le x) = \int_{-\infty}^x \bigg( \int_{-\infty}^\infty f_{X,Y}(s,t) \, dt \bigg) ds \\
\end{aligned}
\] |
\(Y = aX + b\)
  |
\[
\begin{aligned}
p_Y(y) &= \mathbb{P}(Y=y) = \mathbb{P}(y=aX+b) = \mathbb{P}(X=\frac{y-b}{a}) = p_X(\frac{y-b}{a}) & \text{discrete r.v.} \\
F_Y(y) &= \mathbb{P}(Y \le y) = \mathbb{P}(aX+b \le y) = \mathbb{P}(X \le \frac{y-b}{a}) = F_X(\frac{y-b}{a}) & \text{assuming } a > 0 \\
\color{orange}{f_Y(y)} &= \frac{d}{dy}F_X(\frac{y-b}{a}) = \color{orange}{\frac{1}{|a|}\cdot f_X(\frac{y-b}{a})} & \text{continuous r.v.; the } |a| \text{ also covers } a < 0 \\
\end{aligned}
\] |
\(\color{red}{Z = X + Y}\)
\(\quad(X,Y\) independent)
  |
\[
\begin{aligned}
p_Z(z) &= \mathbb{P}(Z = z) = \sum_x \mathbb{P}(X=x, Y=z-x) \\
p_Z(z) &= \sum_x p_X(x) \cdot p_Y(z - x) & \text{by independence of } X, Y \\
\end{aligned}
\] |
Continuous convolution mechanics
Given \(Z=X+Y\) with \(X, Y\) independent and continuous, find \(f_Z(z)\) from \(f_X(x)\) and \(f_Y(y)\).
  |
\[
\begin{aligned}
\color{blue}{f_{Z|X}(z|x)} &= f_{Y+x}(z) & \text{given } X=x,\ Z = Y + x; \text{ uses independence of } X,Y \\
&= \color{blue}{f_Y(z-x)} & \color{orange}{f_{X+b}(x) = f_X(x-b)}, \text{ see Lec 11}
\end{aligned}
\] |
Joint pdf of Z and X
  |
\[
\begin{aligned}
f_{X,Z}(x,z) &= f_X(x)\cdot \color{blue}{f_{Z|X}(z|x)} = f_X(x) \cdot \color{blue}{f_Y(z-x)} \\
f_{Z}(z) &= \int_{-\infty}^\infty f_{X,Z}(x,z)\,dx = \int_{-\infty}^\infty f_X(x) \cdot \color{blue}{f_Y(z-x)} \,dx \\
\end{aligned}
\] |
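A numerical sketch of the convolution formula (my own): for two independent Uniform(0,1) r.v.'s, the integral \(\int f_X(x) f_Y(z-x)\,dx\) should reproduce the triangular density of \(Z = X + Y\). The grid step is an arbitrary choice.
dx <- 0.001; x <- seq(0, 1, by = dx)
fZ <- function(z) sum(dunif(x) * dunif(z - x)) * dx   # Riemann sum for f_Z(z)
z <- c(0.25, 0.5, 1, 1.5, 1.75)
round(rbind(numeric_conv = sapply(z, fZ),
            triangular   = ifelse(z <= 1, z, 2 - z)), 2)   # agree to grid resolution
# hist(runif(1e6) + runif(1e6), breaks = 100, freq = FALSE) shows the same triangular shape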
\(Y = g(X)\)
\(\quad\) How to find \(f_Y(y)\) in general?
  |
\[
\begin{aligned}
F_Y(y) &= \mathbb{P}(g(X) \le y) \\
f_Y(y) &= \frac{d}{dy}F_Y(y) \\
\end{aligned}
\] |
\(Y = g(X)\) when \(g\) is monotonic
  |
\[
\begin{aligned}
F_Y(y) &= \mathbb{P}(g(X) \le y) = \mathbb{P}(X \le h(y)) = F_X(h(y)) & \text{for } g \text{ increasing, } h = g^{-1} \\
f_Y(y) &= \frac{d}{dy}F_Y(y) = \frac{d}{dy}F_X\left(h(y)\right) = f_X\left(h(y)\right) \big\lvert \frac{d}{dy} h(y) \big\rvert & \text{the absolute value also covers decreasing } g \\
\end{aligned}
\] |
Source: MITx 6.041x, Lecture 9, 11, 12.
Sunday, April 06, 2014
Euclidean quiz
Saturday, April 05, 2014
Estimator
A point estimate, \(\hat\theta = g(x) \), is a number, whereas an estimator, \(\widehat\Theta = g(X) \), is a random variable.
  |
\[
\begin{aligned}
\hat\theta_{MAP} &= g_{MAP}(x): \text{ maximises } p_{\Theta|X}(\theta\,|\,x) \\
\hat\theta_{LMS} &= g_{LMS}(x) = \mathbb{E}[\Theta \,|\, X=x] \\
\end{aligned}
\] |
Conditional probability of error
  |
\[
\begin{aligned}
\mathbb{P}(\hat\theta &\ne \Theta \,|\, X =x) & \text{smallest under the MAP rule} \\
\end{aligned}
\] |
Overall probability of error
  |
\[
\begin{aligned}
\mathbb{P}(\widehat\Theta \ne \Theta) &= \int \mathbb{P}(\widehat\Theta \ne \Theta\,|\,X=x)\cdot f_X(x)\, dx \\
&= \sum_\theta \mathbb{P}(\widehat\Theta \ne \Theta\,|\,\Theta=\theta)\cdot p_\Theta(\theta) \\
\end{aligned}
\] |
Mean squared error (MSE)
  |
\[
\begin{aligned}
&\mathbb{E}\left[\big(\Theta - \hat\theta\big)^2\right] \\
\end{aligned}
\] |
Minimized when \(\hat\theta = \mathbb{E}[\Theta], \text{ so that } \)
  |
\[
\begin{aligned}
&\mathbb{E}\left[\big(\Theta - \hat\theta\big)^2\right] = \mathbb{E}\left[\big(\Theta - \mathbb{E}[\Theta]\big)^2\right] = \text{var}(\Theta) & \text{least mean square (} \color{blue}{\text{LMS}}) \\
\end{aligned}
\] |
Conditional mean squared error
  |
\[
\begin{aligned}
&\mathbb{E}\left[\big(\Theta - \hat\theta\big)^2 \,\big|\, \color{blue}{X=x} \right] & \text{with observation }x \\
\end{aligned}
\] |
Minimized when \(\hat\theta = \mathbb{E}[\Theta\,\big|\, \color{blue}{X=x}], \text{ so that } \)
  |
\[
\begin{aligned}
\mathbb{E}\left[\big(\Theta - \hat\theta\big)^2\,\big|\, \color{blue}{X=x} \right] &= \mathbb{E}\left[\big(\Theta - \mathbb{E}[\Theta\,|\, \color{blue}{X=x}]\big)^2 \,\big|\, \color{blue}{X=x} \right] \\
&= \color{red}{\text{var}(\Theta\,|\, X=x)} \qquad \text{expected performance, given a measurement} \\
\end{aligned}
\] |
Expected performance of the design:
  |
\[
\begin{aligned}
\mathbb{E}\left[\big(\Theta - \mathbb{E}[\Theta\,|\, \color{blue}{X}]\big)^2 \right] &= \color{red}{\mathbb{E}\left[\text{var}(\Theta\,|\,X)\right]} \\
\end{aligned}
\] |
Note that \(\hat\theta\) is an estimate whereas \(\widehat\Theta = \mathbb{E}[\Theta\,|\,X]\) is an estimator.
Linear least mean square (LLMS) estimation
\(\quad\) Minimize \(\mathbb{E}\left[(\Theta - aX - b)^2\right]\) w.r.t. \(a, b\)
  |
\[
\begin{aligned}
\widehat\Theta_L &= \mathbb{E}[\Theta] + \frac{cov(\Theta,X)}{var(X)}\big(X - \mathbb{E}[X]\big)\\
&= \mathbb{E}[\Theta] + \rho\frac{\sigma_\Theta}{\sigma_X}\big(X - \mathbb{E}[X]\big) & \text{only means, variances and covariances matter} \\
\end{aligned}
\] |
\(\quad\)Error variance:
  |
\[
\begin{aligned}
\mathbb{E}[(\widehat\Theta_L - \Theta)^2] &= (1-\rho^2) \cdot var(\Theta) \\
\end{aligned}
\] |
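With a single observation, LLMS is just simple linear regression of \(\Theta\) on \(X\), so the coefficients can be cross-checked against lm() (a sketch on simulated data; the linear-plus-noise model below is an arbitrary assumption of mine):
set.seed(11)
Theta <- rnorm(1e5, mean = 2, sd = 1)      # prior on Theta (assumed)
X     <- 3 * Theta + rnorm(1e5, sd = 2)    # noisy observation of Theta (assumed model)
a <- cov(Theta, X) / var(X)                # slope of the LLMS estimator
b <- mean(Theta) - a * mean(X)             # intercept
c(intercept = b, slope = a)
coef(lm(Theta ~ X))                        # intercept and slope agree
rho <- cor(Theta, X)
c(theory = (1 - rho^2) * var(Theta),       # (1 - rho^2) var(Theta)
  actual = mean((b + a * X - Theta)^2))    # empirical mean squared error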
Source: MITx 6.041x, Lecture 16, 17.
Friday, April 04, 2014
Variance
  |
\[
\begin{aligned}
var(X) &= \mathbb{E}\left[(X - \mathbb{E}[X])^2\right] \\
&= \mathbb{E}\left[X^2\right] - \big(\mathbb{E}[X]\big)^2 \\
var(X\,|\,Y=y) &= \mathbb{E}\left[\left(X - \mathbb{E}[X\,|\,Y=y]\right)^2\,|\,Y=y\right] \\
\end{aligned}
\] |
Law of total variance
  |
\[
\begin{aligned}
var(X) &= \mathbb{E}\left[var(X\,|\,Y)\right] + var\big(\mathbb{E}[X\,|\,Y]\big) \\
var(X_1 + \cdots + X_N) &= \mathbb{E}\left[var(X_1 + \cdots + X_N\,|\,N)\right] + var\big(\mathbb{E}[X_1 + \cdots + X_N \,|\,N]\big) \\
&= \mathbb{E}[N] \cdot var(X) + \big(\mathbb{E}[X]\big)^2\cdot var(N) \\
\end{aligned}
\] |
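A simulation sketch of the random-sum variance formula, with \(N \sim\) Poisson(4) and \(X_i \sim\) Exponential(2) (both arbitrary choices of mine):
set.seed(12)
trials <- 1e5
N <- rpois(trials, lambda = 4)                          # random number of terms
sumX <- sapply(N, function(n) sum(rexp(n, rate = 2)))   # X_i i.i.d. Exponential(2)
EX <- 1/2; varX <- 1/4; EN <- 4; varN <- 4
c(simulated = var(sumX),
  formula   = EN * varX + EX^2 * varN)                  # E[N] var(X) + (E[X])^2 var(N)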
Covariance
  |
\[
\begin{aligned}
cov(X,Y) &= \mathbb{E}\left[(X - \mathbb{E}[X])\cdot(Y - \mathbb{E}[Y])\right] \\
&= \mathbb{E}\left[XY\right] - \mathbb{E}[X]\cdot \mathbb{E}[Y] \\
cov(aX+b,Y) &= a\cdot cov(X,Y) \\
cov(X,Y+Z) &= cov(X,Y) + cov(X,Z) \\
\end{aligned}
\] |
Correlation coefficient
  |
\[
\begin{aligned}
\rho(X,Y) &= \frac{cov(X,Y)}{\sigma_X \sigma_Y} & -1 \le \rho \le 1\\
\end{aligned}
\] |
Sum of random variables
  |
\[
\begin{aligned}
var(X_1 + X_2) &= var(X_1) + var(X_2) + 2cov(X_1,X_2) \\
var(X_1 - X_2) &= var(X_1) + var(X_2) - 2cov(X_1,X_2) \\
\end{aligned}
\] |
Source: MITx 6.041x, Lecture 12, 13.
Expectation
  |
\[
\begin{aligned}
\mathbb{E}[X] &= \sum_x x\cdot p_X(x) \qquad\text{or} \quad \int x\cdot f_X(x)\,dx \\
\mathbb{E}[X\,|\,A] &= \sum_x x\cdot p_{X|A}(x) \qquad\text{or} \quad \int x\cdot f_{X|A}(x)\,dx \\
g(y) &= \mathbb{E}[X\,|\,Y=y] = \sum_x x\cdot p_{X|Y}(x\,|\,y) \qquad\text{or} \quad \int x\cdot f_{X|Y}(x\,|\,y)\,dx \\
g(Y) &= \mathbb{E}[X\,|\,Y] & \text{conditional expectation as r.v.} \\
\end{aligned}
\] |
Expected value rule
  |
\[
\begin{aligned}
\mathbb{E}[g(X)] &= \sum_x g(x)\cdot p_X(x) \qquad\text{or} \quad \int g(x)\cdot f_X(x)\,dx \\
\mathbb{E}[g(X)\,|\,A] &= \sum_x g(x)\cdot p_{X|A}(x) \qquad\text{or} \quad \int g(x)\cdot f_{X|A}(x)\,dx \\
\mathbb{E}[g(X,Y)] &= \sum_x\sum_y g(x,y)\cdot p_{X,Y}(x,y) \qquad\text{or} \quad \int\int g(x,y)\cdot f_{X,Y}(x,y)\,dx\,dy \\
\end{aligned}
\] |
Total probability and expectation theorems
  |
\[
\begin{aligned}
\mathbb{P}(B) &= \color{blue}{\mathbb{P}(A_1)}\cdot \mathbb{P}(B\,|\,A_1) + \cdots + \color{blue}{\mathbb{P}(A_n)}\cdot \mathbb{P}(B\,|\,A_n) \\
p_X(x) &= \color{blue}{\mathbb{P}(A_1)}\cdot p_{X|A_1}(x) + \cdots + \color{blue}{\mathbb{P}(A_n)}\cdot p_{X|A_n}(x) \\
\\
F_X(x) &= \mathbb{P}(X \le x) \\
&= \color{blue}{\mathbb{P}(A_1)}\cdot \mathbb{P}(X \le x\,|\,A_1) + \cdots + \color{blue}{\mathbb{P}(A_n)}\cdot \mathbb{P}(X \le x\,|\,A_n) \\
&= \color{blue}{\mathbb{P}(A_1)}\cdot F_{X|A_1}(x) + \cdots + \color{blue}{\mathbb{P}(A_n)}\cdot F_{X|A_n}(x) \\
\\
f_X(x) &= \color{blue}{\mathbb{P}(A_1)}\cdot f_{X|A_1}(x) + \cdots + \color{blue}{\mathbb{P}(A_n)}\cdot f_{X|A_n}(x) \\
\int x\cdot f_X(x)\,dx &= \color{blue}{\mathbb{P}(A_1)}\int x\cdot f_{X|A_1}(x)\,dx + \cdots + \color{blue}{\mathbb{P}(A_n)}\int x\cdot f_{X|A_n}(x)\,dx \\
\mathbb{E}[X] &= \color{blue}{\mathbb{P}(A_1)}\cdot \mathbb{E}[X\,|\,A_1] + \cdots + \color{blue}{\mathbb{P}(A_n)}\cdot \mathbb{E}[X\,|\,A_n] \\
\mathbb{E}[X_1 + \cdots + X_N] &= \sum_n p_N(n)\cdot \mathbb{E}\big[X_1 + \cdots + X_N \,\big|\, N=n \big] \\
& = \color{red}{\sum_n p_N(n)\cdot n \cdot \mathbb{E}[X]} = \mathbb{E}[N]\cdot\mathbb{E}[X] \\
\end{aligned}
\] |
Linearity of expectations
  |
\[
\begin{aligned}
\mathbb{E}[aX+b] &= a\mathbb{E}[X] + b \\
\mathbb{E}[X + Y] &= \mathbb{E}[X] + \mathbb{E}[Y] \\
\end{aligned}
\] |
Law of iterated expectations
  |
\[
\begin{aligned}
\mathbb{E}\left[\mathbb{E}[X\,|\,Y]\right] &= \sum_y \mathbb{E}[X\,|\,Y=y] \cdot p_Y(y) = \mathbb{E}[X] \\
\mathbb{E}\left[\mathbb{E}[X_1 + \cdots + X_N\,|\,N]\right] &= \color{red}{\mathbb{E}\big[N\cdot\mathbb{E}[X]\big]} = \mathbb{E}[N]\cdot\mathbb{E}[X] \\
\end{aligned}
\] |
Source: MITx 6.041x, Lecture 9, 13.