
Saturday, April 26, 2014

 

Poisson Process

As discretization of Bernoulli Process

\(\quad\)Prob. of \(k\) arrivals in an interval of duration \(\tau\). Note \(N_\tau \approx \) Binomial(\(n, p\)).
  \[ \begin{aligned} \mathbb{P}(N_\tau = k) &= \mathbb{P}(k,\tau) & \text{where } \displaystyle\sum_{k=0}^\infty\mathbb{P}(k,\tau) = 1 \\ &= \frac{(np)^k}{k!} e^{-\displaystyle np} \\ &= \frac{(\displaystyle\color{blue}{\frac{\tau}{\delta}}\cdot\color{red}{\lambda\delta})^k}{k!} e^{-\displaystyle\color{blue}{\frac{\tau}{\delta}}(\color{red}{\lambda\delta)}} & \text{where }n = \displaystyle\color{blue}{\frac{\tau}{\delta}}\text{ and }p = \color{red}{\lambda \delta} + O(\delta^2) \\ \mathbb{P}(k, \tau) &= \frac{(\lambda\tau)^k}{k!} e^{-\lambda\tau} & \text{which is the Poisson PMF with parameter } (\lambda, \tau)\\ \end{aligned} \]
\(\quad\)\(\mathbb{E}[N_\tau] \approx np \approx \lambda \tau \qquad var(N_\tau) \approx np(1-p) \approx \lambda\tau \qquad \)Note \(\mathbb{E}[N_\tau] = \displaystyle\sum_{k=0}^\infty k \frac{(\lambda\tau)^k e^{-\lambda\tau}}{k!}\)

Small interval probabilities

  \[ \begin{aligned} \mathbb{P}(k,\delta) \approx \begin{cases}1 - \lambda\delta & k=0 \\ \color{red}{\lambda\delta} & k = 1 \\ 0 & k > 1 \end{cases} \\ \end{aligned} \]
by using very small \(\delta\) so that the probability of having more than one arrival becomes negligible. \(\lambda\), the arrival rate, can be perceived as the probability of arrival per unit time.

Time \(T_1\) until the \(1^{st}\) arrival

  \[ \begin{aligned} \mathbb{P}(T_1 \le t) &= 1 - \mathbb{P}(T_1 > t) = 1 - \mathbb{P}(0, t) = 1 - e^{-\lambda t} \\ f_{T_1}(t) &= \frac{d}{dt} \mathbb{P}(T_1 \le t) = \lambda e^{-\lambda t} = \text{Exponential}(\lambda) \end{aligned} \]

Memorylessness: given \(T_1 > t\), the remaining time \(T_1 - t\) is again Exponential(\(\lambda\)).
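The CDF \(1 - e^{-\lambda t}\) can be sanity-checked by inverse-CDF sampling; \(\lambda = 1.5\) and \(t = 1\) here are arbitrary choices:

```python
import math
import random

random.seed(0)
lam = 1.5  # arbitrary rate for this sketch

# If U ~ Uniform(0,1), then -ln(U)/lam ~ Exponential(lam) (inverse-CDF sampling).
samples = [-math.log(random.random()) / lam for _ in range(200_000)]

t = 1.0
empirical   = sum(s <= t for s in samples) / len(samples)
theoretical = 1 - math.exp(-lam * t)    # P(T1 <= t)
print(round(empirical, 3), round(theoretical, 3))
```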

Time \(Y_k\) of the \(k^{th}\) arrival

  \[ \begin{aligned} \mathbb{P}(Y_k \le y) &= \sum_{n=k}^\infty \mathbb{P}(n, y) & \text{CDF argument} \\ f_{Y_k}(y) \delta &\approx \mathbb{P}(y \le Y_k \le y + \delta) = \mathbb{P}(k-1, y) \mathbb{P}(1, \delta) = \mathbb{P}(k-1, y) \cdot \color{red}{\lambda\delta} & \text{more intuitive argument}\\ \therefore f_{Y_k}(y) &\approx \mathbb{P}(k-1, y) \cdot \lambda = \frac{\lambda^k y^{k-1} e^{-\lambda y}}{(k-1)!} \qquad \text{for }y \ge 0 & \color{blue}{\text{Erlang distribution }} \text{of order } k \\ \end{aligned} \]

  \[ \begin{aligned} Y_k &= T_1 + \cdots + T_k & \text{sum of i.i.d. exponentials} \\ \mathbb{E}[Y_k] &= \frac{k}{\lambda} \qquad var(Y_k) = \frac{k}{\lambda^2} \\ \end{aligned} \]
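Likewise for the Erlang mean and variance, simulating \(Y_k\) as a sum of i.i.d. exponentials (\(\lambda = 2\), \(k = 3\) arbitrary):

```python
import math
import random

random.seed(1)
lam, k = 2.0, 3  # arbitrary rate and order

# Y_k = T_1 + ... + T_k with T_i ~ Exponential(lam)
ys = [sum(-math.log(random.random()) / lam for _ in range(k))
      for _ in range(100_000)]

mean = sum(ys) / len(ys)                           # should approach k/lam   = 1.5
var  = sum((y - mean) ** 2 for y in ys) / len(ys)  # should approach k/lam^2 = 0.75
print(round(mean, 2), round(var, 2))
```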

Sum of independent Poisson r.v's with means/parameters \(\mu\) and \(\nu\)

\(\quad\)Poisson(\(\mu\)) + Poisson(\(\nu\)) = Poisson(\(\mu + \nu\))\(\qquad\) with mean/parameter \(\mu + \nu\)

In general, the sum of two independent Poisson random variables is itself a Poisson random variable: the two r.v.'s can be viewed as the numbers of arrivals in disjoint time intervals of a single Poisson process. This is a special property, shared with Normal r.v.'s, that most other distributions do not have.

Splitting a Poisson process

Splitting a Bernoulli process into two yields two Bernoulli processes that are dependent: an arrival in a given time slot of one process implies a non-arrival of the other process in the same slot. Splitting a Poisson process likewise yields two Poisson processes, but, perhaps surprisingly, they are independent, because time is continuous.
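A simulation of the splitting claim, assuming each arrival is routed to stream A with probability \(p\) (all numbers here arbitrary): the two streams should have means \(\lambda\tau p\) and \(\lambda\tau(1-p)\), and near-zero covariance:

```python
import math
import random

random.seed(2)
lam, tau, p = 4.0, 1.0, 0.3  # arbitrary rate, window, and split probability

def poisson_sample(mu):
    # Knuth's method: count uniforms until their running product drops below e^-mu
    limit, prod, k = math.exp(-mu), 1.0, 0
    while True:
        prod *= random.random()
        if prod < limit:
            return k
        k += 1

pairs = []
for _ in range(100_000):
    n = poisson_sample(lam * tau)
    a = sum(random.random() < p for _ in range(n))  # arrivals routed to stream A
    pairs.append((a, n - a))                        # the rest go to stream B

mean_a = sum(a for a, _ in pairs) / len(pairs)  # should approach lam*tau*p     = 1.2
mean_b = sum(b for _, b in pairs) / len(pairs)  # should approach lam*tau*(1-p) = 2.8
cov = sum(a * b for a, b in pairs) / len(pairs) - mean_a * mean_b  # near 0
print(round(mean_a, 1), round(mean_b, 1), round(cov, 2))
```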

Source: MITx 6.041x, Lecture 22, 23.


Friday, April 25, 2014

 

Bernoulli Process

Time of the \(k^{th}\) success/arrival

  \[ \begin{aligned} Y_k &= \text{ time of }k^{th} \text{ arrival} & \color{blue}{\text{Pascal random variable}} \\ &= T_1 + \cdots + T_k & T_i \text{ are i.i.d, Geometric}(p) \\ T_k &= k^{th}\text{ inter-arrival time } = Y_k - Y_{k-1} \\ \mathbb{E}[Y_k] &= \frac{k}{p} \qquad var(Y_k) = \frac{k(1-p)}{p^2} \\ \mathbb{P}(Y_k = t) &= \mathbb{P}(k-1 \text{ arrivals in }t-1 \text{ time}) \cdot \mathbb{P}(\text{arrival at time } t) \\ p_{Y_k}(t) &= {{t-1} \choose {k-1}} p^k (1-p)^{t-k} \qquad \text{for }t = k, k+1, \cdots & \color{blue}{\text{Pascal PMF}} \text{ of order }k \text{ and parameter }p \\ \end{aligned} \]
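The Pascal PMF can be checked numerically against the mean and variance above (\(p = 0.25\), \(k = 3\) arbitrary):

```python
import math

p, k = 0.25, 3  # arbitrary parameter and order

def pascal_pmf(t, k, p):
    # P(Y_k = t) = C(t-1, k-1) p^k (1-p)^(t-k), for t = k, k+1, ...
    return math.comb(t - 1, k - 1) * p ** k * (1 - p) ** (t - k)

ts = range(k, 600)  # truncate the infinite support; the tail is negligible here
total = sum(pascal_pmf(t, k, p) for t in ts)
mean  = sum(t * pascal_pmf(t, k, p) for t in ts)                 # k/p      = 12
var   = sum(t * t * pascal_pmf(t, k, p) for t in ts) - mean**2   # k(1-p)/p^2 = 36
print(round(total, 6), round(mean, 4), round(var, 4))  # → 1.0 12.0 36.0
```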

Merging of Bernoulli(\(p\)) and Bernoulli(\(q\))

\(\quad\) Bernoulli(\(p + q - pq\)) \(\qquad\qquad\) Collisions are counted as one arrival
  \[ \begin{aligned} \mathbb{P}(\text{arrival in first process | arrival}) &= \frac{p}{p+q-pq} \\ \end{aligned} \]

Source: MITx 6.041x, Lecture 21.


Wednesday, April 23, 2014

 

Linear map

A function \(L: \mathbb{R}^n \to \mathbb{R}^m\) is called a linear map if it respects addition and scalar multiplication. Symbolically, for a map to be linear, we must have that
  \[ \begin{aligned} L(\vec{v}+\vec{w}) &= L(\vec{v})+L(\vec{w}) & \text{for all }\vec{v},\vec{w} \in \mathbb{R}^n \\ \end{aligned} \]
and also
  \[ \begin{aligned} L(a\vec{v}) &= a L(\vec{v}) \quad & \text{for all }a \in \mathbb{R} \text{ and } \vec{v} \in \mathbb{R}^n \\ \end{aligned} \]

To each linear map \(L: \mathbb{R}^n \to \mathbb{R}^m\) we associate an \(m \times n\) matrix \(A_L\), called the matrix of the linear map with respect to the standard coordinates. It is defined by setting \(a_{i,j}\) to be the \(i^{th}\) component of \(L(e_j)\). In other words, the \(j^{th}\) column of the matrix \(A_L\) is the vector \(L(e_j)\).
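A small sketch of this recipe; the map \(L\) below is a made-up example:

```python
# Recover A_L from the action of L on the standard basis vectors e_j.
def L(v):
    # an arbitrary linear map R^3 -> R^2
    x, y, z = v
    return (2 * x - y, y + 3 * z)

n, m = 3, 2
basis = [tuple(1.0 if i == j else 0.0 for i in range(n)) for j in range(n)]

# Column j of A_L is L(e_j); a_{i,j} is the i-th component of L(e_j).
A = [[L(e)[i] for e in basis] for i in range(m)]
print(A)  # → [[2.0, -1.0, 0.0], [0.0, 1.0, 3.0]]

# Check: multiplying by A reproduces L
v = (1.0, 2.0, 3.0)
Av = tuple(sum(A[i][j] * v[j] for j in range(n)) for i in range(m))
print(Av == L(v))  # → True
```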

Source: m2o2c2.


Saturday, April 19, 2014

 

AES performance

Test

# java version "1.7.0_51"
# Java(TM) SE Runtime Environment (build 1.7.0_51-b13)
# Java HotSpot(TM) 64-Bit Server VM (build 24.51-b03, mixed mode)

# CBC:    AES/CBC/PKCS5Padding; SunJCE
# BC.CBC: AES/CBC/PKCS5Padding; BC (Bouncy Castle v1.5)
# GCM:    AES/GCM/NoPadding;    BC (Bouncy Castle v1.5)
# CTR:    AES/CTR/NoPadding;    SunJCE
# BC.CTR: AES/CTR/NoPadding;    BC (Bouncy Castle v1.5)

# key length:          256 bits
# (CBC/CTR) IV length: 128 bits
# (GCM) tag length:    128 bits
# (GCM) IV length:     96 bits

# plaintext size: ~5M bytes
# number of encrypt/decrypt per Cipher: 160
# time unit: milliseconds

Statistics

       vars   n   mean    sd median trimmed   mad min  max range  skew kurtosis   se
CBC       1 160  62.65 29.32   59.0   59.42  5.93  51  419   368 11.25   133.50 2.32
BC.CBC    2 160 170.31 69.71  162.0  163.75 10.38 145 1031   886 11.76   142.23 5.51
GCM       3 160 324.79 74.30  316.5  316.79 12.60 297 1233   936 11.44   136.54 5.87
CTR       4 160  67.68 26.47   64.0   65.05  5.93  57  391   334 11.39   136.10 2.09
BC.CTR    5 160 166.16 68.23  158.0  160.00 13.34 139 1009   870 11.78   142.57 5.39


      CBC             BC.CBC            GCM              CTR             BC.CTR      
 Min.   : 51.00   Min.   : 145.0   Min.   : 297.0   Min.   : 57.00   Min.   : 139.0  
 1st Qu.: 55.00   1st Qu.: 156.0   1st Qu.: 308.8   1st Qu.: 60.00   1st Qu.: 151.0  
 Median : 59.00   Median : 162.0   Median : 316.5   Median : 64.00   Median : 158.0  
 Mean   : 62.65   Mean   : 170.3   Mean   : 324.8   Mean   : 67.68   Mean   : 166.2  
 3rd Qu.: 63.00   3rd Qu.: 171.0   3rd Qu.: 325.0   3rd Qu.: 70.00   3rd Qu.: 169.2  
 Max.   :419.00   Max.   :1031.0   Max.   :1233.0   Max.   :391.00   Max.   :1009.0  

R

all <- read.table("/tmp/cipher_compare.log", header=T)
library(psych)
describe(all)
summary(all)

plot(density(all$CBC),col="black",main="Cipher latency density plots",xlab="Latency (ms)",ylab="Frequency density")
lines(density(all$BC.CBC),col="red")
lines(density(all$CTR),col="blue")
lines(density(all$BC.CTR),col="orange")
lines(density(all$GCM),col="green")
legend("topright", legend=c(
  "AES/CBC/PKCS5Padding; SunJCE 1.7.0_51", 
  "AES/CBC/PKCS5Padding; BC 1.5", 
  "AES/CTR/NoPadding; SunJCE 1.7.0_51", 
  "AES/CTR/NoPadding; BC 1.5",
  "AES/GCM/NoPadding; BC 1.5"),
  text.col=c("black","red","blue","orange","green"))

Discussion at the Bouncy Castle dev forum.

Saturday, April 12, 2014

 

Inequalities

Markov inequality

\(\quad\) If \(X \ge 0\) and \(a > 0\), then
  \[ \begin{aligned} \mathbb{P}(X \ge a) &\le \frac{\mathbb{E}[X]}{a} \\ \end{aligned} \]

Chebyshev inequality

  \[ \begin{aligned} \mathbb{P}(|X-\mu| \ge c) &\le \frac{\sigma^2}{c^2} & c > 0 \\ \end{aligned} \]

Weak law of large numbers (WLLN)

\(\quad X_1, X_2, \cdots \) i.i.d; finite mean \(\mu\) and variance \(\sigma^2\)
  \[ \begin{aligned} M_n &= \frac{X_1 + \cdots + X_n}{n} & \text{sample mean} \\ \mathbb{E}[M_n] &= \mu \\ var(M_n) &= \frac{\sigma^2}{n} \\ \mathbb{P}(|M_n - \mu| \ge \epsilon) &\le \frac{\sigma^2}{n\epsilon^2} \rightarrow 0 \quad \text{as } n \rightarrow \infty &\text{ for } \epsilon > 0 \\ \end{aligned} \]
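A simulation of the WLLN bound, assuming \(X_i \sim\) Uniform(0,1) (so \(\mu = 0.5\), \(\sigma^2 = 1/12\)); the deviation frequency should fall with \(n\) and stay under the Chebyshev bound:

```python
import random

random.seed(3)
mu, var, eps = 0.5, 1 / 12, 0.05  # assumed Uniform(0,1) summands, arbitrary eps

results = {}
for n in (10, 100, 1000):
    trials, count = 2000, 0
    for _ in range(trials):
        m_n = sum(random.random() for _ in range(n)) / n   # sample mean M_n
        count += abs(m_n - mu) >= eps
    freq = count / trials
    bound = min(var / (n * eps * eps), 1.0)   # Chebyshev bound, capped at 1
    results[n] = (freq, bound)
    print(n, round(freq, 3), round(bound, 3))
```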

Convergence in probability

\(\quad\) A sequence \(Y_n\), not necessarily independent, converges in probability to a number \(a\) if:
  \[ \begin{aligned} \lim_{n\rightarrow\infty}\mathbb{P}(|Y_n - a| \ge \epsilon) = 0 \\ \end{aligned} \]
\(\quad\) Caveat: if \(X_n \rightarrow a\) in probability, \(\mathbb{E}[X_n]\) need not converge to \(a\)!
\(\quad\) How to check for convergence? Guess \(a\), express the probability as above, and check whether the limit is zero.

Central Limit Theorem

\(\quad X_1,\cdots,X_n\) i.i.d with finite mean \(\mu\) and variance \(\sigma^2\).
  \[ \begin{aligned} S_n &= X_1 + \cdots + X_n & \text{variance: } n\sigma^2 \\ \frac{S_n}{\sqrt{n}} &= \frac{X_1 + \cdots + X_n}{\sqrt{n}} & \text{variance: } \sigma^2 \\ \color{red}{Z_n} &\color{red}{=} \color{red}{\frac{S_n - n\mu}{\sqrt{n}\sigma}} & S_n \thicksim N(n\mu, n\sigma^2) \\ M_n &= \frac{S_n}{n} & \mathbb{E}[M_n] = \mu \quad var(M_n) = \frac{\sigma^2}{n} \\ \color{red}{Z_n} &\color{red}{=} \color{red}{\frac{M_n - \mu}{\sigma/\sqrt{n}}} & M_n \thicksim N(\mu, \frac{\sigma^2}{n}) \\ \color{red}{\lim_{n \rightarrow \infty} \mathbb{P}(Z_n \le z)} &\color{red}{=} \color{red}{\mathbb{P}(Z \le z)} = \Phi(z) & \text{for every }z, \text{ where } Z \thicksim N(0,1) \\ \end{aligned} \]
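The convergence of \(Z_n\) to \(\Phi\) can be eyeballed numerically, again assuming Uniform(0,1) summands (so \(\mu = 0.5\), \(\sigma = \sqrt{1/12}\); \(n = 48\) is an arbitrary pick):

```python
import random
from statistics import NormalDist

random.seed(4)
n, mu, sigma = 48, 0.5, (1 / 12) ** 0.5  # assumed Uniform(0,1) summands

zs = []
for _ in range(50_000):
    s_n = sum(random.random() for _ in range(n))
    zs.append((s_n - n * mu) / (n ** 0.5 * sigma))   # standardized Z_n

# The empirical CDF of Z_n should be close to Phi(z)
checks = {}
for z in (-1.0, 0.0, 1.0):
    emp = sum(v <= z for v in zs) / len(zs)
    checks[z] = (emp, NormalDist().cdf(z))
    print(z, round(emp, 3), round(checks[z][1], 3))
```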

De Moivre-Laplace 1/2 correction

\(\quad\)When \(S_n\) is integer-valued.
  \[ \begin{aligned} \mathbb{P}(S_n \le a) &\ge p & \text{underestimate} \\ \mathbb{P}(S_n \lt a+1) &\ge p & \text{overestimate} \\ \mathbb{P}(S_n \lt a+0.5) &\ge p & \text{1/2 correction} \\ \end{aligned} \]

Jensen's inequality

\(\quad\)If \(g\) is a convex function (i.e. a smiley face, so \(g''(x) \ge 0\)):
  \[ \begin{aligned} \color{blue}{g(x)} &\color{blue}{\ge g(c) + g'(c) (x-c)} & \text{1st order Taylor expansion} \\ g(X) &\ge g(\mathbb{E}[X]) + g'(\mathbb{E}[X]) (X-\mathbb{E}[X]) & \text{let } c = \mathbb{E}[X] \\ \mathbb{E}[g(X)] &\ge \mathbb{E}[g(\mathbb{E}[X])] + \mathbb{E}\left[g'\big(\mathbb{E}[X]\big) \big(X-\mathbb{E}[X]\big)\right] & \text{note }\mathbb{E}[X] \text{ is a constant} \\ \mathbb{E}[g(X)] &\ge g(\mathbb{E}[X]) + g'(\mathbb{E}[X]) (\mathbb{E}[X]-\mathbb{E}[X]) \\ \color{blue}{\mathbb{E}[g(X)]} &\color{blue}{\ge g(\mathbb{E}[X])} \\ \end{aligned} \]
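A numeric instance with the convex \(g(x) = x^2\) and \(X \sim\) Uniform(0,1), where \(\mathbb{E}[X^2] = 1/3 \ge (1/2)^2 = g(\mathbb{E}[X])\):

```python
import random

random.seed(5)
# Convex g(x) = x^2 (g'' = 2 >= 0), X ~ Uniform(0,1)
xs = [random.random() for _ in range(100_000)]

e_x  = sum(xs) / len(xs)                 # E[X]    ≈ 1/2
e_gx = sum(x * x for x in xs) / len(xs)  # E[g(X)] ≈ 1/3
print(e_gx >= e_x ** 2)  # → True; the gap E[X^2] - (E[X])^2 is var(X) = 1/12
```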

Hoeffding's inequality

\(\quad X_i\) are i.i.d. random variables, each equally likely to be \(-1\) or \(1\). Let \(a \gt 0\).
  \[ \begin{aligned} \mathbb{P}(X_1 + \cdots + X_n \ge na) &\le e^{-na^2/2} & \text{a much stronger bound than Chebyshev inequality} \\ \end{aligned} \]
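A simulation comparing the two bounds, with arbitrary \(n = 100\), \(a = 0.4\); since \(var(S_n) = n\), the Chebyshev bound here is \(1/(na^2)\):

```python
import math
import random

random.seed(6)
n, a, trials = 100, 0.4, 50_000  # arbitrary choices for this sketch

hits = 0
for _ in range(trials):
    s = sum(1 if random.random() < 0.5 else -1 for _ in range(n))
    hits += (s >= n * a)

freq = hits / trials
hoeffding = math.exp(-n * a * a / 2)   # e^{-na^2/2} ≈ 3.4e-4
chebyshev = 1 / (n * a * a)            # 1/(na^2) = 0.0625, much looser
print(freq, round(hoeffding, 6), round(chebyshev, 4))
```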

Source: MITx 6.041x, Lecture 18.


Friday, April 11, 2014

 

Probability density function

Joint pdf

  \[ \begin{aligned} \int \int f_{X,Y}(x,y) \, dx \, dy &= 1 \\ F_{X,Y}(x,y) &= P(X \le x, Y \le y) = \int_{-\infty}^y \int_{-\infty}^x f_{X,Y}(s,t) \,ds\, dt & \text{cumulative distribution function (CDF) } \\ f_{X,Y}(x,y) &= \frac{\partial^2}{\partial x\, \partial y} F_{X,Y}(x,y) \\ \end{aligned} \]

From joint to marginal

  \[ \begin{aligned} f_X(x) &= \int f_{X,Y}(x,y)\,dy \\ F_X(x) &= P(X \le x) = \int_{-\infty}^x \bigg( \int_{-\infty}^\infty f_{X,Y}(s,t) \, dt \bigg) ds \\ \end{aligned} \]

\(Y = aX + b\)

  \[ \begin{aligned} p_Y(y) &= \mathbb{P}(Y=y) = \mathbb{P}(aX+b=y) = \mathbb{P}(X=\frac{y-b}{a}) = p_X(\frac{y-b}{a}) & \text{discrete r.v.} \\ F_Y(y) &= \mathbb{P}(Y \le y) = \mathbb{P}(aX+b \le y) = \mathbb{P}(X \le \frac{y-b}{a}) = F_X(\frac{y-b}{a}) & \text{for } a > 0 \\ \color{orange}{f_Y(y)} &= \frac{d}{dy}F_X(\frac{y-b}{a}) = \color{orange}{\frac{1}{|a|}\cdot f_X(\frac{y-b}{a})} & \text{continuous r.v.} \\ \end{aligned} \]

\(\color{red}{Z = X + Y}\)

\(\quad(X,Y\) independent)
  \[ \begin{aligned} \mathbb{P}_Z(z) &= \sum_x \mathbb{P}(X=x, Y=z-x) = p_Z(z) \\ p_Z(z) &= \sum_x p_X(x) \cdot p_Y(z - x) \\ \end{aligned} \]

Discrete convolution mechanics

Given \(z\) and \(Z=X+Y\), find \(p_Z(z)\) from \(p_X(x)\) and \(p_Y(y)\).
  \[ \begin{aligned} \color{blue}{f_{Z|X}(z|x)} &= f_{x+Y|X}(z|x) = f_{x+Y}(z) & \text{by independence of } X,Y \\ &= \color{blue}{f_Y(z-x)} & \color{orange}{f_{X+b}(x) = f_X(x-b)}, \text{ see Lec 11} \end{aligned} \]

Joint pdf of Z and X

  \[ \begin{aligned} f_{X,Z}(x,z) &= f_X(x)\cdot \color{blue}{f_{Z|X}(z|x)} = f_X(x) \cdot \color{blue}{f_Y(z-x)} \\ f_{Z}(z) &= \int_{-\infty}^\infty f_{X,Z}(x,z)\,dx = \int_{-\infty}^\infty f_X(x) \cdot \color{blue}{f_Y(z-x)} \,dx \\ \end{aligned} \]
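The discrete analogue \(p_Z(z) = \sum_x p_X(x)\,p_Y(z-x)\) is easy to run mechanically, e.g. for two fair dice:

```python
# Convolution of two PMFs: p_Z(z) = sum_x p_X(x) * p_Y(z - x)
p_die = {v: 1 / 6 for v in range(1, 7)}  # PMF of one fair die

p_z = {}
for x, px in p_die.items():
    for y, py in p_die.items():
        p_z[x + y] = p_z.get(x + y, 0) + px * py

print(round(p_z[7], 4))             # → 0.1667 (six of 36 outcomes sum to 7)
print(round(sum(p_z.values()), 4))  # → 1.0
```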

\(Y = g(X)\)

\(\quad\) How to find \(f_Y(y)\) in general?
  \[ \begin{aligned} F_Y(y) &= \mathbb{P}(g(X) \le y) \\ f_Y(y) &= \frac{d}{dy}F_Y(y) \\ \end{aligned} \]

\(Y = g(X)\) when \(g\) is monotonic

  \[ \begin{aligned} F_Y(y) &= \mathbb{P}(g(X) \le y) = \mathbb{P}(X \le h(y)) = F_X(h(y)) \\ f_Y(y) &= \frac{d}{dy}F_Y(y) = \frac{d}{dy}F_X\left(h(y)\right) = f_X\left(h(y)\right) \big\lvert \frac{d}{dy} h(y) \big\rvert \\ \end{aligned} \]

Source: MITx 6.041x, Lecture 9, 11, 12.


Sunday, April 06, 2014

 

Euclidean quiz


Saturday, April 05, 2014

 

Estimator

A point estimate, \(\hat\theta = g(x) \), is a number, whereas an estimator, \(\widehat\Theta = g(X) \), is a random variable.
  \[ \begin{aligned} \hat\theta_{MAP} &= g_{MAP}(x): \text{ maximises } p_{\Theta|X}(\theta\,|\,x) \\ \hat\theta_{LMS} &= g_{LMS}(x) = \mathbb{E}[\Theta \,|\, X=x] \\ \end{aligned} \]

Conditional probability of error

  \[ \begin{aligned} \mathbb{P}(\hat\theta &\ne \Theta \,|\, X =x) & \text{smallest under the MAP rule} \\ \end{aligned} \]

Overall probability of error

  \[ \begin{aligned} \mathbb{P}(\widehat\Theta \ne \Theta) &= \int \mathbb{P}(\widehat\Theta \ne \Theta\,|\,X=x)\cdot f_X(x)\, dx \\ &= \sum_\theta \mathbb{P}(\widehat\Theta \ne \Theta\,|\,\Theta=\theta)\cdot p_\Theta(\theta) \\ \end{aligned} \]

Mean squared error (MSE)

  \[ \begin{aligned} &\mathbb{E}\left[\big(\Theta - \hat\theta\big)^2\right] \\ \end{aligned} \]
Minimized when \(\hat\theta = \mathbb{E}[\Theta]\), so that
  \[ \begin{aligned} &\mathbb{E}\left[\big(\Theta - \hat\theta\big)^2\right] = \mathbb{E}\left[\big(\Theta - \mathbb{E}[\Theta]\big)^2\right] = \text{var}(\Theta) & \text{least mean square (} \color{blue}{\text{LMS}}) \\ \end{aligned} \]

Conditional mean squared error

  \[ \begin{aligned} &\mathbb{E}\left[\big(\Theta - \hat\theta\big)^2 \,\big|\, \color{blue}{X=x} \right] & \text{with observation }x \\ \end{aligned} \]
Minimized when \(\hat\theta = \mathbb{E}[\Theta\,\big|\, \color{blue}{X=x}]\), so that
  \[ \begin{aligned} \mathbb{E}\left[\big(\Theta - \hat\theta\big)^2\,\big|\, \color{blue}{X=x} \right] &= \mathbb{E}\left[\big(\Theta - \mathbb{E}[\Theta\,|\, \color{blue}{X=x}]\big)^2 \,\big|\, \color{blue}{X=x} \right] \\ &= \color{red}{\text{var}(\Theta\,|\, X=x)} \qquad \text{expected performance, given a measurement} \\ \end{aligned} \]
Expected performance of the design:
  \[ \begin{aligned} \mathbb{E}\left[\big(\Theta - \mathbb{E}[\Theta\,|\, \color{blue}{X}]\big)^2 \right] &= \color{red}{\mathbb{E}\left[\text{var}(\Theta\,|\,X)\right]} \\ \end{aligned} \]
Note that \(\hat\theta\) is an estimate whereas \(\widehat\Theta = \mathbb{E}[\Theta\,|\,X]\) is an estimator.

Linear least mean square (LLMS) estimation

\(\quad\) Minimize \(\mathbb{E}\left[(\Theta - aX - b)^2\right]\) w.r.t. \(a, b\)
  \[ \begin{aligned} \widehat\Theta_L &= \mathbb{E}[\Theta] + \frac{cov(\Theta,X)}{var(X)}\big(X - \mathbb{E}[X]\big)\\ &= \mathbb{E}[\Theta] + \rho\frac{\sigma_\Theta}{\sigma_X}\big(X - \mathbb{E}[X]\big) & \text{only means, variances and covariances matter} \\ \end{aligned} \]
\(\quad\)Error variance:
  \[ \begin{aligned} \mathbb{E}[(\widehat\Theta_L - \Theta)^2] &= (1-\rho^2) \cdot var(\Theta) \\ \end{aligned} \]
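A sanity check of the error-variance formula on a made-up model (\(\Theta \sim\) Uniform(0,1), \(X = \Theta +\) Gaussian noise): fitting the LLMS coefficients from sample moments, the empirical MSE should match \((1-\rho^2)\,var(\Theta)\):

```python
import random

random.seed(7)
# Toy model: Theta ~ Uniform(0,1), X = Theta + N(0, 0.1^2) noise
pairs = []
for _ in range(100_000):
    theta = random.random()
    pairs.append((theta, theta + random.gauss(0, 0.1)))

n   = len(pairs)
m_t = sum(t for t, _ in pairs) / n
m_x = sum(x for _, x in pairs) / n
var_t = sum((t - m_t) ** 2 for t, _ in pairs) / n
var_x = sum((x - m_x) ** 2 for _, x in pairs) / n
cov   = sum((t - m_t) * (x - m_x) for t, x in pairs) / n

# LLMS estimator: Theta_hat = E[Theta] + cov/var(X) * (X - E[X])
a = cov / var_x
mse  = sum((m_t + a * (x - m_x) - t) ** 2 for t, x in pairs) / n
rho2 = cov * cov / (var_x * var_t)
print(round(mse, 4), round((1 - rho2) * var_t, 4))  # the two agree
```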

Source: MITx 6.041x, Lecture 16, 17.


Friday, April 04, 2014

 

Variance

  \[ \begin{aligned} var(X) &= \mathbb{E}\left[(X - \mathbb{E}[X])^2\right] \\ &= \mathbb{E}\left[X^2\right] - \big(\mathbb{E}[X]\big)^2 \\ var(X\,|\,Y=y) &= \mathbb{E}\left[\left(X - \mathbb{E}[X\,|\,Y=y]\right)^2\,|\,Y=y\right] \\ \end{aligned} \]

Law of total variance

  \[ \begin{aligned} var(X) &= \mathbb{E}\left[var(X\,|\,Y)\right] + var\big(\mathbb{E}[X\,|\,Y]\big) \\ var(X_1 + \cdots + X_N) &= \mathbb{E}\left[var(X_1 + \cdots + X_N\,|\,N)\right] + var\big(\mathbb{E}[X_1 + \cdots + X_N \,|\,N]\big) \\ &= \mathbb{E}[N] \cdot var(X) + \big(\mathbb{E}[X]\big)^2\cdot var(N) \\ \end{aligned} \]
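The random-sum line can be verified by simulation, assuming \(N \sim\) Uniform\(\{1,\dots,4\}\) and \(X_i \sim\) Uniform(0,1), all independent:

```python
import random

random.seed(8)
e_x, var_x = 0.5, 1 / 12   # mean/variance of Uniform(0,1)
e_n, var_n = 2.5, 1.25     # mean/variance of Uniform{1,2,3,4}

sums = []
for _ in range(200_000):
    n = random.randint(1, 4)                          # random number of terms N
    sums.append(sum(random.random() for _ in range(n)))

m = sum(sums) / len(sums)                             # ≈ E[N]E[X] = 1.25
v = sum((s - m) ** 2 for s in sums) / len(sums)
formula = e_n * var_x + e_x ** 2 * var_n  # E[N]var(X) + (E[X])^2 var(N) ≈ 0.52
print(round(v, 2), round(formula, 2))
```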

Covariance

  \[ \begin{aligned} cov(X,Y) &= \mathbb{E}\left[(X - \mathbb{E}[X])\cdot(Y - \mathbb{E}[Y])\right] \\ &= \mathbb{E}\left[XY\right] - \mathbb{E}[X]\cdot \mathbb{E}[Y] \\ cov(aX+b,Y) &= a\cdot cov(X,Y) \\ cov(X,Y+Z) &= cov(X,Y) + cov(X,Z) \\ \end{aligned} \]

Correlation coefficient

  \[ \begin{aligned} \rho(X,Y) &= \frac{cov(X,Y)}{\sigma_X \sigma_Y} & -1 \le \rho \le 1\\ \end{aligned} \]

Sum of random variables

  \[ \begin{aligned} var(X_1 + X_2) &= var(X_1) + var(X_2) + 2cov(X_1,X_2) \\ var(X_1 - X_2) &= var(X_1) + var(X_2) - 2cov(X_1,X_2) \\ \end{aligned} \]

Source: MITx 6.041x, Lecture 12, 13.


 

Expectation

  \[ \begin{aligned} \mathbb{E}[X] &= \sum_x x\cdot p_X(x) \qquad\text{or} \quad \int x\cdot f_X(x)\,dx \\ \mathbb{E}[X\,|\,A] &= \sum_x x\cdot p_{X|A}(x) \qquad\text{or} \quad \int x\cdot f_{X|A}(x)\,dx \\ g(y) &= \mathbb{E}[X\,|\,Y=y] = \sum_x x\cdot p_{X|Y}(x\,|\,y) \qquad\text{or} \quad \int x\cdot f_{X|Y}(x\,|\,y)\,dx \\ g(Y) &= \mathbb{E}[X\,|\,Y] & \text{conditional expectation as r.v.} \\ \end{aligned} \]

Expected value rule

  \[ \begin{aligned} \mathbb{E}[g(X)] &= \sum_x g(x)\cdot p_X(x) \qquad\text{or} \quad \int g(x)\cdot f_X(x)\,dx \\ \mathbb{E}[g(X)\,|\,A] &= \sum_x g(x)\cdot p_{X|A}(x) \qquad\text{or} \quad \int g(x)\cdot f_{X|A}(x)\,dx \\ \mathbb{E}[g(X,Y)] &= \sum_x\sum_y g(x,y)\cdot p_{X,Y}(x,y) \qquad\text{or} \quad \int\int g(x,y)\cdot f_{X,Y}(x,y)\,dx\,dy \\ \end{aligned} \]

Total probability and expectation theorems

  \[ \begin{aligned} \mathbb{P}(B) &= \color{blue}{\mathbb{P}(A_1)}\cdot \mathbb{P}(B\,|\,A_1) + \cdots + \color{blue}{\mathbb{P}(A_n)}\cdot \mathbb{P}(B\,|\,A_n) \\ p_X(x) &= \color{blue}{\mathbb{P}(A_1)}\cdot p_{X|A_1}(x) + \cdots + \color{blue}{\mathbb{P}(A_n)}\cdot p_{X|A_n}(x) \\ \\ F_X(x) &= \mathbb{P}(X \le x) \\ &= \color{blue}{\mathbb{P}(A_1)}\cdot \mathbb{P}(X \le x\,|\,A_1) + \cdots + \color{blue}{\mathbb{P}(A_n)}\cdot \mathbb{P}(X \le x\,|\,A_n) \\ &= \color{blue}{\mathbb{P}(A_1)}\cdot F_{X|A_1}(x) + \cdots + \color{blue}{\mathbb{P}(A_n)}\cdot F_{X|A_n}(x) \\ \\ f_X(x) &= \color{blue}{\mathbb{P}(A_1)}\cdot f_{X|A_1}(x) + \cdots + \color{blue}{\mathbb{P}(A_n)}\cdot f_{X|A_n}(x) \\ \int x\cdot f_X(x)\,dx &= \color{blue}{\mathbb{P}(A_1)}\int x\cdot f_{X|A_1}(x)\,dx + \cdots + \color{blue}{\mathbb{P}(A_n)}\int x\cdot f_{X|A_n}(x)\,dx \\ \mathbb{E}[X] &= \color{blue}{\mathbb{P}(A_1)}\cdot \mathbb{E}[X\,|\,A_1] + \cdots + \color{blue}{\mathbb{P}(A_n)}\cdot \mathbb{E}[X\,|\,A_n] \\ \mathbb{E}[X_1 + \cdots + X_N] &= \sum_n p_N(n)\cdot \mathbb{E}\big[X_1 + \cdots + X_N \,\big|\, N=n \big] \\ & = \color{red}{\sum_n p_N(n)\cdot n \cdot \mathbb{E}[X]} = \mathbb{E}[N]\cdot\mathbb{E}[X] \\ \end{aligned} \]

Linearity of expectations

  \[ \begin{aligned} \mathbb{E}[aX+b] &= a\mathbb{E}[X] + b \\ \mathbb{E}[X + Y] &= \mathbb{E}[X] + \mathbb{E}[Y] \\ \end{aligned} \]

Law of iterated expectations

  \[ \begin{aligned} \mathbb{E}\left[\mathbb{E}[X\,|\,Y]\right] &= \sum_y \mathbb{E}[X\,|\,Y=y] \cdot p_Y(y) = \mathbb{E}[X] \\ \mathbb{E}\left[\mathbb{E}[X_1 + \cdots + X_N\,|\,N]\right] &= \color{red}{\mathbb{E}\big[N\cdot\mathbb{E}[X]\big]} = \mathbb{E}[N]\cdot\mathbb{E}[X] \\ \end{aligned} \]

Source: MITx 6.041x, Lecture 9, 13.

