Sunday, March 30, 2014

 

Normal Distribution

  \begin{aligned} X \thicksim N(\mu,\sigma^2): \quad f_X(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-(x-\mu)^2/2\sigma^2} \\ \end{aligned}

In fact, any PDF of the following form with α>0 is a normal distribution:
  \begin{aligned} f_X(x) = c\,e^{-(\alpha x^2 + \beta x + \gamma)} \\ \end{aligned}
Note f_X(x) peaks at x = \mu. It's therefore not difficult to show (by minimizing the exponent) that:
  \begin{aligned} \mu = -\frac{\beta}{2\alpha} \qquad \sigma^2 = \frac{1}{2\alpha} \\ \end{aligned}
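To see this, complete the square in the exponent and match it against the N(\mu,\sigma^2) exponent (x-\mu)^2/2\sigma^2:
  \begin{aligned} \alpha x^2 + \beta x + \gamma &= \alpha\Big(x + \frac{\beta}{2\alpha}\Big)^2 + \gamma - \frac{\beta^2}{4\alpha} & \text{complete the square} \\ \frac{1}{2\sigma^2} &= \alpha, \quad \mu = -\frac{\beta}{2\alpha} & \text{matching coefficients} \\ \end{aligned}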
Posterior probability:
  \begin{aligned} f_{\Theta|X}(\theta\,|\,x) &= \frac{f_\Theta(\theta)\, f_{X|\Theta}(x\,|\,\theta)}{f_X(x)} \qquad f_X(x) = \int f_\Theta(\theta)\, f_{X|\Theta}(x\,|\,\theta)\,d\theta \\ f_\Theta(\theta) &: \text{ prior distribution} \\ f_{\Theta|X}(\theta\,|\,x) &: \text{ posterior distribution, output of Bayesian inference} \\ \end{aligned}
Given the prior distribution of \Theta and some observations of X, we can express the posterior distribution of \Theta (as a function of \theta) and do a point estimation. The normal distribution is particularly nice as the point estimate can be found by locating the peak of the posterior distribution, which translates to simply finding the minimum of the exponent, a quadratic function of \theta, via differentiation.

Estimate with single observation

  \begin{aligned} X &= \Theta + W \qquad \Theta, W \thicksim N(0,1) \text{ independent} \\ \hat{\Theta}_{MAP} &= \hat{\Theta}_{LMS} = \mathbb{E}[\Theta\,|\,X] = \frac{X}{2} \qquad \mathbb{E}[(\Theta - \hat{\Theta})^2\,|\,X=x] = \frac{1}{2} \\ \hat{\Theta} &: \text{ (point) estimator - a random variable} \qquad \hat{\theta}: \text{ estimate - a number} \\ \text{MAP} &: \text{ Maximum a posteriori probability} \qquad \text{LMS}: \text{ Least Mean Squares} \\ \end{aligned}
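A quick simulation check of the X/2 estimator and its mean squared error; a minimal sketch (variable names and sample size are my own, not from the lecture):

  import numpy as np

  rng = np.random.default_rng(0)
  n = 1_000_000

  theta = rng.standard_normal(n)   # Theta ~ N(0, 1)
  w = rng.standard_normal(n)       # W ~ N(0, 1), independent of Theta
  x = theta + w                    # observation X = Theta + W

  theta_hat = x / 2                # MAP = LMS estimator for this model
  print(np.mean((theta - theta_hat) ** 2))   # empirical MSE, ~0.5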

Estimate with multiple observations

  \begin{aligned} X_1 &= \Theta + W_1, \ldots, X_n = \Theta + W_n \qquad \Theta \thicksim N(x_0,\sigma_0^2) \quad W_i \thicksim N(0,\sigma_i^2) \quad \Theta, W_1, \ldots, W_n \text{ independent} \\ f_{\Theta|X}(\theta\,|\,x) &= c \cdot e^{-quad(\theta)} \qquad quad(\theta) = \frac{(\theta - x_0)^2}{2\sigma_0^2} + \frac{(\theta - x_1)^2}{2\sigma_1^2} + \cdots + \frac{(\theta - x_n)^2}{2\sigma_n^2} \\ \hat{\Theta}_{MAP} &= \hat{\Theta}_{LMS} = \mathbb{E}[\Theta\,|\,X] = \frac{1}{\sum_{i=0}^n \frac{1}{\sigma_i^2}} \sum_{i=0}^n \frac{x_i}{\sigma_i^2} \\ \mathbb{E}[(\Theta - \hat{\Theta})^2] &= \mathbb{E}[(\Theta - \hat{\Theta})^2\,|\,X=x] = \mathbb{E}[(\Theta - \hat{\theta})^2\,|\,X=x] = \text{var}(\Theta\,|\,X=x) = \frac{1}{\sum_{i=0}^n \frac{1}{\sigma_i^2}} & \text{mean squared error} \\ \end{aligned}
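A minimal numeric sketch of these formulas, treating the prior mean x_0 as a 0-th "observation" with variance \sigma_0^2 (the values below are made up for illustration):

  import numpy as np

  # index 0 encodes the prior N(x0, sigma0^2); the rest are observations
  x = np.array([0.0, 1.2, 0.8, 1.1])       # x0, x1, ..., xn
  var = np.array([1.0, 0.5, 0.5, 0.25])    # sigma0^2, sigma1^2, ..., sigman^2

  precision = 1.0 / var
  theta_map = np.sum(x * precision) / np.sum(precision)   # MAP = LMS estimate
  mse = 1.0 / np.sum(precision)                           # posterior variance
  print(theta_map, mse)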

Θ as an m-dimensional vector with n observations

  \begin{aligned} f_{\Theta|X}(\theta\,|\,x) &= \frac{1}{f_X(x)} \prod_{j=1}^m f_{\Theta_j}(\theta_j) \prod_{i=1}^n f_{X_i|\Theta}(x_i\,|\,\theta) & \text{posterior distribution} \\ \end{aligned}
Since everything is normal, we can then differentiate quad(\theta) with respect to each \theta_j and set the derivatives to zero, solving m linear equations in m unknowns for the point estimate of \Theta (see the sketch below).
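A minimal sketch of that recipe, assuming (purely for illustration) a linear observation model X_i = a_i \cdot \Theta + W_i with \Theta_j \thicksim N(0,1) and W_i \thicksim N(0,1), all independent; setting the gradient of quad(\theta) to zero then gives the linear system solved here:

  import numpy as np

  rng = np.random.default_rng(1)
  m, n = 3, 20                       # dimension of Theta, number of observations

  A = rng.standard_normal((n, m))    # observation coefficients a_i (rows)
  theta_true = rng.standard_normal(m)
  x = A @ theta_true + rng.standard_normal(n)   # X_i = a_i . Theta + W_i

  # quad(theta) = ||theta||^2 / 2 + ||x - A theta||^2 / 2
  # setting the gradient to zero:  (I + A^T A) theta = A^T x
  theta_map = np.linalg.solve(np.eye(m) + A.T @ A, A.T @ x)
  print(theta_map)                   # close to theta_true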

Source: MITx 6.041x, Lecture 15.


 

Independence

Probabilistic models that do not interact with each other and have no common sources of uncertainty.
  \begin{aligned} \mathbb{P}(A \cap B) &= \mathbb{P}(A)\,\mathbb{P}(B) & \text{iff } A \text{ and } B \text{ are independent} \\ p_{X|A}(x) &= p_X(x) & \text{for all } x \text{ iff } X \text{ and } A \text{ are independent} \\ p_{X,Y}(x,y) &= p_X(x)\,p_Y(y) & \text{for all } x, y \text{ iff } X \text{ and } Y \text{ are independent} \\ p_{X,Y,Z}(x,y,z) &= p_X(x)\,p_Y(y)\,p_Z(z) & \text{for all } x, y, z \text{ iff } X, Y \text{ and } Z \text{ are independent} \\ \end{aligned}
Note it's always true that
  \begin{aligned} f_{X,Y}(x,y) = f_{X|Y}(x\,|\,y)\,f_Y(y) & & \text{by conditional probability} \\ \end{aligned}
But
  \begin{aligned} f_{X|Y}(x\,|\,y)\,f_Y(y) = f_X(x)\,f_Y(y) & & \text{for all } x, y \text{ iff } X, Y \text{ are independent} \\ \end{aligned}
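A tiny numeric check of the factorization criterion for discrete X, Y (the joint PMF table below is made up):

  import numpy as np

  # joint PMF p_{X,Y}(x, y): rows index x, columns index y
  p_xy = np.array([[0.10, 0.20, 0.10],
                   [0.15, 0.30, 0.15]])

  p_x = p_xy.sum(axis=1)   # marginal of X
  p_y = p_xy.sum(axis=0)   # marginal of Y

  # independent iff the joint equals the product of marginals for all x, y
  print(np.allclose(p_xy, np.outer(p_x, p_y)))   # True for this table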

Expectation

In general,
  \begin{aligned} \mathbb{E}[g(X,Y)] \ne g(\mathbb{E}[X],\mathbb{E}[Y]) & & \text{e.g. } \mathbb{E}[XY] \ne \mathbb{E}[X]\,\mathbb{E}[Y] \\ \end{aligned}
It's however always true that
  \begin{aligned} \mathbb{E}[aX+b] = a\,\mathbb{E}[X] + b & & \text{Linearity of Expectation} \\ \end{aligned}
But if X and Y are independent, then
  \begin{aligned} \mathbb{E}[XY] = \mathbb{E}[X]\,\mathbb{E}[Y] \quad \text{and} \quad \mathbb{E}[g(X)\,h(Y)] = \mathbb{E}[g(X)]\,\mathbb{E}[h(Y)] \\ \end{aligned}
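A quick Monte Carlo illustration of both facts (the distributions here are chosen arbitrarily):

  import numpy as np

  rng = np.random.default_rng(2)
  x = rng.exponential(2.0, 1_000_000)   # E[X] = 2
  y = rng.uniform(0, 1, 1_000_000)      # E[Y] = 0.5, independent of X

  print(np.mean(x * y), np.mean(x) * np.mean(y))   # both ~1.0

  z = x                                             # Z = X, not independent of X
  print(np.mean(x * z), np.mean(x) * np.mean(z))    # ~8 (= E[X^2]) vs ~4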

Variance

In general,
  \begin{aligned} \text{var}(X+Y) \ne \text{var}(X) + \text{var}(Y) \\ \end{aligned}
It's however always true that
  \begin{aligned} \text{var}(aX) = a^2\,\text{var}(X) \quad \text{and} \quad \text{var}(X+a) = \text{var}(X) \\ \end{aligned}
But if X and Y are independent, then
  \begin{aligned} \text{var}(X+Y) = \text{var}(X) + \text{var}(Y) \\ \end{aligned}
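And the matching check for variances (again with arbitrarily chosen distributions):

  import numpy as np

  rng = np.random.default_rng(3)
  x = rng.normal(0, 2, 1_000_000)   # var(X) = 4
  y = rng.normal(0, 3, 1_000_000)   # var(Y) = 9, independent of X

  print(np.var(x + y), np.var(x) + np.var(y))   # both ~13

  # dependent counterexample: var(X + X) = 4 var(X), not 2 var(X)
  print(np.var(x + x), 2 * np.var(x))           # ~16 vs ~8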

Source: MITx 6.041x, Lecture 7.


Thursday, March 27, 2014

 

Random Variables

Uniform from a to b

Discrete:
  \begin{aligned} p_X(x) &= \frac{1}{b-a+1} \qquad \mathbb{E}[X] = \frac{a+b}{2} \qquad \text{var}(X) = \frac{1}{12}(b-a)(b-a+2) \\ \mathbb{P}(a \le X \le b) &= \sum_{a \le x \le b} p_X(x) \\ \end{aligned}
Continuous:
  \begin{aligned} f_X(x) &= \frac{1}{b-a} \qquad \mathbb{E}[X] = \frac{a+b}{2} \qquad \text{var}(X) = \frac{(b-a)^2}{12} \\ \mathbb{P}(a \le X \le b) &= \int_a^b f_X(x)\,dx \\ \end{aligned}
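Both sets of moment formulas can be checked with scipy (a and b below are arbitrary):

  from scipy import stats

  a, b = 2, 7

  d = stats.randint(a, b + 1)              # discrete uniform on {a, ..., b}
  print(d.mean(), d.var())                 # 4.5 and (b-a)(b-a+2)/12 = 35/12

  c = stats.uniform(loc=a, scale=b - a)    # continuous uniform on [a, b]
  print(c.mean(), c.var())                 # 4.5 and (b-a)^2/12 = 25/12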

Bernoulli with parameter p \in [0, 1]

  \begin{aligned} p_X(0) &= 1-p \qquad p_X(1) = p \\ \mathbb{E}[X] &= p \qquad \text{var}(X) = p - p^2 \le \frac{1}{4} \text{ (max variance)} \\ \end{aligned}

Binomial with parameter p \in [0, 1]

Model number of successes (k) in a given number of independent trials (n):
  \begin{aligned} p_X(k) &= \mathbb{P}(X=k) = {n \choose k}p^k(1-p)^{n-k} \\ \mathbb{E}[X] &= n\cdot \color{blue}{p} \quad var(X) = n\cdot\color{blue}{(p - p^2)} \\ \end{aligned}

Poisson with parameter \lambda = np

Large n, small p, moderate \lambda = np which is the arrival rate. Model number of arrivals S:
  \begin{aligned} p_S(k) &\to \frac{\lambda^k}{k!}e^{-\lambda} \qquad \mathbf{E}(S) = \lambda \qquad \text{var}(S) = \lambda \\ \end{aligned}
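A numeric illustration of the binomial-to-Poisson limit (n and p chosen arbitrarily):

  from scipy import stats

  n, p = 1000, 0.003    # large n, small p
  lam = n * p           # moderate lambda = 3

  for k in range(6):
      print(k, stats.binom.pmf(k, n, p), stats.poisson.pmf(k, lam))
  # the two PMF columns agree to several decimal places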

Beta with parameters (\alpha, \beta)

Infer the posterior of the unknown bias \Theta of a coin given k heads in n (fixed) tosses:
  \begin{aligned} f_{\Theta|K}(\theta\,|\,k) &= \frac{1}{d(n,k)} \theta^k (1-\theta)^{n-k} \\ \int_0^1 \theta^\alpha (1-\theta)^\beta\,d\theta &= \frac{\alpha! \, \beta!}{(\alpha+\beta+1)!} & \text{beta distribution} \end{aligned}
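For instance, with a uniform prior on \Theta the posterior above is a Beta distribution, and the integral gives its normalizing constant d(n, k); a small sketch (k and n are made up):

  from math import factorial
  from scipy import stats

  n, k = 10, 7   # 7 heads in 10 tosses, uniform prior on Theta

  # normalizing constant from the integral with alpha = k, beta = n - k
  d = factorial(k) * factorial(n - k) / factorial(n + 1)

  theta = 0.7
  print(theta**k * (1 - theta)**(n - k) / d)       # posterior density at 0.7
  print(stats.beta.pdf(theta, k + 1, n - k + 1))   # same value via scipy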

Geometric with parameter p \in [0, 1]

Model number of trials (k) until a success:
  \begin{aligned} p_X(k) = \mathbb{P}(X = k) = (1-p)^{k-1}p \quad \mathbb{E}[X] = \frac{1}{p} \quad var(X) = \frac{1-p}{p^2} \\ \end{aligned}

Exponential with parameter \lambda > 0

Model amount of time elapsed (x) until a success:
  \begin{aligned} f_X(x) &= \lambda e^{-\lambda x} \quad \mathbb{E}[X] = \frac{1}{\lambda} \quad var(X) = \frac{1}{\lambda^2} \\ \mathbb{P}(X > a) &= \int_a^\infty \lambda e^{-\lambda x} \, dx = e^{-\lambda a} \\ \mathbb{P}(T - t > x\, |\, T > t) &= e^{-\lambda x} = \mathbb{P}(T > x) & \text{Memorylessness!} \\ \mathbb{P}(0 \le T \le \delta) &\approx \lambda\delta \approx \mathbb{P}(t \le T \le t+\delta\,|\, T > t) & \mathbb{P}(\text{success at every }\delta \text{ time step}) \approx \lambda\delta\\ \end{aligned}
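A Monte Carlo check of memorylessness (\lambda, t, and x picked arbitrarily):

  import numpy as np

  rng = np.random.default_rng(4)
  lam, t, x = 0.5, 2.0, 3.0

  T = rng.exponential(1 / lam, 1_000_000)   # numpy parametrizes by the mean 1/lambda

  survived = T[T > t]
  print(np.mean(survived - t > x))   # P(T - t > x | T > t)
  print(np.mean(T > x))              # P(T > x)
  print(np.exp(-lam * x))            # e^{-lambda x}; all three ~0.223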

Normal (Gaussian)

  \begin{aligned} N(0,1): f_X(x) &= \frac{1}{\sqrt{2\pi}} e^{-x^2/2} \\ N(\mu,\sigma^2): f_X(x) &= \frac{1}{\sigma\sqrt{2\pi}} e^{-(x-\mu)^2/2\sigma^2} \\ \end{aligned}
If X \thicksim N(\mu,\sigma^2) and Y = aX + b, then Y \thicksim N(a\mu + b,a^2\sigma^2)

If X = \Theta + W where W \thicksim N(0,\sigma^2), indep. of \Theta, then f_{X|\Theta}(x\,|\,\theta) = f_W(x-\theta); or X \thicksim N(\Theta,\sigma^2)

Cumulative distribution function (CDF)

Discrete:
  \begin{aligned} F_X(x) = \mathbb{P}(X \le x) = \sum_{k \le x} p_X(k) \\ \end{aligned}
Continuous:
  \begin{aligned} F_X(x) = \mathbb{P}(X \le x) = \int_{-\infty}^x f_X(t)\,dt \quad \therefore \frac{d}{dx}F_X(x) = f_X(x) \\ \end{aligned}
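A numeric sanity check that differentiating the CDF recovers the PDF (standard normal, with a small finite-difference step chosen by me):

  from scipy import stats

  x, h = 0.7, 1e-6
  numeric_pdf = (stats.norm.cdf(x + h) - stats.norm.cdf(x - h)) / (2 * h)
  print(numeric_pdf, stats.norm.pdf(x))   # both ~0.3123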

Source: MITx 6.041x, Lecture 5, 6, 8, 14.

