
Saturday, January 11, 2014

 

Entropy

The entropy \(H(X)\) of a random variable \(X\) is defined as
  \[ \begin{aligned} \color{blue}{H(X) = - \sum_x p(x) \log{p(x)}} \end{aligned} \]
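For concreteness, here is a minimal Python sketch of this definition (base-2 logarithms assumed, so the answer comes out in bits; the distributions are just made-up examples):
  import math

  def entropy(pmf):
      # H(X) = -sum_x p(x) log p(x), in bits; terms with p(x) = 0 are
      # skipped, following the convention 0 log 0 = 0.
      return -sum(p * math.log2(p) for p in pmf if p > 0)

  print(entropy([0.25, 0.25, 0.25, 0.25]))  # uniform over 4 outcomes: 2.0 bits
  print(entropy([0.9, 0.1]))                # a skewed coin: about 0.469 bits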
which measures the uncertainty of \(X\). Note that \(H(X)\) depends only on the distribution of \(X\), not on the actual values taken by \(X\). For example, here is the cute binary entropy function for \(X\) taking two values with probabilities \(\gamma\) and \(1-\gamma\):
  \[ \begin{aligned} h_b(\gamma) = - \gamma \log{\gamma} - (1-\gamma) \log{(1-\gamma)} \\ \end{aligned} \]
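For instance, a fair coin \(\gamma = \tfrac{1}{2}\) gives the largest possible uncertainty:
  \[ \begin{aligned} h_b\left(\tfrac{1}{2}\right) = - \tfrac{1}{2} \log{\tfrac{1}{2}} - \tfrac{1}{2} \log{\tfrac{1}{2}} = \log{2} \\ \end{aligned} \]
which is exactly 1 bit when the logarithm is taken base 2, while \(h_b(0) = h_b(1) = 0\): a certain outcome carries no uncertainty.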
Recall the expectation of a function of \(X\):
  \[ \begin{aligned} E g(X) = \sum_x p(x) g(x) \\ \end{aligned} \]
Entropy can also be expressed compactly in terms of expectation:
  \[ \begin{aligned} \color{blue}{H(X) = - E \log{p(X)}} = - \sum_x p(x) \log{p(x)} \end{aligned} \]
Interestingly, the joint entropy is simply:
  \[ \begin{aligned} H(X,Y) = - E \log{p(X,Y)} = - \sum_{x,y} p(x,y) \log{p(x,y)} \\ \end{aligned} \]
while the conditional entropy of \(Y\) given \(X\) expands to:
  \[ \begin{aligned} \color{blue}{H(Y\,|\,X) = - E \log{p(Y\,|\,X)}} &= - \sum_{x,y} p(x,y) \log{p(y\,|\,x)} \\ &= - \sum_x \sum_y p(x) p(y\,|\,x) \log{p(y\,|\,x)} \\ &= \sum_x p(x) \left[-\sum_y p(y\,|\,x) \log{p(y\,|\,x)}\right] \\ &= \color{blue}{\sum_x p(x)\, H(Y\,|\,X=x)} \\ \end{aligned} \]
Similarly,
  \[ \begin{aligned} H(Y\,|\,X,Z) = \sum_z p(z)\, H(Y\,|\,X,Z=z) \\ \end{aligned} \]
where
  \[ \begin{aligned} H(Y\,|\,X,Z=z) = -\sum_{x,y} p(x,y\,|\,z) \log{p(y\,|\,x,z)} \\ \end{aligned} \]
Now it's not hard to show that
  \[ \begin{aligned} \color{blue}{H(X,Y) = H(X) + H(Y\,|\,X)} \\ \end{aligned} \]
for
  \[ \begin{aligned} H(X,Y) &= -E \log{p(X,Y)} = -E \log{\left[p(X) \, p(Y\,|\,X)\right]} \\ &= -E \log{p(X)} - E \log{p(Y\,|\,X)} \\ &= H(X) + H(Y|X) \\ \end{aligned} \]
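As a quick sanity check, here is a small Python sketch with a made-up \(2 \times 2\) joint distribution (the probabilities are purely illustrative):
  import math

  def H(pmf):
      # Entropy of a pmf given as a list of probabilities, in bits.
      return -sum(p * math.log2(p) for p in pmf if p > 0)

  # A made-up joint pmf p(x, y) over X in {0, 1} and Y in {0, 1}.
  p = {(0, 0): 0.3, (0, 1): 0.2, (1, 0): 0.1, (1, 1): 0.4}

  p_x = {x: sum(v for (a, _), v in p.items() if a == x) for x in (0, 1)}  # marginal p(x)
  H_XY = H(list(p.values()))                                              # H(X, Y)
  H_X  = H(list(p_x.values()))                                            # H(X)
  # H(Y | X) = sum_x p(x) H(Y | X = x), with p(y | x) = p(x, y) / p(x).
  H_Y_given_X = sum(p_x[x] * H([p[(x, y)] / p_x[x] for y in (0, 1)]) for x in (0, 1))

  print(H_XY, H_X + H_Y_given_X)  # both are about 1.846: H(X, Y) = H(X) + H(Y | X)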
More generally, we have the chain rule for entropy:
  \[ \begin{aligned} \color{blue}{H(X_1,X_2, \cdots, X_n)} &\color{blue}{= \sum_{i=1}^n H(X_i|X_1, \cdots , X_{i-1})} \\ \end{aligned} \]
which can be verified by induction. The base case (\(n = 2\)) is just the identity shown above. Suppose the claim holds for \(n\); then
  \[ \begin{aligned} H(X_1,X_2, \cdots, X_n, X_{n+1}) &= H(X_{n+1}| X_1, \cdots , X_n) + H(X_1, \cdots , X_n) \\ &= H(X_{n+1}| X_1, \cdots , X_n) + \sum_{i=1}^{n} H(X_i|X_1, \cdots , X_{i-1}) & \text{induction hypothesis} \\ &= \sum_{i=1}^{n+1} H(X_i|X_1, \cdots , X_{i-1}) \\ \end{aligned} \]
\(\Box\)
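Unrolled for three variables, for instance, the chain rule reads
  \[ \begin{aligned} H(X_1, X_2, X_3) = H(X_1) + H(X_2\,|\,X_1) + H(X_3\,|\,X_1, X_2) \\ \end{aligned} \]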

Source: Information Theory.

