The entropy \(H(X)\) of a random variable \(X\) is defined as
\[
\begin{aligned}
\color{blue}{H(X) = - \sum_x p(x) \log{p(x)}}
\end{aligned}
\]
which measures the uncertainty of \(X\). Note that \(H(X)\) depends only on the distribution of \(X\), not on the actual values taken by \(X\). For example, here is the cute binary entropy function for \(X \sim \{\gamma, 1-\gamma\}\), a binary \(X\) taking its two values with probabilities \(\gamma\) and \(1-\gamma\):
\[
\begin{aligned}
h_b(\gamma) = - \gamma \log{\gamma} - (1-\gamma) \log{(1-\gamma)} \\
\end{aligned}
\]
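As a quick sanity check, here is a minimal Python sketch of both formulas. The base-2 logarithm, the function names, and the sample values are my own choices; the post leaves the base of \(\log\) unspecified.

```python
# A minimal sketch of H(X) and h_b(gamma), in bits (log base 2),
# using the convention 0 log 0 = 0.
import math

def entropy(p):
    """H(X) = -sum_x p(x) log2 p(x) for a list of probabilities."""
    return -sum(q * math.log2(q) for q in p if q > 0)

def h_b(g):
    """Binary entropy: entropy of the {gamma, 1 - gamma} distribution."""
    return entropy([g, 1 - g])

print(entropy([0.25, 0.25, 0.5]))  # 1.5 bits
print(h_b(0.5))                    # 1.0 bit: maximal at gamma = 1/2
print(h_b(0.1), h_b(0.9))          # ~0.469 each: symmetric in gamma
```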
Recall the expectation of a function \(g\) of \(X\):
\[
\begin{aligned}
E g(X) = \sum_x p(x) g(x) \\
\end{aligned}
\]
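For instance, a tiny sketch of this formula (the fair die and \(g(x) = x^2\) are made-up choices):

```python
# E g(X) = sum_x p(x) g(x): a fair six-sided die with g(x) = x**2.
p = {x: 1/6 for x in range(1, 7)}
print(sum(q * x**2 for x, q in p.items()))  # 91/6 ~ 15.17
```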
Entropy can also be expressed compactly in terms of expectation:
\[
\begin{aligned}
\color{blue}{H(X) = - E \log{p(X)}} = - \sum_x p(x) \log{p(x)}
\end{aligned}
\]
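In code, the compact form just means plugging \(g(x) = -\log p(x)\) into the expectation sum; a sketch, with an illustrative helper `expect` and a made-up distribution:

```python
# Entropy as an expectation: H(X) = E[-log2 p(X)].
import math

def expect(p, g):
    """E g(X) = sum_x p(x) g(x), for p given as a dict x -> p(x)."""
    return sum(q * g(x) for x, q in p.items())

p = {'a': 0.5, 'b': 0.25, 'c': 0.25}
print(expect(p, lambda x: -math.log2(p[x])))  # 1.5 bits
```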
In the same vein, the joint entropy of \(X\) and \(Y\):
\[
\begin{aligned}
H(X,Y) = - E \log{p(X,Y)} = - \sum_{x,y} p(x,y) \log{p(x,y)} \\
\end{aligned}
\]
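In a sketch, this is the same \(-\sum p \log p\) recipe applied to pairs (the joint table is a made-up example):

```python
# Joint entropy over pairs (x, y) of a small made-up joint pmf.
import math

pxy = {(0, 0): 0.5, (0, 1): 0.25, (1, 0): 0.125, (1, 1): 0.125}
print(-sum(q * math.log2(q) for q in pxy.values() if q > 0))  # 1.75 bits
```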
Likewise, the conditional entropy of \(Y\) given \(X\):
\[
\begin{aligned}
\color{blue}{H(Y\,|\,X) = - E \log{p(Y\,|\,X)}} &= - \sum_{x,y} p(x,y) \log{p(y\,|\,x)} \\
&= - \sum_x \sum_y p(x) p(y\,|\,x) \log{p(y\,|\,x)} \\
&= \sum_x p(x) \left[-\sum_y p(y\,|\,x) \log{p(y\,|\,x)}\right] \\
&= \color{blue}{\sum_x p(x)\, H(Y\,|\,X=x)} \\
\end{aligned}
\]
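The last line says \(H(Y\,|\,X)\) is the \(p(x)\)-weighted average of the entropies of the conditional distributions \(p(y\,|\,x)\). A sketch checking that both forms agree on the made-up joint table from above:

```python
# H(Y|X) two ways: (a) -sum_{x,y} p(x,y) log2 p(y|x),
#                  (b) sum_x p(x) H(Y|X=x).
import math
from collections import defaultdict

pxy = {(0, 0): 0.5, (0, 1): 0.25, (1, 0): 0.125, (1, 1): 0.125}
px = defaultdict(float)
for (x, y), q in pxy.items():
    px[x] += q

H_a = -sum(q * math.log2(q / px[x]) for (x, y), q in pxy.items())
H_b = sum(px[x0] * -sum((q / px[x0]) * math.log2(q / px[x0])
                        for (x, y), q in pxy.items() if x == x0)
          for x0 in px)
print(H_a, H_b)  # both ~0.9387 bits
```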
Similarly, conditioning on an additional variable \(Z\),
\[
\begin{aligned}
H(Y\,|\,X,Z) = \sum_z p(z)\, H(Y\,|\,X,Z=z) \\
\end{aligned}
\]
where
\[
\begin{aligned}
H(Y\,|\,X,Z=z) = -\sum_{x,y} p(x,y\,|\,z) \log{p(y\,|\,x,z)} \\
\end{aligned}
\]
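The same kind of numerical check works here; the three-way joint table below is a made-up example:

```python
# Check H(Y|X,Z) = sum_z p(z) H(Y|X, Z=z) on a small three-way joint pmf.
import math
from collections import defaultdict

pxyz = {(0, 0, 0): 0.20, (0, 1, 0): 0.30, (0, 0, 1): 0.25,
        (1, 0, 1): 0.10, (1, 1, 1): 0.15}
pz = defaultdict(float)
pxz = defaultdict(float)
for (x, y, z), q in pxyz.items():
    pz[z] += q
    pxz[(x, z)] += q

# Direct form: -sum_{x,y,z} p(x,y,z) log2 p(y|x,z)
direct = -sum(q * math.log2(q / pxz[(x, z)])
              for (x, y, z), q in pxyz.items())

# Averaged form: sum_z p(z) [-sum_{x,y} p(x,y|z) log2 p(y|x,z)]
averaged = sum(pz[z0] * -sum((q / pz[z0]) * math.log2(q / pxz[(x, z)])
                             for (x, y, z), q in pxyz.items() if z == z0)
               for z0 in pz)
print(direct, averaged)  # equal, ~0.728 bits
```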
Now it's not hard to show that
\[
\begin{aligned}
\color{blue}{H(X,Y) = H(X) + H(Y\,|\,X)} \\
\end{aligned}
\]
since
\[
\begin{aligned}
H(X,Y) &= -E \log{p(X,Y)} = -E \log{\left[p(X) \, p(Y\,|\,X)\right]} \\
&= -E \log{p(X)} - E \log{p(Y\,|\,X)} \\
&= H(X) + H(Y\,|\,X) \\
\end{aligned}
\]
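A quick numerical confirmation on the earlier made-up joint table (still a sketch with base-2 logs):

```python
# Check H(X,Y) = H(X) + H(Y|X) on a small joint pmf.
import math
from collections import defaultdict

pxy = {(0, 0): 0.5, (0, 1): 0.25, (1, 0): 0.125, (1, 1): 0.125}
px = defaultdict(float)
for (x, y), q in pxy.items():
    px[x] += q

H_XY = -sum(q * math.log2(q) for q in pxy.values())
H_X = -sum(q * math.log2(q) for q in px.values())
H_YgX = -sum(q * math.log2(q / px[x]) for (x, y), q in pxy.items())
print(H_XY, H_X + H_YgX)  # 1.75 and 1.75
```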
In general, we have the chain rule for entropy:
\[
\begin{aligned}
\color{blue}{H(X_1,X_2, \cdots, X_n)} &\color{blue}{= \sum_{i=1}^n H(X_i|X_1, \cdots , X_{i-1})} \\
\end{aligned}
\]
which can be verified by induction. The base case \(n = 2\) is exactly the identity \(H(X,Y) = H(X) + H(Y\,|\,X)\) shown above. Suppose the rule holds for \(n\); then
\[
\begin{aligned}
H(X_1,X_2, \cdots, X_n, X_{n+1}) &= H(X_{n+1}| X_1, \cdots , X_n) + H(X_1, \cdots , X_n) \\
&= H(X_{n+1}| X_1, \cdots , X_n) + \sum_{i=1}^{n} H(X_i|X_1, \cdots , X_{i-1}) & \text{induction hypothesis} \\
&= \sum_{i=1}^{n+1} H(X_i|X_1, \cdots , X_{i-1}) \\
\end{aligned}
\]
\(\Box\)
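To see the chain rule in action, here is a sketch verifying it for \(n = 3\) binary variables; the joint distribution is an arbitrary made-up example (entries sum to 1):

```python
# Chain-rule check for n = 3: H(X1,X2,X3) = sum_k H(X_k | X_1..X_{k-1}).
import math
from collections import defaultdict

p = {(0,0,0): 0.10, (0,0,1): 0.15, (0,1,0): 0.05, (0,1,1): 0.20,
     (1,0,0): 0.10, (1,0,1): 0.10, (1,1,0): 0.25, (1,1,1): 0.05}

def marginal(joint, k):
    """Marginal pmf of the first k coordinates."""
    m = defaultdict(float)
    for outcome, q in joint.items():
        m[outcome[:k]] += q
    return dict(m)

def cond_H(joint_k):
    """H(X_k | X_1..X_{k-1}) = -sum p(x_1..x_k) log2 p(x_k | x_1..x_{k-1})."""
    prev = defaultdict(float)
    for outcome, q in joint_k.items():
        prev[outcome[:-1]] += q
    return -sum(q * math.log2(q / prev[outcome[:-1]])
                for outcome, q in joint_k.items() if q > 0)

lhs = -sum(q * math.log2(q) for q in p.values())      # H(X1,X2,X3)
rhs = sum(cond_H(marginal(p, k)) for k in (1, 2, 3))  # chain-rule sum
print(lhs, rhs)  # equal up to floating-point rounding
```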
Source: Information Theory.
# posted by rot13(Unafba Pune) @ 3:37 PM