The entropy H(X) of a random variable X is defined as
$$H(X) = -\sum_x p(x) \log p(x),$$
which measures the uncertainty of X. Note that
H(X) depends only on the distribution of
X but not on the actual values taken by
X. For example, here is the cute binary entropy function for X ∼ {γ, 1−γ}:
$$h_b(\gamma) = -\gamma \log \gamma - (1-\gamma) \log (1-\gamma)$$
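As a quick numeric illustration (a minimal sketch; the function name hb, the natural-log base, and the sample values are my own choices, not from the post):

```python
import math

def hb(gamma):
    """Binary entropy of X ~ {gamma, 1 - gamma}, in nats."""
    if gamma in (0.0, 1.0):
        return 0.0  # convention: 0 log 0 = 0
    return -gamma * math.log(gamma) - (1 - gamma) * math.log(1 - gamma)

print(hb(0.5))   # maximum uncertainty: log 2 ≈ 0.693 nats
print(hb(0.99))  # nearly deterministic, so close to 0
```

Note that hb(γ) is symmetric about γ = 1/2, where it is maximized.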
Entropy can also be expressed compactly in terms of expectation:
$$H(X) = -E \log p(X) = -\sum_x p(x) \log p(x)$$
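For concreteness, here is a small sketch that evaluates the sum form for an arbitrary pmf (the dictionary representation and the example distribution are just illustrative):

```python
import math

def entropy(pmf):
    """H(X) = -sum_x p(x) log p(x), for a pmf given as {value: probability}."""
    return -sum(p * math.log(p) for p in pmf.values() if p > 0)

# Only the probabilities matter, not the values X takes.
p_X = {"a": 0.5, "b": 0.25, "c": 0.25}
print(entropy(p_X))  # 1.5 log 2 ≈ 1.040 nats
```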
Interestingly, the joint entropy takes the same form:
$$H(X,Y) = -E \log p(X,Y) = -\sum_{x,y} p(x,y) \log p(x,y)$$
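Since the joint entropy is just the entropy of the joint distribution, the same computation applies to a joint pmf (the table below is made up for illustration):

```python
import math

# Joint pmf over (x, y) pairs; any labels work, only the probabilities matter.
p_XY = {(0, 0): 0.5, (0, 1): 0.25, (1, 0): 0.125, (1, 1): 0.125}

H_XY = -sum(p * math.log(p) for p in p_XY.values() if p > 0)
print(H_XY)  # ≈ 1.213 nats (1.75 bits)
```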
and the conditional entropy of Y given X is
$$\begin{aligned}
H(Y|X) &= -E \log p(Y|X) = -\sum_{x,y} p(x,y) \log p(y|x) \\
&= -\sum_x \sum_y p(x) p(y|x) \log p(y|x) \\
&= \sum_x p(x) \Big[ -\sum_y p(y|x) \log p(y|x) \Big] \\
&= \sum_x p(x) H(Y|X=x)
\end{aligned}$$
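A short numerical check of the last identity, H(Y|X) = Σ_x p(x) H(Y|X=x), on a made-up joint pmf (all names here are illustrative):

```python
import math
from collections import defaultdict

p_XY = {(0, 0): 0.5, (0, 1): 0.25, (1, 0): 0.125, (1, 1): 0.125}

# Marginal p(x).
p_X = defaultdict(float)
for (x, _), p in p_XY.items():
    p_X[x] += p

# Form 1: H(Y|X) = -sum_{x,y} p(x,y) log p(y|x), with p(y|x) = p(x,y)/p(x).
form1 = -sum(p * math.log(p / p_X[x]) for (x, _), p in p_XY.items() if p > 0)

# Form 2: H(Y|X) = sum_x p(x) H(Y|X=x).
form2 = 0.0
for x, px in p_X.items():
    cond = [p / px for (x2, _), p in p_XY.items() if x2 == x and p > 0]
    form2 += px * -sum(q * math.log(q) for q in cond)

print(form1, form2)  # both ≈ 0.651 nats
```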
Similarly,
$$H(Y|X,Z) = \sum_z p(z) H(Y|X,Z=z)$$
where
$$H(Y|X,Z=z) = -\sum_{x,y} p(x,y|z) \log p(y|x,z)$$
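The same kind of check works with an extra conditioning variable; here is a compact version of the decomposition over z on a made-up three-variable pmf (again, everything here is illustrative):

```python
import math
from collections import defaultdict

# Joint pmf p(x, y, z) over binary X, Y, Z.
p_XYZ = {
    (0, 0, 0): 0.20, (0, 0, 1): 0.05, (0, 1, 0): 0.10, (0, 1, 1): 0.15,
    (1, 0, 0): 0.05, (1, 0, 1): 0.20, (1, 1, 0): 0.10, (1, 1, 1): 0.15,
}

p_Z = defaultdict(float)
p_XZ = defaultdict(float)
for (x, y, z), w in p_XYZ.items():
    p_Z[z] += w
    p_XZ[(x, z)] += w

# Direct form: H(Y|X,Z) = -sum p(x,y,z) log p(y|x,z).
direct = -sum(w * math.log(w / p_XZ[(x, z)])
              for (x, y, z), w in p_XYZ.items() if w > 0)

# Decomposed form: H(Y|X,Z) = sum_z p(z) H(Y|X,Z=z),
# with H(Y|X,Z=z) = -sum_{x,y} p(x,y|z) log p(y|x,z).
by_z = 0.0
for z0, pz in p_Z.items():
    by_z += pz * -sum((w / pz) * math.log(w / p_XZ[(x, z)])
                      for (x, y, z), w in p_XYZ.items() if z == z0 and w > 0)

print(direct, by_z)  # equal up to floating-point error
```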
Now it is not hard to show that
$$\begin{aligned}
H(X,Y) &= -E \log p(X,Y) = -E \log [p(X) p(Y|X)] \\
&= -E \log p(X) - E \log p(Y|X) \\
&= H(X) + H(Y|X)
\end{aligned}$$
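This two-variable chain rule can also be spot-checked numerically (same illustrative pmf as above, repeated so the snippet stands alone):

```python
import math
from collections import defaultdict

p_XY = {(0, 0): 0.5, (0, 1): 0.25, (1, 0): 0.125, (1, 1): 0.125}

p_X = defaultdict(float)
for (x, _), p in p_XY.items():
    p_X[x] += p

H_XY = -sum(p * math.log(p) for p in p_XY.values() if p > 0)
H_X = -sum(p * math.log(p) for p in p_X.values() if p > 0)
H_Y_given_X = -sum(p * math.log(p / p_X[x]) for (x, _), p in p_XY.items() if p > 0)

print(H_XY, H_X + H_Y_given_X)  # both ≈ 1.213 nats
```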
In general, there is the chain rule for entropy:
$$H(X_1, X_2, \cdots, X_n) = \sum_{i=1}^{n} H(X_i | X_1, \cdots, X_{i-1})$$
which can be easily verified by induction. The base case (two variables) was shown above. Suppose the rule holds for n; then
$$\begin{aligned}
H(X_1, X_2, \cdots, X_n, X_{n+1}) &= H(X_{n+1} | X_1, \cdots, X_n) + H(X_1, \cdots, X_n) \\
&= H(X_{n+1} | X_1, \cdots, X_n) + \sum_{i=1}^{n} H(X_i | X_1, \cdots, X_{i-1}) && \text{(induction hypothesis)} \\
&= \sum_{i=1}^{n+1} H(X_i | X_1, \cdots, X_{i-1})
\end{aligned}$$
◻
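As a sanity check on the chain rule just proved (not part of the original argument; the random joint distribution and helper names are mine), one can compare both sides numerically for a few variables:

```python
import itertools
import math
import random

random.seed(0)

# A random joint pmf over three binary variables X1, X2, X3 (purely illustrative).
outcomes = list(itertools.product([0, 1], repeat=3))
weights = [random.random() for _ in outcomes]
total = sum(weights)
p = {o: w / total for o, w in zip(outcomes, weights)}

def marginal(pmf, k):
    """Marginal pmf of the first k coordinates (k = 0 gives the empty prefix)."""
    out = {}
    for o, q in pmf.items():
        out[o[:k]] = out.get(o[:k], 0.0) + q
    return out

def H(pmf):
    """H = -sum_x p(x) log p(x)."""
    return -sum(q * math.log(q) for q in pmf.values() if q > 0)

def cond_entropy(pmf, i):
    """H(X_i | X_1, ..., X_{i-1}) = -sum p(x_1..x_i) log p(x_i | x_1..x_{i-1})."""
    p_i = marginal(pmf, i)
    p_prev = marginal(pmf, i - 1)
    return -sum(q * math.log(q / p_prev[o[:-1]]) for o, q in p_i.items() if q > 0)

n = 3
lhs = H(p)                                              # H(X1, X2, X3)
rhs = sum(cond_entropy(p, i) for i in range(1, n + 1))  # sum_i H(X_i | X_1..X_{i-1})
print(lhs, rhs)  # the two sides agree up to floating-point error
```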
Source: Information Theory.
# posted by rot13(Unafba Pune) @ 3:37 PM
