Posted by: atri | February 26, 2009

Lecture 17: Converse of Shannon’s theorem

Today we started with the proof of the converse of Shannon’s theorem and then moved on to the proof of the “positive” part of Shannon’s theorem. The relevant lecture notes from Fall 07 are Lecture 9 and Lecture 10, respectively. (I’ll polish the latter notes by this weekend.)

In class today, I stumbled during the proof of a claim, which I promised to prove on the blog. The claim was that |D_{\mathbf{m}}\cap S_{\mathbf{m}}|\ge \frac{1}{4}\cdot |S_{\mathbf{m}}|. It turns out that I can prove the following weaker claim, which nonetheless is sufficient for the proof: |D_{\mathbf{m}}\cap S_{\mathbf{m}}|\ge \frac{1}{4}\cdot 2^{-\beta n}\cdot 2^{nH(p)}, for some small enough \beta>0. (We’ll also have to consider a “tighter” shell than the one we defined in class today.) The notes for Lecture 9 from Fall 07 have been updated with the correct proof.

Thanks to Luca Trevisan’s latex2wp program, I’m also reproducing the proof of the converse of Shannon’s theorem after the fold. (Please let me know if you find any bugs!)

1. Converse of Shannon’s Capacity Theorem for BSC

Theorem 1 Let {0\le p<1/2} and {\epsilon>0}. If {k\ge \lceil (1-H(p)+\epsilon)n\rceil} then for every {E:\{0,1\}^k\rightarrow \{0,1\}^n} and {D:\{0,1\}^n\rightarrow \{0,1\}^k}, for some {\mathbf{m}\in\{0,1\}^k},

\displaystyle Pr_{\mathbf{e} \text{ from } BSC_p}[D(E(\mathbf{m})+\mathbf{e})\neq \mathbf{m}]>\frac{1}{2}.

Proof: First, we note that there is nothing to prove if {p=0}, so for the rest of the proof we will assume that {p>0}. For the sake of contradiction, assume that the following holds for every {\mathbf{m}\in\{0,1\}^k}:

\displaystyle \underset{\mathbf{e} \text{ from } BSC_p}{Pr} \left[ D(E(\mathbf{m})+\mathbf{e}) \neq \mathbf{m} \right] \leq 1/2.

Fix an arbitrary message {\mathbf{m} \in \{ 0, 1\} ^k}. Define {D_{\mathbf{m}}} to be the set of received words that are decoded to {\mathbf{m}} by {D}, that is,

\displaystyle D_{\mathbf{m}} = \{ \mathbf{y} | D(\mathbf{y}) = \mathbf{m}\}.
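(As a quick aside, and not part of the proof: below is a minimal Python sketch, with a hypothetical toy encoder and a brute-force nearest-codeword decoder, of what the decoding regions {D_{\mathbf{m}}} look like. Since {D} is a total function, the regions are disjoint and cover all of {\{0,1\}^n}, a fact we will use in (5) below.)

from itertools import product

k, n = 2, 5
msgs = [tuple(m) for m in product([0, 1], repeat=k)]

def E(m):
    # hypothetical toy encoder: the message followed by copies of its parity bit
    parity = sum(m) % 2
    return tuple(m) + (parity,) * (n - k)

def D(y):
    # brute-force nearest-codeword decoder (maximum likelihood for BSC_p with p < 1/2)
    return min(msgs, key=lambda m: sum(a != b for a, b in zip(E(m), y)))

# the decoding regions D_m are disjoint and cover all 2^n received words
regions = {m: [y for y in product([0, 1], repeat=n) if D(y) == m] for m in msgs}
assert sum(len(r) for r in regions.values()) == 2 ** n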

Note that by our assumption, the following is true (where from now on we omit the explicit dependence of the probability on the {BSC_p} noise for clarity):

\displaystyle   Pr \left[ E(\mathbf{m})+\mathbf{e}\not\in D_{\mathbf{m}} \right] \le 1/2. \ \ \ \ \ (1)

Further, by the Chernoff bound,

\displaystyle   Pr[E(\mathbf{m})+\mathbf{e} \not\in {S_{\mathbf{m}}}]\le 2^{ - \Omega({\gamma}^2 n) }, \ \ \ \ \ (2)

where {S_{\mathbf{m}}} is the shell of radius {[(1-\gamma)pn, (1 + \gamma)pn]} around {E(\mathbf{m})}, that is, {{S_{\mathbf{m}}}=B_2(E(\mathbf{m}),(1+\gamma)pn)\setminus B_2(E(\mathbf{m}),(1-\gamma)pn)}. (We will set {\gamma>0} in terms of {\epsilon} and {p} at the end of the proof.)
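(Here is a small numerical illustration of (2), assuming numpy is available; the values of {p}, {\gamma} and {n} below are arbitrary. The weight of {\mathbf{e}} is distributed as Binomial(n,p), and the fraction of samples falling outside the shell drops quickly as {n} grows.)

import numpy as np

rng = np.random.default_rng(0)
p, gamma = 0.1, 0.3

for n in [100, 500, 2000]:
    # the weight of the BSC_p error vector e is distributed as Binomial(n, p)
    weights = rng.binomial(n, p, size=20000)
    outside = np.mean((weights < (1 - gamma) * p * n) |
                      (weights > (1 + gamma) * p * n))
    print(n, outside)  # fraction of samples outside the shell S_m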

(1) and (2) along with the union bound imply the following:

\displaystyle   Pr \left[ E(\mathbf{m})+\mathbf{e}\in D_{\mathbf{m}}\cap S_{\mathbf{m}} \right] \geq \frac{1}{2} - 2^{ - \Omega({\gamma}^2 n) }\ge \frac{1}{4}, \ \ \ \ \ (3)

where the last inequality holds for large enough {n}. Next we upper bound the probability above to obtain a lower bound on {|D_{\mathbf{m}}\cap S_{\mathbf{m}}|}.

It is easy to see that

\displaystyle Pr \left[ E(\mathbf{m})+\mathbf{e}\in D_{\mathbf{m}}\cap S_{\mathbf{m}} \right] \le |D_{\mathbf{m}}\cap S_{\mathbf{m}}|\cdot p_{max},

where

\displaystyle p_{max}=\max_{\mathbf{y}\in S_{\mathbf{m}}} Pr[E(\mathbf{m})+\mathbf{e}=\mathbf{y}] =\max_{d\in[(1-\gamma)pn,(1+\gamma)pn]} p^{d}(1-p)^{n-d}.

It is easy to check that {p^d(1-p)^{n-d}} is decreasing in {d} for {p< 1/2}, so the maximum is attained at the smallest weight {d=(1-\gamma)pn}. Thus, we have

\displaystyle p_{max}=p^{(1-\gamma)pn}(1-p)^{n-(1-\gamma)pn}=\left(\frac{1-p}{p}\right)^{\gamma pn}\cdot p^{pn}(1-p)^{(1-p)n}= \left(\frac{1-p}{p}\right)^{\gamma pn} 2^{-nH(p)}.
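(A quick numerical check of the algebra above, done in log space to avoid underflow; the values of {p}, {\gamma} and {n} are arbitrary.)

from math import log2

def H(p):
    # binary entropy function
    return -p * log2(p) - (1 - p) * log2(1 - p)

p, gamma, n = 0.1, 0.05, 200.0
d = (1 - gamma) * p * n
lhs = d * log2(p) + (n - d) * log2(1 - p)            # log2 of p^d (1-p)^(n-d)
rhs = gamma * p * n * log2((1 - p) / p) - n * H(p)   # log2 of ((1-p)/p)^(gamma p n) 2^(-nH(p))
assert abs(lhs - rhs) < 1e-9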

Thus, we have shown that

\displaystyle Pr \left[ E(\mathbf{m})+\mathbf{e}\in D_{\mathbf{m}}\cap S_{\mathbf{m}} \right] \le |D_{\mathbf{m}}\cap S_{\mathbf{m}}|\cdot \left(\frac{1-p}{p}\right)^{\gamma pn} 2^{-nH(p)},

which by (3) implies that

\displaystyle   |D_{\mathbf{m}} \cap S_{\mathbf{m}}| \geq \frac{1}{4}\cdot \left(\frac{1-p}{p}\right)^{-\gamma pn} 2^{nH(p)} . \ \ \ \ \ (4)

Next, we consider the following sequence of relations:

\displaystyle   2^n = \sum_{\mathbf{m} \in \{ 0, 1\}^k} |D_{\mathbf{m}}| \ \ \ \ \ (5)

\displaystyle 	 \geq \sum_{\mathbf{m} \in \{ 0, 1\}^k} |D_{\mathbf{m}} \cap S_{\mathbf{m}}| \nonumber

\displaystyle   	 \geq \frac{1}{4}\left(\frac{1-p}{p}\right)^{-\gamma pn} \sum_{\mathbf{m} \in \{ 0, 1\}^k} 2^{H(p)n} \ \ \ \ \ (6)

\displaystyle  	 = 2^{k-2} 2^{H(p)n -\gamma p\log(1/p-1)n} \nonumber

\displaystyle   	 > 2^{k+ H(p)n -\epsilon n}. \ \ \ \ \ (7)

In the above, (5) follows from the fact that the sets {D_{\mathbf{m}}} partition {\{0,1\}^n}: every received word is decoded to some message, and for {\mathbf{m}_1\neq\mathbf{m}_2}, {D_{\mathbf{m}_1}} and {D_{\mathbf{m}_2}} are disjoint. (6) follows from (4). (7) follows for large enough {n} if we pick {\gamma=\frac{\epsilon}{2p\log\left(\frac{1}{p}-1\right)}}. (Note that as {0<p<1/2}, {\gamma=\Theta(\epsilon)}.)

(7) implies that {k < (1 - H(p)+\epsilon )n}, which contradicts our assumption that {k\ge \lceil (1-H(p)+\epsilon)n\rceil}. The proof is complete. \Box
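(Finally, a small sanity check of the last two steps and of the choice of {\gamma}, with arbitrary sample values of {p} and {\epsilon}: the exponent in (6) exceeds both the exponent in (7) and {n} itself once {n>4/\epsilon}, which is exactly the contradiction above.)

from math import ceil, log2

def H(p):
    return -p * log2(p) - (1 - p) * log2(1 - p)

p, eps = 0.1, 0.05
gamma = eps / (2 * p * log2(1 / p - 1))

for n in [50, 200, 1000]:
    k = ceil((1 - H(p) + eps) * n)
    exp6 = k - 2 + H(p) * n - gamma * p * log2(1 / p - 1) * n  # exponent in (6)
    exp7 = k + H(p) * n - eps * n                              # exponent in (7)
    print(n, exp6 > exp7, exp6 > n)  # both comparisons hold once n > 4/eps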

Remark 1 It can be verified that the proof above can also work if the decoding error probability is bounded by {2^{-\beta n}} (instead of the {1/2} in the theorem) for small enough {\beta=\beta(\epsilon)>0}.
