Posted by: atri | February 28, 2009

## Lecture 18: Positive part of Shannon’s theorem

In class today, we proved the positive part of Shannon’s capacity theorem, modulo the so-called Markov (or averaging) argument. At the end of this post, a proof of the Markov argument in a general form is presented; in the next lecture, we will see its use in the specific context of Shannon’s proof. The scribed notes for Lecture 10 from Fall 07 contain the material covered today.

I also handed out feedback forms: thanks to everyone who filled in the form. I’ll address the issues that came up in the next post.

Below the fold is the proof of the Markov argument (the proof was typeset by Luca Trevisan‘s latex2wp program).

The “Markov argument,” or the “averaging argument,” is a simple yet quite effective lemma. Next, we state the lemma in a more general form than we will need and then present its proof.

Lemma 1 (Averaging argument) Let ${\mathbb{D}}$ be a finite set and let ${f:\mathbb{D}\rightarrow \mathbb{R}^{\ge 0}}$ be a function, where ${\mathbb{R}^{\ge 0}}$ denotes the set of all non-negative reals. Further, let

$\displaystyle \mathbb{E}_{v}[f(v)]\le A. \ \ \ \ \ (1)$

Then for every real ${0<\epsilon\le 1}$ (which can depend on ${|\mathbb{D}|}$) and any subset ${S\subseteq \mathbb{D}}$ such that

$\displaystyle \min_{v\in S} f(v) > \frac{A}{\epsilon}, \ \ \ \ \ (2)$

we have ${|S|<\epsilon|\mathbb{D}|}$.

Proof: For the sake of contradiction, assume that there is a subset ${T\subseteq \mathbb{D}}$ with ${|T|\ge \epsilon|\mathbb{D}|}$ that satisfies (2). Now consider the following chain of relations:

$\displaystyle \mathbb{E}_v[f(v)]= \frac{1}{|\mathbb{D}|}\cdot\sum_{v\in T} f(v)+ \frac{1}{|\mathbb{D}|}\cdot\sum_{v\in \mathbb{D}\setminus T} f(v) > \frac{1}{|\mathbb{D}|}\cdot \frac{A}{\epsilon}\cdot |T|\ge A,$

where the first inequality uses the fact from (2) that ${f(v)> A/\epsilon}$ for every ${v\in T}$ along with the fact that ${f(v)\ge 0}$ for ${v\not\in T}$, and the second inequality uses ${|T|\ge \epsilon|\mathbb{D}|}$. Thus, we get ${\mathbb{E}_v[f(v)]>A}$, which contradicts (1). $\Box$
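As a quick sanity check (an illustration of mine, not part of the lecture), the lemma is easy to verify numerically: draw non-negative values, take ${A}$ to be their exact average, and confirm that fewer than an ${\epsilon}$ fraction of the points have value exceeding ${A/\epsilon}$.

```python
import random

# Numerical sanity check of Lemma 1 (essentially Markov's inequality):
# for a non-negative f whose average is at most A, fewer than an
# epsilon fraction of the points can have f(v) > A / epsilon.

random.seed(0)

D = range(10_000)                            # the finite domain
f = {v: random.expovariate(1.0) for v in D}  # non-negative values

A = sum(f.values()) / len(D)  # take A to be the exact average of f
eps = 0.1

# S collects every point whose f-value exceeds A / eps; Lemma 1
# guarantees |S| < eps * |D|.
S = [v for v in D if f[v] > A / eps]
assert len(S) < eps * len(D)
print(f"|S| = {len(S)}, eps*|D| = {eps * len(D):.0f}")
```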

We will actually use the following corollary of the above lemma in our random coding with expurgation argument:

Lemma 2 Let ${\mathbb{D}}$ be a finite set and let ${f:\mathbb{D}\rightarrow \mathbb{R}^{\ge 0}}$ be a function such that ${\mathbb{E}_v[f(v)]\le A}$. Let ${S_{\epsilon}\subseteq \mathbb{D}}$ be the set of the ${v\in\mathbb{D}}$ with the ${(1-\epsilon)|\mathbb{D}|}$ smallest ${f(v)}$ values. Then,

$\displaystyle \max_{v\in S_{\epsilon}} f(v)\le \frac{A}{\epsilon}.$

Proof: Suppose there exists a ${w\in S_{\epsilon}}$ such that ${f(w)>A/\epsilon}$. Since ${S_{\epsilon}}$ contains the ${(1-\epsilon)|\mathbb{D}|}$ smallest values of ${f}$, every ${v\in \mathbb{D}\setminus S_{\epsilon}}$ also satisfies ${f(v)\ge f(w)>A/\epsilon}$. Then the set ${(\mathbb{D}\setminus S_{\epsilon})\cup \{w\}}$ has size ${\epsilon|\mathbb{D}|+1}$ and satisfies (2), which violates Lemma 1. $\Box$
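A small numerical sketch of the expurgation step behind Lemma 2 (again my illustration, not code from the course): sort the values of ${f}$, keep the ${(1-\epsilon)|\mathbb{D}|}$ smallest, and check that the largest surviving value is at most ${A/\epsilon}$.

```python
import random

# Sketch of expurgation via Lemma 2: keeping the (1 - eps)|D| elements
# with the smallest f-values bounds the worst surviving value by A/eps.

random.seed(1)

n = 10_000
f = [random.expovariate(1.0) for _ in range(n)]  # non-negative values
A = sum(f) / n                                   # the exact average
eps = 0.5

keep = int((1 - eps) * n)
S_eps = sorted(f)[:keep]      # the keep smallest f-values

assert max(S_eps) <= A / eps  # guaranteed by Lemma 2
```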

In Shannon’s proof, we will use the lemma in the following way: ${\mathbb{D}}$ will be the set of all messages ${\{0,1\}^k}$, and for any ${\mathbf{m}\in \{0,1\}^k}$, ${f(\mathbf{m})}$ will denote the decoding error probability for ${\mathbf{m}}$.
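To make this concrete, here is a hedged sketch of how the corollary is typically applied in such arguments (the per-message error probabilities below are made up; in the actual proof they come from the random-coding analysis, and the exact choice of ${\epsilon}$ used in lecture may differ): with ${\epsilon = 1/2}$, expurgating the worst half of the messages converts an average error guarantee of ${A}$ into a worst-case guarantee of ${2A}$, at the cost of one message bit.

```python
import random

# Hypothetical expurgation over the message set D = {0,1}^k:
# f(m) is the decoding error probability of message m, and dropping
# the worst half (eps = 1/2) bounds every surviving message's error
# probability by twice the average.

random.seed(2)

k = 10
messages = range(2 ** k)
# Made-up per-message error probabilities standing in for the values
# the random-coding analysis would produce.
err = {m: random.random() * 0.02 for m in messages}

A = sum(err.values()) / len(err)  # average error probability
eps = 0.5

keep = int((1 - eps) * len(err))
survivors = sorted(messages, key=lambda m: err[m])[:keep]

# Every surviving message has error probability at most A/eps = 2A.
assert max(err[m] for m in survivors) <= A / eps
```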

Remark 1 The lemmas above have been stated for finite ${\mathbb{D}}$ and (implicitly) the uniform distribution over the elements of ${\mathbb{D}}$. One can easily generalize them to any distribution ${\mu}$ over a (possibly infinite) set ${\mathbb{D}}$: the only difference is that the “size” ${|S|}$ of a subset ${S\subseteq \mathbb{D}}$ is replaced by ${\mu(S)}$, the total probability mass of ${S}$ under ${\mu}$.