The Weak Law of Large Numbers says that, for any sequence X1, X2, ... of i.i.d. random variables with finite mean E[X1] = µ and finite variance Var[X1] = σ², the sample averages

$$\frac{1}{n}\sum_{i=1}^n X_i$$

converge to µ in probability. The proof is a simple application of Chebyshev's inequality. But, in fact, much more can be said:
Theorem 1 (Strong Law of Large Numbers). Under the above assumptions, the sequence of sample means converges to µ almost surely.
Before we prove this theorem, let us try to understand its meaning. Consider the act of repeatedly tossing a fair coin, such that each toss is independent of all tosses before it. The underlying probability space can be constructed as follows. Let Ω be the set of all one-sided infinite sequences ω = (ω1, ω2, ...) such that ωi ∈ {H, T} for all i; let F be the smallest σ-field containing all sets of the form

$$E(k; i_1, \ldots, i_k; w_1, \ldots, w_k) = \{\omega \in \Omega : \omega_{i_1} = w_1, \ldots, \omega_{i_k} = w_k\} \qquad (1)$$

for all k ∈ ℕ, all choices 1 ≤ i1 < ... < ik of k positive integers, and all choices of w1, ..., wk ∈ {H, T}. In other words, F is the smallest σ-algebra containing all events determined by the outcomes of any finite number of tosses. Finally, let the probability measure P assign probability 2^{−k} to each event of the form (1); this uniquely defines P on all of F. Now define the following sequence of random variables:

$$X_n(\omega) = \begin{cases} 1, & \text{if } \omega_n = H, \\ 0, & \text{if } \omega_n = T. \end{cases}$$
We claim that X1, X2, ... are i.i.d. Bernoulli(1/2) random variables. To see this, note that for any k ∈ ℕ, 1 ≤ i1 < ... < ik, and w1, ..., wk ∈ {H, T} we have

$$P(X_{i_1} = x_1, \ldots, X_{i_k} = x_k) = P\big(E(k; i_1, \ldots, i_k; w_1, \ldots, w_k)\big) = 2^{-k},$$

where xj = 1 if wj = H and xj = 0 if wj = T. In particular, taking k = 1 shows that P(Xn = 1) = P(Xn = 0) = 1/2 for each n, and therefore

$$P(X_{i_1} = x_1, \ldots, X_{i_k} = x_k) = 2^{-k} = \prod_{j=1}^k P(X_{i_j} = x_j),$$

so the Xn are independent.
(Exercise: repeat this construction for a biased coin with bias p.) Consider now the following events:

$$L^- = \Big\{ \omega \in \Omega : \lim_{n \to \infty} \inf_{m \ge n} \frac{1}{m} \sum_{i=1}^m X_i(\omega) \ge \frac{1}{2} \Big\}, \qquad L^+ = \Big\{ \omega \in \Omega : \lim_{n \to \infty} \sup_{m \ge n} \frac{1}{m} \sum_{i=1}^m X_i(\omega) \le \frac{1}{2} \Big\}.$$

(Exercise: show that L− and L+ are, indeed, events, i.e., elements of F.) If ω ∈ L−, then for any ε > 0 there exists some N− = N−(ε), such that

$$\inf_{m \ge n} \frac{1}{m} \sum_{i=1}^m X_i(\omega) \ge \frac{1}{2} - \varepsilon \qquad \text{for all } n \ge N^-$$

(note that the sequence

$$\inf_{m \ge n} \frac{1}{m} \sum_{i=1}^m X_i(\omega), \qquad n = 1, 2, \ldots$$

is monotone increasing, and bounded, so it has a limit). Similarly, if ω ∈ L+, then for any ε > 0 there exists some N+ = N+(ε), such that

$$\sup_{m \ge n} \frac{1}{m} \sum_{i=1}^m X_i(\omega) \le \frac{1}{2} + \varepsilon \qquad \text{for all } n \ge N^+$$

(note that the sequence

$$\sup_{m \ge n} \frac{1}{m} \sum_{i=1}^m X_i(\omega), \qquad n = 1, 2, \ldots$$

is monotone decreasing, and bounded, so it has a limit). If ω ∈ L− ∩ L+, then for all n ≥ max{N−, N+} we will have

$$\frac{1}{2} - \varepsilon \le \inf_{m \ge n} \frac{1}{m} \sum_{i=1}^m X_i(\omega) \le \sup_{m \ge n} \frac{1}{m} \sum_{i=1}^m X_i(\omega) \le \frac{1}{2} + \varepsilon,$$

where, for instance,

$$\sup_{m \ge n} \frac{1}{m} \sum_{i=1}^m X_i(\omega)$$

is the largest possible fraction of heads in any sequence of n or more tosses. In other words, if ω ∈ L− ∩ L+, then, provided you toss the coin enough times, the fraction of heads will stay arbitrarily close to 1/2. Now, another way of writing down the definitions of L− and L+ is

$$L^- = \Big\{ \omega \in \Omega : \liminf_{n \to \infty} \frac{1}{n} \sum_{i=1}^n X_i(\omega) \ge \frac{1}{2} \Big\}, \qquad L^+ = \Big\{ \omega \in \Omega : \limsup_{n \to \infty} \frac{1}{n} \sum_{i=1}^n X_i(\omega) \le \frac{1}{2} \Big\}$$

(consult the course notes for the definitions of liminf and limsup). Since a sequence {an} of real numbers converges if and only if liminf_n an = limsup_n an, we have

$$L^- \cap L^+ = \Big\{ \omega \in \Omega : \lim_{n \to \infty} \frac{1}{n} \sum_{i=1}^n X_i(\omega) = \frac{1}{2} \Big\}.$$
Using this and the Strong Law of Large Numbers, we see that P(L−∩L+) = 1. Therefore, almost sure convergence of the sample means to 1/2 is a statement about long-term stability of the patterns of H’s and T’s in sufficiently long sequences of independent tosses of a fair coin: provided you keep tossing the coin long enough, you are all but assured to see the fraction of heads settling down to its expected value, namely 1/2.
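One can see this settling-down behavior concretely by simulating the experiment. The following minimal Python sketch (standard library only; the toss count and checkpoints below are arbitrary choices, not part of the notes) prints the running fraction of heads, which stays near 1/2 as n grows:

```python
import random

# Simulate repeated independent tosses of a fair coin and track the
# running fraction of heads; the Strong Law says this converges to 1/2
# for almost every sequence of tosses.
random.seed(0)

n_tosses = 100_000
heads = 0
for n in range(1, n_tosses + 1):
    heads += random.random() < 0.5   # X_n = 1 (heads) with probability 1/2
    if n in (10, 100, 1_000, 10_000, 100_000):
        print(f"n = {n:>6}: fraction of heads = {heads / n:.4f}")
```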
Proof of the Strong Law for bounded random variables
We will prove Theorem 1 under the additional assumption that the variables X1, X2, ... are bounded with probability one, i.e., there exist some −∞ < a ≤ b < ∞, such that P(a ≤ X1 ≤ b) = 1. Our strategy will be as follows: we will first show that, for any ε > 0,

$$\sum_{n=1}^\infty P\left( \left| \frac{1}{n}\sum_{i=1}^n X_i - \mu \right| \ge \varepsilon \right) < \infty. \qquad (2)$$

Using this and the Borel–Cantelli lemma, we will then conclude that

$$P\left( \lim_{n\to\infty} \frac{1}{n}\sum_{i=1}^n X_i = \mu \right) = 1. \qquad (3)$$

In fact, we will prove a lot more than just (3): we will also show that the probability that

$$\left| \frac{1}{n}\sum_{i=1}^n X_i - \mu \right| \ge \varepsilon$$
decays exponentially fast with n. At the heart of the proof lies a very useful bounding technique, which is typically referred to as the Chernoff–Hoeffding technique (although it seems to go back at least to the work of S.N. Bernstein in 1927). The idea is as follows. Consider a random variable Z, and suppose that we wish to bound the probability that Z ≥ t for some t > 0. Observe that for any s > 0 we have

$$P(Z \ge t) = P\big(e^{sZ} \ge e^{st}\big) \le e^{-st}\, \mathbb{E}\big[e^{sZ}\big], \qquad (4)$$

where the first step uses the monotonicity of the exponential function, and the second step uses Markov's inequality. Observe, by the way, that E[e^{sZ}] is the moment generating function ΨZ(s). The trick is to choose an s > 0 that would make the right-hand side of (4) suitably small. In fact, since (4) holds simultaneously for all s > 0, the best thing to do would be to minimize the right-hand side over all such s:

$$P(Z \ge t) \le \inf_{s > 0} e^{-st}\, \Psi_Z(s). \qquad (5)$$
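As a concrete illustration, consider a standard normal Z (an example chosen here for convenience, since its moment generating function has the simple closed form ΨZ(s) = e^{s²/2}). The following Python sketch minimizes the right-hand side of (4) numerically over a grid of s values and recovers the familiar Gaussian tail bound e^{−t²/2} of (5):

```python
import math

# Chernoff-Hoeffding trick for a standard normal Z: here Psi_Z(s) = exp(s^2/2),
# so (4) reads P(Z >= t) <= exp(-s*t + s^2/2) for every s > 0.
def chernoff_bound(t, s):
    return math.exp(-s * t + s * s / 2.0)

t = 3.0
# Scan a grid of s values to find the minimizer numerically; the calculus
# answer is s = t, which gives the bound exp(-t^2/2).
best_s = min((s / 100.0 for s in range(1, 1000)),
             key=lambda s: chernoff_bound(t, s))
print(f"numerical minimizer s ~ {best_s:.2f} (theory: s = t = {t})")
print(f"optimized bound       = {chernoff_bound(t, best_s):.6f}")
print(f"exp(-t^2/2)           = {math.exp(-t * t / 2):.6f}")
```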
However, a good upper bound on the moment generating function ΨZ is often sufficient. One such bound was derived by Hoeffding for the case when Z is bounded with probability 1:
Lemma 1 (Hoeffding). Let Z be a random variable that satisfies P(a ≤ Z ≤ b) = 1 for some −∞ < a ≤ b < ∞. Then, for every s ∈ ℝ,

$$\mathbb{E}\left[ e^{s(Z - \mathbb{E}[Z])} \right] \le e^{s^2 (b-a)^2 / 8}.$$
The proof of this lemma uses convexity, and can be found in a variety of places. Let us center the random variables {Xi} by defining Zi = Xi − µ for all i. Then

$$P\left( \frac{1}{n}\sum_{i=1}^n X_i - \mu \ge \varepsilon \right) = P\left( \sum_{i=1}^n Z_i \ge n\varepsilon \right) \le e^{-sn\varepsilon}\, \mathbb{E}\left[ e^{s \sum_{i=1}^n Z_i} \right] = e^{-sn\varepsilon} \prod_{i=1}^n \mathbb{E}\big[ e^{s Z_i} \big] = e^{-sn\varepsilon} \left( \mathbb{E}\big[ e^{s Z_1} \big] \right)^n,$$

where in the second step we have used the Chernoff–Hoeffding trick, and the last two steps use the fact that Z1, Z2, ... are i.i.d. Since X1 ∈ [a, b] with probability one, Z1 ∈ [a − µ, b − µ] with probability one, and E[Z1] = 0. Thus, applying Lemma 1, we get

$$\mathbb{E}\big[ e^{s Z_1} \big] \le e^{s^2 (b-a)^2 / 8}.$$
Consequently,

$$P\left( \frac{1}{n}\sum_{i=1}^n X_i - \mu \ge \varepsilon \right) \le e^{-sn\varepsilon}\, e^{n s^2 (b-a)^2 / 8}. \qquad (6)$$

If we optimize the right-hand side of (6) with respect to s, we get

$$P\left( \frac{1}{n}\sum_{i=1}^n X_i - \mu \ge \varepsilon \right) \le e^{-2n\varepsilon^2 / (b-a)^2}. \qquad (7)$$
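The optimization itself is routine calculus (the minimizer is s = 4ε/(b − a)²); the following sketch, which assumes the sympy library is available, checks it symbolically:

```python
import sympy as sp

# Symbolic check of the step from (6) to (7).  The exponent on the
# right-hand side of (6) is  -s*n*eps + n*s**2*(b - a)**2 / 8,
# and we minimize it over s > 0.
s, n, eps, a, b = sp.symbols('s n eps a b', positive=True)
exponent = -s * n * eps + n * s**2 * (b - a)**2 / 8

# Setting the derivative to zero gives the minimizer s* = 4*eps/(b - a)**2.
s_star = sp.solve(sp.Eq(sp.diff(exponent, s), 0), s)[0]
print("optimal s:", sp.simplify(s_star))

# Substituting back simplifies to -2*n*eps**2/(b - a)**2, the exponent in (7).
print("optimal exponent:", sp.simplify(exponent.subs(s, s_star)))
```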
Applying the same ideas to the centered variables −Z1, −Z2, ..., we get

$$P\left( \frac{1}{n}\sum_{i=1}^n X_i - \mu \le -\varepsilon \right) \le e^{-2n\varepsilon^2 / (b-a)^2}. \qquad (8)$$
Therefore, using the union bound together with (7) and (8) gives us

$$P\left( \left| \frac{1}{n}\sum_{i=1}^n X_i - \mu \right| \ge \varepsilon \right) \le 2\, e^{-2n\varepsilon^2 / (b-a)^2}. \qquad (9)$$
From (9), we immediately get the Weak Law (but with a much better estimate of the rate of convergence than what Chebyshev's inequality would give you); moreover, since the right-hand side of (9) is summable in n, (2) holds, and the Borel–Cantelli lemma yields (3): the sequence of sample means converges to µ almost surely, so we get the Strong Law as well.
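As a sanity check on (9), here is a small Monte Carlo experiment in Python for fair-coin tosses, where a = 0, b = 1, and µ = 1/2 (the values of n, ε, and the trial count below are arbitrary choices):

```python
import math
import random

# Estimate P(|mean - 1/2| >= eps) for n fair-coin tosses by simulation
# and compare it with the Hoeffding bound 2*exp(-2*n*eps^2) from (9).
random.seed(1)
n, eps, trials = 200, 0.1, 20_000

hits = 0
for _ in range(trials):
    mean = sum(random.random() < 0.5 for _ in range(n)) / n
    hits += abs(mean - 0.5) >= eps

print(f"empirical P(|mean - 1/2| >= {eps}) ~ {hits / trials:.4f}")
print(f"Hoeffding bound 2*exp(-2*n*eps^2) = {2 * math.exp(-2 * n * eps**2):.4f}")
```

(The empirical frequency comes out well below the bound, as it must; Hoeffding's inequality is valid for any bounded distribution, so it is not tight for any particular one.)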
Back to coin tossing. If we return to our coin tossing example, we can now answer the following question: how many times do you need to toss a coin with unknown bias p to guarantee that the fraction of heads is within ε > 0 of p with probability at least 1 − δ? The answer is simple: using (9) with a = 0 and b = 1 (here the Xi are i.i.d. Bernoulli(p), so µ = p), we get

$$P\left( \left| \frac{1}{n}\sum_{i=1}^n X_i - p \right| \ge \varepsilon \right) \le 2\, e^{-2n\varepsilon^2}.$$
If we want the right-hand side to be no more than δ, then

$$n \ge \frac{1}{2\varepsilon^2} \ln \frac{2}{\delta}$$
tosses will suffice. Let's compare this with the bound you get from Chebyshev's inequality. Since we don't know the bias p ahead of time, we cannot use this information, but we can always upper-bound the variance of X1:

$$\mathrm{Var}[X_1] = p(1-p) \le \frac{1}{4}.$$

Chebyshev's inequality then gives

$$P\left( \left| \frac{1}{n}\sum_{i=1}^n X_i - p \right| \ge \varepsilon \right) \le \frac{\mathrm{Var}[X_1]}{n\varepsilon^2} \le \frac{1}{4n\varepsilon^2},$$

so to drive the right-hand side below δ we need

$$n \ge \frac{1}{4\varepsilon^2 \delta}$$

tosses. The lower bound on n we get using the Chernoff–Hoeffding method is much better: it grows only logarithmically in 1/δ, rather than linearly.
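The gap is easy to see numerically. The following sketch tabulates both sample-size requirements for a few values of δ (the choice ε = 0.05 is arbitrary):

```python
import math

# Number of tosses required so that the fraction of heads is within eps
# of p with probability at least 1 - delta, under each bound:
#   Chernoff-Hoeffding:  n >= ln(2/delta) / (2*eps^2)
#   Chebyshev (worst-case variance 1/4):  n >= 1 / (4*eps^2*delta)
def n_hoeffding(eps, delta):
    return math.ceil(math.log(2 / delta) / (2 * eps**2))

def n_chebyshev(eps, delta):
    return math.ceil(1 / (4 * eps**2 * delta))

for delta in (0.1, 0.01, 0.001):
    print(f"eps = 0.05, delta = {delta}: "
          f"Hoeffding needs n >= {n_hoeffding(0.05, delta)}, "
          f"Chebyshev needs n >= {n_chebyshev(0.05, delta)}")
```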