On the Borel-Cantelli Lemmas, the Erdős-Rényi Theorem, and the Kochen-Stone Theorem

In this paper we present a quantitative analysis of the first and second Borel-Cantelli Lemmas and of two of their generalisations: the Erdős-Rényi Theorem and the Kochen-Stone Theorem. We will see that the first three results have direct quantitative formulations, giving an explicit relationship between quantitative formulations of the assumptions and the conclusion. For the Kochen-Stone Theorem, however, we can show that the numerical bounds of a direct quantitative formulation are not computable in general. Nonetheless, we obtain a quantitative formulation of the Kochen-Stone Theorem using Tao's notion of metastability.


Introduction
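Let $(A_i)_{i=1}^\infty$ be an infinite sequence of events in a probability space $(S, E, P)$. The Borel-Cantelli Lemma is a classical result in probability theory, relating the convergence or divergence of the sum $\sum_{i=1}^\infty P[A_i]$ with the probability of the event "$A_i$ infinitely often", which is defined as follows:
$$A_i \text{ i.o.} = \bigcap_{n=1}^{\infty} \bigcup_{i \ge n} A_i$$
ie $\omega \in S$ happens infinitely often in $(A_i)_{i=1}^\infty$ if for all $n$ there exists an $i \ge n$ such that $\omega \in A_i$. Note that, as $\bigcup_{i \ge 1} A_i \supseteq \bigcup_{i \ge 2} A_i \supseteq \ldots$, we have:
$$P[A_i \text{ i.o.}] = \lim_{n \to \infty} P\Big[\bigcup_{i \ge n} A_i\Big]$$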

Rob Arthan and Paulo Oliva
The Borel-Cantelli Lemma (see, for example, Feller [5]) is normally presented in two parts. The first part says that when the sum $\sum_{i=1}^\infty P[A_i]$ converges, then the event $A_i$ i.o. has probability zero:
Theorem 1.1 (First Borel-Cantelli Lemma) If $\sum_{i=1}^\infty P[A_i] < \infty$ then $P[A_i \text{ i.o.}] = 0$.
The second part says that when $\sum_{i=1}^\infty P[A_i]$ diverges, and when the $A_i$ are mutually independent, then the event $A_i$ i.o. has probability one:
Theorem 1.2 (Second Borel-Cantelli Lemma) If $\sum_{i=1}^\infty P[A_i] = \infty$ and the $A_i$ are mutually independent, then $P[A_i \text{ i.o.}] = 1$.
In [7], Kochen and Stone presented a result that generalises the Second Borel-Cantelli Lemma in two directions: (i) it gives a lower bound on $P[A_i \text{ i.o.}]$ when the $A_i$ are not mutually independent; and (ii) it can be used to show that the assumption of mutual independence in the original lemma can be weakened to pairwise independence. We formulate this generalisation following Yan [16]:
Theorem 1.3 (Kochen-Stone) If $\sum_{i=1}^\infty P[A_i] = \infty$ then:
$$(1)\qquad P[A_i \text{ i.o.}] \ge \limsup_{n \to \infty} \frac{\left(\sum_{k=1}^{n} P[A_k]\right)^2}{\sum_{i,k=1}^{n} P[A_i A_k]}$$
Erdős and Rényi [4] gave a result that is intermediate between the second Borel-Cantelli Lemma and the Kochen-Stone Theorem. Like the Kochen-Stone Theorem it implies that the assumption of mutual independence in the second Borel-Cantelli Lemma can be weakened to pairwise independence.
Theorem 1.4 (Erdős-Rényi) If $\sum_{i=1}^\infty P[A_i] = \infty$ and
$$(2)\qquad \liminf_{n \to \infty} \frac{\sum_{i,k=1}^{n} P[A_i A_k]}{\left(\sum_{k=1}^{n} P[A_k]\right)^2} = 1$$
then $P[A_i \text{ i.o.}] = 1$.
Erdős and Rényi applied their theorem to the study of generalised Cantor expansions for real numbers.
The aim of the present note is to investigate "quantitative" versions of each of these "qualitative" results. The methods we use come from the proof mining programme (see Kohlenbach [8]), where numerical information is obtained from (often nonconstructive) proofs via logical methods. For some noteworthy applications of these methods, see the work of Avigad and collaborators on Ergodic Theory [1] and Kohlenbach and collaborators on Fixed-Point Theory [9,10]. Terence Tao's programme of bridging "soft" and "hard" analysis [13] was an independent rediscovery of some of these ideas. The results as presented above are results of "soft analysis": they relate statements about convergence or divergence, without giving any numeric information about the corresponding rates of convergence or divergence.
For instance, regarding the First Borel-Cantelli Lemma, it is natural to ask how the rate of convergence of the sequence of partial sums $\left(\sum_{i=1}^{n} P[A_i]\right)_{n=1}^\infty$ relates to the rate at which the sequence $\left(P\left[\bigcup_{i \ge n} A_i\right]\right)_{n=1}^\infty$ converges to zero. Similar questions arise regarding the other three results.
We provide here answers to these four questions. We will find in Section 2 that the answer is almost trivial for the First Borel-Cantelli Lemma, as it has a very direct (constructive) proof. It turns out that the sequence $\left(P\left[\bigcup_{i \ge n} A_i\right]\right)_{n=1}^\infty$ converges with the same rate as the sequence of partial sums $\left(\sum_{i=1}^{n} P[A_i]\right)_{n=1}^\infty$. The answers are found to be less trivial in Section 2 for the Second Borel-Cantelli Lemma and in Section 3 for the Erdős-Rényi Theorem, but still, the quantitative versions of these follow the standard proofs of the qualitative versions quite closely. As we will see in Section 4, in the case of the Kochen-Stone Theorem the situation is more complicated. In Section 4.2, we prove that a direct (computable) rate of convergence does not exist: we can find a concrete sequence of events (with computable probabilities $P[A_i]$) such that the rate of convergence for the quantitative version of the theorem is not computable. To deal with this, we use Tao's notion of "rate of metastability" (see Section 1.1), a concept which is logically equivalent to convergence, but is computationally weaker. We give a quantitative version of the Kochen-Stone Theorem that provides a rate of metastability for inequality (1) as a computable function of the rate of divergence of the sequence of partial sums $\left(\sum_{i=1}^{n} P[A_i]\right)_{n=1}^\infty$.
In Section 5 we also consider the optimality of the bounds we obtain. For the First and Second Borel-Cantelli Lemmas, we can argue that the bounds obtained are in some sense best possible. For the Erdős-Rényi Theorem we conjecture that the bounds we present are optimal, but do not have a proof yet. For the Kochen-Stone Theorem, however, the "metastable" reformulation makes it much less clear what the right notion of optimality should be, and we leave this to future work.
The work presented in this paper was motivated by our on-going work in the area of metric Diophantine approximation, more specifically, on quantitative analyses of generalisations of the Khintchine-Groshev Theorem on approximability of real numbers by rationals. To introduce these generalisations, let $I^{nm}$ denote the unit cube $[0, 1]^{nm}$ in $\mathbb{R}^{nm}$, and let $\psi : \mathbb{N} \to \mathbb{R}^+$ be given. A point $X \in I^{nm}$, viewed as an $n \times m$ matrix, is said to be $\psi$-approximable if there are infinitely many $(p, q) \in \mathbb{Z}^m \times \mathbb{Z}^n$ such that $\|qX + p\| < \psi(\|q\|)$ (where $\|\cdot\|$ is the supremum norm). Let $A_{n,m}$ denote the set of $\psi$-approximable points $X \in I^{nm}$. The generalised Khintchine-Groshev Theorems are 0-1 laws for the Lebesgue measure of the sets $A_{n,m}$ governed by assumptions on the divergence (and possibly monotonicity) of sequences defined in terms of $\psi$. The most general theorem of this form is given in Beresnevich and Velani [2], which improves on earlier work of Gallagher [6] dealing with the case $n = 1$. In both of these works, the proofs break into two parts: a proof of a 0-1 law and a proof that a certain set has positive measure, and hence measure 1 by the 0-1 law. The Borel-Cantelli Lemmas and their generalisations are important tools in some of these proofs. In particular, the Kochen-Stone Theorem is a key step in Beresnevich and Velani's work [2]. A quantitative analysis of these tools seemed to us to be a worthwhile investigation in its own right.

Rate of convergence vs rate of metastability
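As mentioned above, in our quantitative analysis of the Kochen-Stone Theorem we will make use of Terence Tao's notion of metastability [13]. As an example, consider the statement that a sequence of reals $(x_n)_{n=1}^\infty$ is Cauchy convergent:
$$(3)\qquad \forall \ell\, \exists k\, \forall m, n > k \left(|x_m - x_n| < \tfrac{1}{2^\ell}\right)$$
A rate of Cauchy convergence (or just rate of convergence) for the sequence is a function $\phi : \mathbb{N} \to \mathbb{N}^+$ (where $\mathbb{N}^+$ denotes the positive integers) such that:
$$(4)\qquad \forall \ell\, \forall m, n > \phi(\ell) \left(|x_m - x_n| < \tfrac{1}{2^\ell}\right)$$
While the mere existence of $\phi$ in (4) is equivalent to (3), if one has an explicit $\phi$ for which (4) holds, one has a quantitative rather than merely qualitative understanding of the convergence of the sequence $(x_n)_{n=1}^\infty$. However, for a given sequence that is known to be convergent, it may not be possible to provide an explicit rate of convergence: in many interesting cases, it can be shown that no computable function $\phi$ satisfying (4) exists. In such cases, it is often worth considering the equivalent "metastable" version of (3), namely:
$$(5)\qquad \forall \ell\, \forall f : \mathbb{N}^+ \to \mathbb{N}^+\, \exists k\, \forall m, n \in [k, f(k)] \left(|x_m - x_n| < \tfrac{1}{2^\ell}\right)$$
Clearly (3) directly implies (5). But (5) also implies (3). Assume (5) and suppose (3) does not hold for some $\ell$, ie $\forall k\, \exists m, n > k \left(|x_m - x_n| \ge \tfrac{1}{2^\ell}\right)$, and let $f : \mathbb{N}^+ \to \mathbb{N}^+$ be any function which provides an upper bound for $m$ and $n$ for each given $k$, ie:
$$\forall k\, \exists m, n \in [k, f(k)] \left(|x_m - x_n| \ge \tfrac{1}{2^\ell}\right)$$
Taking this $f$ in (5) leads to a contradiction.
In cases where there is no computable rate of convergence (as will be the case with the Kochen-Stone Theorem), one can still attempt to find a computable rate of metastability instead. In the convergence example above, the rate of metastability would be a function $\phi : \mathbb{N} \times (\mathbb{N}^+ \to \mathbb{N}^+) \to \mathbb{N}^+$ such that:
$$\forall \ell\, \forall f : \mathbb{N}^+ \to \mathbb{N}^+\, \exists k \le \phi(\ell, f)\, \forall m, n \in [k, f(k)] \left(|x_m - x_n| < \tfrac{1}{2^\ell}\right)$$
One should think of the function $f : \mathbb{N}^+ \to \mathbb{N}^+$ as potentially producing longer and longer intervals $[k, f(k)]$, and the rate of metastability $\phi$ as trying to find, for arbitrarily large $\ell$, an interval in which the sequence is stable to within $\tfrac{1}{2^\ell}$.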
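As an informal illustration of the metastable formulation, the following Python sketch searches by brute force for a witness $k$ for a given $\ell$ and $f$. The function name `metastable_point`, the search cut-off `max_k` and the example sequence $x_n = 1 - 2^{-n}$ are hypothetical choices made for illustration only; they are not part of the development in this paper.

```python
from fractions import Fraction
from typing import Callable, Optional


def metastable_point(x: Callable[[int], Fraction],
                     ell: int,
                     f: Callable[[int], int],
                     max_k: int) -> Optional[int]:
    """Return the least k <= max_k such that |x(m) - x(n)| < 2**-ell
    for all m, n in [k, f(k)], or None if no such k is found.

    Assumes f(k) >= k, so that the interval [k, f(k)] is non-empty.
    """
    eps = Fraction(1, 2 ** ell)
    for k in range(1, max_k + 1):
        window = [x(i) for i in range(k, f(k) + 1)]
        # The largest gap |x(m) - x(n)| on the window is max - min.
        if max(window) - min(window) < eps:
            return k
    return None


def x(n: int) -> Fraction:
    """Example sequence (an assumption for illustration): x_n = 1 - 2**-n."""
    return 1 - Fraction(1, 2 ** n)


witness = metastable_point(x, ell=3, f=lambda k: 2 * k, max_k=100)
```

For this sequence and $f(k) = 2k$, the oscillation on $[k, 2k]$ is $2^{-k} - 2^{-2k}$, which first drops below $1/8$ at $k = 3$, so the search stops there.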

Rate of divergence
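We say that a function $\omega : \mathbb{N}^+ \to \mathbb{N}^+$ is a rate of divergence for a sequence of reals $(x_i)_{i=1}^\infty$ if $x_{\omega(N)} \ge N$ for all $N$. In the sequel, the sequence $x_i$ will typically comprise the partial sums of a series of terms in the interval $[0, 1]$, for which we have the following lemma.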
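To make the notion of a rate of divergence concrete, the sketch below computes, for a series with terms in $[0, 1]$, the least index at which the partial sum reaches $N$. It assumes the reading used in the later sections, namely that the partial sum up to $\omega(N)$ is at least $N$. The helper name `least_rate_of_divergence` is hypothetical, and exact rational arithmetic is used only to avoid rounding effects.

```python
from fractions import Fraction
from itertools import count
from typing import Callable


def least_rate_of_divergence(a: Callable[[int], Fraction], N: int) -> int:
    """Return the least n with a(1) + ... + a(n) >= N.

    Assumes the series diverges; otherwise the loop never terminates.
    """
    total = Fraction(0)
    for n in count(1):
        total += a(n)
        if total >= N:
            return n


# Constant terms 1/6: the partial sums reach N exactly at index 6N.
die_rate = least_rate_of_divergence(lambda i: Fraction(1, 6), 3)

# Harmonic series: the partial sums first reach 2 at index 4.
harmonic_rate = least_rate_of_divergence(lambda i: Fraction(1, i), 2)
```

The constant case agrees with the rate $\omega(N) = kN$ quoted in Section 5 for a fair $k$-sided die (here $k = 6$).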
Then, for all $n, N$ we have:
Since $\omega$ is a rate of divergence and each $a_i \le 1$, we have that:
is finite" is equivalent to the convergence of the sequence of partial sums $s_k = \sum_{i=1}^{k} P[A_i]$. In quantitative terms, that implies the existence of a rate of Cauchy convergence $\psi(\ell)$ for $s_k$, ie
$$(9)\qquad \forall \ell\, \forall m, n > \psi(\ell) \left(|s_m - s_n| < \tfrac{1}{2^\ell}\right)$$
or, equivalently, the existence of a function $\phi(\ell)$ such that:
converges to 0 with the same rate, ie for all $\ell \ge 0$ and $m > \phi(\ell)$:
Proof By subadditivity we have for all $\ell > 0$ and $m > \phi(\ell)$.
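The second Borel-Cantelli Lemma says that, under the extra assumption that the events are mutually independent, the probability of the event $A_i$ i.o. is one. In our quantitative version of this lemma we will estimate, for each $n$, how fast the sequence $\left(P\left[\bigcup_{i=n}^{m} A_i\right]\right)_{m=n}^\infty$ converges to 1, given a rate of divergence for the sequence of partial sums $\left(\sum_{i=1}^{n} P[A_i]\right)_{n=1}^\infty$.
Theorem 2.2 Let $(A_i)_{i=1}^\infty$ be an infinite sequence of events which are mutually independent. Assume that the sequence $\left(\sum_{i=1}^{n} P[A_i]\right)_{n=1}^\infty$ diverges with rate $\omega$; then, for all $n$ and $N$:
Proof Fix $n$ and $N$. Let us write $\overline{A_i}$ for the complement of the event $A_i$. The independence of the events implies: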

Taking the natural logarithm on both sides we have where inequality (16) follows from the fact that ln(1 + x) ≤ x, for all x ∈ (−1, ∞) and inequality (17) follows from Lemma 1.5. Hence and so:
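The key step of this argument, namely that for independent events the probability that none of them occurs is at most $e^{-\sum P[A_i]}$ (a consequence of $\ln(1 + x) \le x$), can be checked numerically. The following Python sketch is only an illustration under that reading of the proof; the function names `prob_none` and `satisfies_exponential_bound` are hypothetical, and floating-point arithmetic is used for simplicity.

```python
import math
from typing import Sequence


def prob_none(ps: Sequence[float]) -> float:
    """P[no event occurs] for mutually independent events with probabilities ps."""
    result = 1.0
    for p in ps:
        result *= 1.0 - p
    return result


def satisfies_exponential_bound(ps: Sequence[float]) -> bool:
    """Check prod(1 - p_i) <= exp(-sum(p_i)), which follows from ln(1 + x) <= x."""
    return prob_none(ps) <= math.exp(-sum(ps))


# Fair 6-sided die, first 12 throws: the probabilities sum to 2.
six_sided = [1 / 6] * 12
ok = satisfies_exponential_bound(six_sided)
```

With a fair $k$-sided die and $kN$ throws, the left-hand side is $(1 - 1/k)^{kN}$, which approaches $e^{-N}$ from below as $k$ grows, so the exponential bound cannot be improved by a constant factor in this example.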

Proving qualitative version from quantitative one
Journal of Logic & Analysis 13:6 (2021) On the Borel-Cantelli Lemmas

3 Quantitative Erdős-Rényi Theorem
In this section, we present a quantitative version of the Erdős-Rényi Theorem. Our proof follows that of Erdős and Rényi [4], but uses more modern notation: X, Y for random variables, E(X) for the expectation of X (or mean value in Erdős and Rényi's terminology), and σ(X) for the standard deviation of X .
A simplistic logical formalisation of equation (2), involving $\liminf$, would give a formula with quantifier prefix $\forall\exists\forall\exists$. We can simplify this to a $\forall\exists$ using the following lemmas.

Lemma 3.1 For any sequence of events $(A_i)_{i=1}^\infty$ we have that for all $n \ge 1$:
Proof Let $X_i$ be the random variable given by the indicator function of the event $A_i$.
Lemma 3.2 For any sequence of reals $d_n \ge 1$, the following are equivalent: which, by the definition of $\inf$, is equivalent to: This is easily seen to be equivalent to $\forall \ell, n\, \exists k \ge n \left(d_k \le 1 + 2^{-\ell}\right)$ (for the right-to-left direction take $i = \max(m, n)$).
In the quantitative version of the Erdős-Rényi Theorem, we will assume that we are given a rate of divergence for the sequence of partial sums $\left(\sum_{i=1}^{n} P[A_i]\right)_{n=1}^\infty$, and a function $\phi$ witnessing the condition (2) which, by Lemmas 3.1 and 3.2, is equivalent to the assumption:
Define $n_1 = \phi(1, 1)$ and, for $k > 1$, $n_k = \phi(k, \max(n_{k-1}, k))$. Then, for all $n$ and $\ell$
Proof Let $X_i$ and $Y_n$ be as in the proof of Lemma 3.1. Assumption (25) gives us: By (27) and (28), we get: Let $n_1 = \phi(1, 1)$ and, for $k > 1$, $n_k = \phi(k, \max(n_{k-1}, k))$. The above implies (taking $\ell = k$ and $n = \max(n_{k-1}, k)$): The Chebyshev inequality tells us that: Taking $\lambda = \frac{\varepsilon E(Y_n)}{\sigma(Y_n)}$, we have that, for any given $\varepsilon \in (0, 1)$: From (30) and (32) (taking $n = n_k$ and $\varepsilon = 1/2$), we find that: Let $B_k$ be the event $Y_{n_k} \le \frac{E(Y_{n_k})}{2}$, so that (33) together with the formula for the partial sums of a geometric series implies: By the quantitative version of the First Borel-Cantelli Lemma (Theorem 2.1): Fix $n$ and $\ell$. We will show that

for $m = \max(\omega(2n), \ell + 3)$. Let $C^m_\ell = \bigcap_{k=\ell+3}^{m} \overline{B_k}$. By (36), inequality (37) will follow if we can show that: So, let us assume that $x \in C^m_\ell$. Since $x \in C^m_\ell$ iff $\forall k \in [\ell + 3, m]\ (x \in \overline{B_k})$, we have, taking $k = m$: By the definition of $n_k$ and the assumption (25), which implies $\phi(\ell, n) \ge n$, we have $n_{\omega(2n)} \ge \omega(2n)$, and by the definition of $m$ we also have $m \ge \omega(2n)$. Hence using, for the last inequality, that assumption (24) and the pigeon-hole principle imply that for at least one $i \in [n, n_m]$ we have $X_i(x) = 1$, ie $x \in A_i \subseteq \bigcup_{j=n}^{n_m} A_j$. Since $x$ was an arbitrary element of $C^m_\ell$, this gives us (38) and hence (37).
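For readers who wish to see the Chebyshev step of the proof spelled out, the following LaTeX fragment gives a worked version of the substitution $\lambda = \varepsilon E(Y)/\sigma(Y)$ followed by the choice $\varepsilon = 1/2$. It is a sketch under the assumptions $E(Y) > 0$ and $0 < \sigma(Y) < \infty$, and the displayed constants are derived here rather than quoted from the numbered formulas of the proof.

```latex
% Chebyshev's inequality, for a random variable Y with 0 < \sigma(Y) < \infty:
\[
  P\big[\,|Y - E(Y)| \ge \lambda\,\sigma(Y)\,\big] \le \frac{1}{\lambda^{2}}
  \qquad (\lambda > 0).
\]
% Substituting \lambda = \varepsilon E(Y) / \sigma(Y), assuming E(Y) > 0:
\[
  P\big[\,|Y - E(Y)| \ge \varepsilon E(Y)\,\big]
  \le \frac{\sigma^{2}(Y)}{\varepsilon^{2} E^{2}(Y)}.
\]
% With \varepsilon = 1/2, the event Y \le E(Y)/2 is contained in the event
% |Y - E(Y)| \ge E(Y)/2, so:
\[
  P\big[\,Y \le \tfrac{1}{2} E(Y)\,\big] \le \frac{4\,\sigma^{2}(Y)}{E^{2}(Y)}.
\]
```

Applied with $Y = Y_{n_k}$, the last line bounds the probability of the event $B_k$ in terms of the ratio $\sigma^2(Y_{n_k})/E^2(Y_{n_k})$.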
Remark To obtain a quantitative version of the Erdős-Rényi Theorem, we have had to make two choices about points that are left open in the qualitative proof of [4]. The first choice is essentially forced upon us by our decision to use $\frac{1}{2^\ell}$ in the formulation of assumption (25). This means that where Erdős and Rényi take the $n_k$ to be any sequence such that $\sum_{k=1}^\infty \frac{\sigma^2(Y_{n_k})}{E^2(Y_{n_k})}$ converges, we have had to choose a sequence such that the series is dominated by a geometric series (see formula (30)). The second choice is that in formula (33), we have taken $\varepsilon$ to be $\frac{1}{2}$, where Erdős and Rényi leave it unspecified. As pointed out by one of the referees, both the statements and the proofs above (Lemma 3.2 and Theorem 3.3) could be made more general, but rather more complicated, by introducing a convergent series in place of $\frac{1}{2^\ell}$ and a constant $\varepsilon \in (0, 1)$ in place of $\frac{1}{2}$ as parameters.

Proving qualitative version from quantitative one
Let us show that the Erdős-Rényi Theorem (Theorem 1.4) follows directly from our quantitative version (Theorem 3.3). The above assumptions imply that there exists an $\omega : \mathbb{N}^+ \to \mathbb{N}^+$ such that and a function $\phi : \mathbb{N} \times \mathbb{N} \to \mathbb{N}$ such that:

Quantitative Kochen-Stone Theorem
As with the quantitative version of the second Borel-Cantelli Lemma, we will also assume that we are given a rate of divergence for the sequence $\left(\sum_{i=n}^{m} P[A_i]\right)_{m \in \mathbb{N}^+}$. Our quantitative version will follow closely the very concise proof of the Kochen-Stone Theorem discovered by Yan [16].
After expressing $P[A_i \text{ i.o.}]$ as a limit, the Kochen-Stone inequality (1) has the form $\lim_{n \to \infty} p_n \ge \limsup_{n \to \infty} q_n$. Much as in Lemma 3.2, we can be more economical with the quantifiers using the following lemma.

Lemma 4.1 For any sequence of events
, the following are equivalent: Proof By the definition of A i i.o., (44) is equivalent to: Let us first show that the above is equivalent to: (47) ∀m, ℓ∃n > m P and n 2 > m such that: Then, taking n = max(n 1 , n 2 ), by (46), (48) and (49) we get: Thus (46) implies (47). Now, suppose (47) holds but (46) does not. Then, for some m and ℓ, we have that: But by (47)  .
As we will see in Section 4.2, there can be no computable function of m and ℓ that bounds n in (54). Therefore, we will consider its metastable counterpart and will produce an explicit computable bound on n as a function of m, ℓ and the function g.
Theorem 4.3 Let $(A_i)_{i=1}^\infty$ be an infinite sequence of events. Let $\omega : \mathbb{N}^+ \to \mathbb{N}^+$ be such that for all $N$: Then, for all $m$ and $\ell$ and $g : \mathbb{N}^+ \to \mathbb{N}^+$ such that $g(i) > i$, for all $i$, there exists an $n > m$ such that: , $m$, and • for all $j \in [n, g(n)]$: Hence, we can obtain a bound $g^{(2^{\ell+1})}(\omega(2^{\ell+2} m))$ on $n$ which is completely independent of the actual events $A_i$, but only depends on the parameters $\omega$, $g$, $m$ and $\ell$.
Before we embark on the proof of Theorem 4.3, we need three further lemmas.
Proof See Chung and Erdős [3] or Yan [16].
Lemma 4.5 Let $a$ and $b$ be such that $0 < a \le b$. Assume $0 \le x < a$, $0 \le y < b$ and $0 < \varepsilon$. If $b \ge 4x/\varepsilon^2$ then: it is enough to show that: But since $a \le b$, that follows from
and $\omega : \mathbb{N}^+ \to \mathbb{N}^+$ be as in the statement of Theorem 4.3. Then for all $m$ and $\varepsilon > 0$ and all j > max ω 2
Proof Let $m$ and $\varepsilon > 0$ be fixed. Let $a_n = \left(\sum_{i=1}^{n} P[A_i]\right)^2$ and $b_n = \sum_{i,k=1}^{n} P[A_i A_k]$. The assumption in the statement of Theorem 4.3 says that $\sqrt{a_n}$ diverges with rate $\omega$.
By Lemma 4.4, b n ≥ a n . Hence, for all N : , m and let j > M . By assumption we have that: By Lemma 4.5, we have that: The result follows from (57) and (58).
, $m$ and $n_{r+1} = g(n_r)$. We claim that with $n = n_r$ for some $r \le 2^{\ell+1}$, the conclusion holds. Assume this is not the case. Then for each $r$ there is a $j_r \in [n_r, n_{r+1}]$ such that: But by Lemma 4.6, with $\varepsilon = \frac{1}{2^{\ell+1}}$ and hence, combining (59) and (60) and subtracting $\frac{1}{2^{\ell+1}}$, we have: Chaining together the inequalities given by (61) for $r = 0$ to $2^{\ell+1}$, we have which is a contradiction.

Proving qualitative version from quantitative one
Let us argue that the quantitative version of the Kochen-Stone Theorem (Theorem 4.3) directly implies the original qualitative version. Theorem 4.3 implies that for all $m$, $\ell$ and $g : \mathbb{N}^+ \to \mathbb{N}^+$ (with $g(i) > i$) there exists an $n$ such that: But (see Section 1.1), the above is equivalent to: for all $m$, $\ell$ there exists an $n$ such that:

Then, this is equivalent to: For all m, ℓ there exists an n such that: Hence, for all m and ℓ there exists an n such that: Therefore, for all m and hence:

Necessity for use of metastability
We wish to show that there is no effective bound on the witness n in (54), so that the approach via metastability is necessary. To this end, we will need examples where the Kochen-Stone inequality (1) is actually an equality with P[A i i.o.] < 1. To do this we will use the following result of Yan which shows that the diagonal terms in the sums on the right-hand side of the inequality are negligible. Yan's sketch of the proof in [16] is very terse, so we give more detail here.
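Proof Define sequences $s_n$, $t_n$, $b_n$ and $c_n$ as follows:
$$s_n = \sum_{i=1}^{n} P[A_i], \qquad t_n = \sum_{1 \le i < k \le n} P[A_i]P[A_k], \qquad b_n = \sum_{i,k=1}^{n} P[A_i A_k], \qquad c_n = \sum_{1 \le i < k \le n} P[A_i A_k]$$
As the sum $\sum_{i=1}^\infty P[A_i]$ diverges, $s_n \to \infty$. Hence, as $2t_n \le s_n^2 \le 2t_n + s_n$, $\lim_{n \to \infty} s_n^2/2t_n = 1$. By inequality (1), $s_n^2 \le (1 + o(1))b_n$. Hence, as $2c_n \le b_n = 2c_n + s_n$, $\lim_{n \to \infty} b_n/2c_n = 1$. It follows that $\limsup_{n \to \infty} s_n^2/b_n = \limsup_{n \to \infty} t_n/c_n$, which is what we wish to prove.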
Let $(q_n)_{n=1}^\infty$ be any non-decreasing sequence of elements of the open unit interval $(0, 1)$, let $q = \lim_{n \to \infty} q_n$ and let $A_i$ be the event that a uniformly random member of the unit interval $[0, 1]$ lies in $[0, q_i]$. Let us define $u_n$, $v_n$ and $w_n$ as follows: Then, rearranging the terms in the sums on the right-hand side of equation (63), we find that, given equation (63), the inequality (1) is equivalent to $q \ge \limsup_{n \to \infty} w_n$. The following lemma implies that equality holds in the Kochen-Stone inequality for any sequence of events $(A_i)_{i=1}^\infty$ constructed in this way.
Lemma 4.8 Let $w_n$ and $q$ be as above. Then $w_n \to q$ as $n \to \infty$.
Proof We have: Let us write $\sigma^j_i$ for $\sum_{k=i}^{j} (q - q_k)$. From the above, multiplying by $q_{i-1}$ and summing for $i$ from 2 to $n$, we have: Define the sequence $r_n$ by: We claim that $r_n \to 0$ as $n \to \infty$ so that $w_n = u_n/v_n = (q - r_n) \to q$, which is what we have to prove. So given $\varepsilon > 0$, let $\varepsilon_0 = \frac{q_1}{4}\varepsilon$ and choose $N$ such that for all $n > N$, we have $q - q_n < \varepsilon_0$. Define $C$ by: Then, for $n > N$
$$(71)\qquad r_n = \frac{C}{v_n} + \frac{q_1 \sigma^n_{N+1} + \ldots + q_N \sigma^n_{N+1} + q_{N+1} \sigma^n_{N+2} + \ldots + q_{n-1} \sigma^n_n}{(n-1)q_1 + (n-2)q_2 + \ldots + q_{n-1}}$$
where in equation (71) we have expanded the denominator of the second fraction using the definition of $v_n$ and where the bounds (72) and (73) are obtained using the facts that $0 < q_1 \le q_i < 1$ and $\sigma^n_{N+m} \le \sigma^n_{N+1} \le (n - N)\varepsilon_0$. Hence we can choose $M > N$, such that for $n > M$, we have $|r_n - \frac{\varepsilon}{2}| < \frac{\varepsilon}{2}$, giving $r_n < \varepsilon$. Hence $r_n \to 0$ as $n \to \infty$.
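The behaviour described by Lemma 4.8 can be observed numerically by evaluating the right-hand side of inequality (1) directly for the nested events $A_i = [0, q_i]$. The Python sketch below assumes only that $P[A_i] = q_i$ and $P[A_i A_k] = \min(q_i, q_k)$, which hold for nested intervals under the uniform measure; the function name `kochen_stone_ratio` and the sample sequences are hypothetical, and the code computes the ratio in (1) rather than the auxiliary quantity $w_n$.

```python
from fractions import Fraction
from typing import Sequence


def kochen_stone_ratio(qs: Sequence[Fraction]) -> Fraction:
    """Ratio (sum_i P[A_i])^2 / sum_{i,k} P[A_i A_k] for A_i = [0, q_i].

    For nested intervals, P[A_i] = q_i and P[A_i A_k] = min(q_i, q_k).
    """
    n = len(qs)
    s = sum(qs)
    b = sum(min(qs[i], qs[k]) for i in range(n) for k in range(n))
    return s * s / b


# Constant sequence q_i = 1/3: the ratio equals q exactly for every n.
constant_ratio = kochen_stone_ratio([Fraction(1, 3)] * 10)

# Increasing sequence q_i = 1/2 - 1/(i + 2), with limit q = 1/2.
increasing = [Fraction(1, 2) - Fraction(1, i + 2) for i in range(1, 51)]
increasing_ratio = kochen_stone_ratio(increasing)
```

In the constant case the ratio is exactly $q = 1/3$; for the increasing sequence the ratio stays below $q = 1/2$ and moves towards it as more terms are included.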
Recall that a Specker sequence (see Specker [11] or Troelstra and van Dalen [14]) is a computable, monotone increasing, bounded sequence of rationals whose limit is not a computable real number.
Proof Take $A_i$ to be the event that a uniformly random element of $[0, 1]$ lies in the interval $[0, q_i]$ where the $q_i \in (0, 1)$ form a Specker sequence with limit $q$ (note that $q < 1$, since $q$ is not a computable real). Also $\sum_{i=1}^\infty q_i$ diverges with the rate of divergence given by the computable function $\omega(N) = \lceil N/q_1 \rceil$. implying that $g(\ell) = \phi(0, \ell)$ is a rate of convergence for the Specker sequence $q_n$. But since $q = \lim_{n \to \infty} q_n$ is not a computable real, it cannot be approximated by a sequence with a computable rate of convergence. It follows that $g$, and hence also $\phi$, is not a computable function.

Optimality of the estimates
It is easy to argue that the numeric bounds in Theorem 2.1 are best possible. Indeed, let $(A_i)_{i=1}^\infty$ be a sequence of mutually exclusive events. In this case, the first inequality in (11) is actually an equality and hence the given estimate is optimal.
The estimate given by Theorem 2.2 is also the best possible amongst estimates that do not depend on $n$. To see this, consider the probability space whose outcomes are functions $\alpha : \mathbb{N}^+ \to \{1, \ldots, k\}$ representing an infinite sequence of throws of a fair $k$-sided die. Let $A_i$ be the event $\alpha(i) = k$, so that $P[A_i] = 1/k$. Clearly $\sum_{i=1}^\infty P[A_i]$ diverges with (optimal) rate $\omega(N) = kN$. We have that: This is a decreasing function of $n$, so the worst case for our estimate is when $n = 1$, but in that case we have