Computability of convergence rates in the Ergodic Theorem for Martin-Löf random points

In this paper we look at the convergence rates for the ergodic averages in the pointwise ergodic theorem for computable ergodic transformations on Cantor space. While, for example, these rates are layerwise computable for Martin-Löf random points and effectively open sets whose measure is a computable real, they are also layerwise computable for an arbitrary interval. For the shift operator, however, there are effectively open sets for which there are no effective rates and hence, in particular, no layerwise computable ones. We also show that, when the measure of the effectively open set is any real α, the convergence rates are computable in α and the layers relative to α.


Introduction
Probability laws that hold for almost all points can often be proven to hold for all Martin-Löf random points, a set of measure 1. Examples include the Laws of Large Numbers and the Law of the Iterated Logarithm; see Vovk [17].
Further, under certain conditions one can use the randomness degree (the compressibility coefficient or layer) of a point to find the rate at which it satisfies the probability law, see Davie [5] and Hoyrup and Rojas [9].
In the first theme above, it was independently shown in the two papers Bienvenu, Day, Hoyrup, Mezhirov and Shen [3] and Franklin, Greenberg, Miller and Ng [6] that the ergodic average of every Martin-Löf random point will equal the measure of the corresponding set for all effectively open/closed sets for computable ergodic transformations in computable measure spaces. In the terminology of [6], every Martin-Löf random point is Birkhoff for this context.
In general, the convergence rates of the ergodic averages are, however, not computable.
While the results of Avigad, Gerhardy and Towsner [2] and those of Hoyrup and Rojas [9] imply that the convergence rates are layerwise computable when the measure of the effectively open set is a computable number, the computability of the measure of the set is not necessary. In fact, rates of convergence of the ergodic averages are layerwise computable for any interval.
We also construct, for the classical ergodic operator on Cantor space, the shift operator, an effectively open set for which convergence rates are not layerwise computable (in fact not even effective), and deduce from this a sufficient condition for layerwise computability of rates.
Lastly we show (using the result from [2] again) that, when the measure of the effectively open set is any real α, the convergence rates are computable in α and the layers relative to α.

Definitions and notation
Our ergodic setting is Cantor space with the normal product measure. We denote an infinite sequence by ω. Our ergodic transformations T will be computable unless noted otherwise. A function f on infinite binary sequences is computable if there is an algorithm that, given access to an infinite binary sequence ω, will, on input n, output the first n digits of f(ω). Of course, the algorithm uses only finitely many digits of ω for each such computation.
Recall that a Martin-Löf test is a uniformly effectively open sequence of sets (U_n) with µ(U_n) ≤ 2^{-n}, and that ω passes the test if it lies outside some U_n. We will assume that the test is nested, that is, U_{n+1} ⊆ U_n. Call the first k ∈ N for which ω ∉ U_k the actual layer of ω. Any l > k, l ∈ N, is a valid layer for ω.
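The notions of actual layer and valid layer can be illustrated on a toy nested test. The sketch below is only an assumption-laden illustration: the test U_n = "all sequences starting with n zeros" is a hypothetical choice made for readability, it is far from universal or optimal, and the function names are not from the paper.

```python
def in_U(n, prefix):
    """Toy test level U_n: sequences starting with n zeros (measure 2**-n).

    Only meaningful when n <= len(prefix).
    """
    return prefix.startswith("0" * n)


def actual_layer(prefix):
    """First k with omega not in U_k, read off a finite prefix of omega."""
    for k in range(len(prefix) + 1):
        if not in_U(k, prefix):
            return k
    # The prefix consists only of zeros, so it cannot settle the question.
    raise ValueError("prefix too short to determine the layer")


def is_valid_layer(prefix, l):
    """Any l strictly above the actual layer is a valid layer."""
    return l > actual_layer(prefix)
```

For the prefix 0010 the sequence lies in U_0, U_1 and U_2 but not in U_3, so the actual layer is 3 and every l > 3 is valid.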
Martin-Löf random points have many equivalent definitions, including the property that no betting strategy can succeed against the points when considered as infinite binary sequences and also that the points, when considered as such sequences, have incompressible initial segments.
There exist universal Martin-Löf tests, that is, tests such that any sequence which passes one of them passes all Martin-Löf tests. We will work with an optimal Martin-Löf test, which is an even stronger notion. A Martin-Löf test U is optimal if, for any other Martin-Löf test V, there exists a c ∈ N such that V_{c+n} ⊂ U_n for all n ∈ N.
Algorithmic randomness has been generalised to computable probability spaces, see Weihrauch [18] for the context of computable analysis, and Gács [1] and Hoyrup and Rojas [8,10] for the generalisation of Martin-Löf randomness to computable probability spaces. We will mostly stay in Cantor space in this paper.

Layerwise computable and decidable
Layerwise computability is a weakening of the standard notion of computability which has profound links with measure theory and topology. The seminal papers in this area are Hoyrup and Rojas [8,9]. For an excellent recent survey, see Hoyrup [11].
Definition 2.2 A function f defined on the random infinite binary sequences is layerwise computable if it is computable in the pair (ω, l), where l is any valid layer for ω. That is, we need a layer l as an extra input alongside ω. The function f gives the same output on (ω, l) for every valid layer l. For example, if the actual layer of ω is l, then the function is defined on each of (ω, k), k ≥ l, with the same output.

Definition 2.3
A set is layerwise decidable if the membership is decidable given a pair (ω, l ).
No claim is made for behaviour on an input (ω, k) where either ω is not random, or k is not a valid layer for ω .
Note also that a layerwise computable function can be "made computable" on as large a measure set as we want. Hence if we want, for example, a layerwise computable function to act like a normal computable function on more than measure 1 − 2^{-k} of the points, we can just give all input points the same layer l for some effective l > k.
A typical example of layerwise computability is the following. Consider an enumeration of an effectively open set O. It is in general not decidable whether a given point will appear in the enumeration or not. When the measure of the set is a computable number however, the decision becomes layerwise decidable, see Davie [5].
This principle is very useful to us and we state it as a lemma (Lemma 2.4): membership in an effectively open set whose measure is a computable real is layerwise decidable.
In the more general context of computable measure spaces, the counterpart of Lemma 2.4 is that a set is so-called effectively µ-measurable if and only if it is layerwise decidable. Another fundamental result in this context is that a function is layerwise computable if and only if it is so-called effectively measurable, Hoyrup and Rojas [8].
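The idea behind layerwise decidability of membership can be sketched in a few lines. This is a toy illustration under several assumptions that are not from the paper: the enumeration is a finite list of pairwise disjoint cylinders, the measure is given exactly as a fraction, and the layer is taken with respect to the test formed by the not-yet-enumerated remainder of O (the constant linking this test to an optimal one is ignored). The function name is hypothetical.

```python
from fractions import Fraction


def layerwise_member(omega_prefix, cylinders, measure, layer):
    """Decide whether omega lies in O, the union of the given cylinders.

    cylinders: finite binary strings, assumed pairwise disjoint, in
               enumeration order.
    measure:   exact measure of O as a Fraction (assumed known).
    layer:     assumed to be a valid layer for omega.
    """
    seen = Fraction(0)
    for s in cylinders:
        if omega_prefix.startswith(s):
            return True  # omega has appeared in the enumeration
        seen += Fraction(1, 2 ** len(s))
        if measure - seen <= Fraction(1, 2 ** layer):
            # What remains of O has measure at most 2**-layer; a point
            # with this valid layer cannot lie in the remainder.
            return False
    return False
```

With an invalid layer the answer may be wrong, which matches the remark that no claim is made for such inputs.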

Hitting times
As an illustration of the concept of layerwise computability we look at the following theorem of Kučera [12], which played a central role in both Bienvenu et al. [3] and Franklin et al. [6]: if A is an effectively closed set of Cantor space with measure greater than 0, then every Martin-Löf random sequence ω must have a tail in A.
In other words, under the shift operator, there will be a finite hitting time for (shifts of) ω to enter A.
Kučera in fact proved the converse too, that is, that having a tail in every such set A characterizes the Martin-Löf random points.
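When the measure of the set A is a computable real, this theorem has a layerwise computable version.
Following Franklin et al. [6], call a point ω whose ergodic averages for a set S converge to µ(S) a Birkhoff point for S.
Kučera's theorem is used in both [3] and [6] to show that every Martin-Löf random point is a Birkhoff point for every effectively open or effectively closed set, for computable ergodic transformations in computable measure spaces.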

Convergence rates for ergodic systems
Let T be a computable ergodic transformation on Cantor space. To compute the convergence rate for a point ω and set S in the Ergodic Theorem is to have a computable function which, on inputs ω and ε, will output a number of iterations of the ergodic function T after which the ergodic average of the point ω will not vary more than ε from µ(S), the measure of the set.
For the convergence rates to be layerwise computable, we need not only the point ω as input, but also a valid layer of ω to find the number of iterations of T needed.
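To make the objects concrete, the following sketch computes ergodic averages for the shift operator and a cylinder set S from a finite prefix of ω. It assumes the standard definition A_n(ω) = (1/n) · #{0 ≤ i < n : T^i(ω) ∈ S}; the function names are illustrative and not from the paper.

```python
from fractions import Fraction


def shift(seq):
    """The shift operator T on a finite prefix: drop the first digit."""
    return seq[1:]


def ergodic_average(prefix, cylinder, n):
    """A_n(omega) for T the shift and S the set of sequences that
    start with `cylinder`, computed from a finite prefix of omega."""
    if n < 1:
        raise ValueError("n must be positive")
    if n + len(cylinder) - 1 > len(prefix):
        raise ValueError("prefix too short for n iterations")
    # T^i(omega) is in S exactly when the prefix has `cylinder` at offset i.
    hits = sum(prefix.startswith(cylinder, i) for i in range(n))
    return Fraction(hits, n)
```

On the periodic point 0101... the averages for the cylinder 01 stay at 1/2 rather than approaching its measure 1/4; this point is of course not Martin-Löf random.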

Layerwise computable for O with computable measure
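We use two important theorems. The first, Theorem 4.4, is from Hoyrup and Rojas [8] and turns effective almost-everywhere convergence of a sequence of functions into layerwise computable convergence rates. The second is the very powerful result of Avigad, Gerhardy and Towsner [2]:
Theorem 4.5 Let X be a separable metric space and T be an ergodic measure-preserving transformation. Then, for any f in L^2(X), the function n(ε) bounding the rate of almost-everywhere convergence of the ergodic averages of f, with the bound holding for every k ≥ n(ε), is computable in f and T.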
If we set f_n = A_n, the ergodic average after n steps, then Theorem 4.5 says that f_n converges effectively almost everywhere. Then by Theorem 4.4 the convergence rates for f_n = A_n are layerwise computable when the measure of the effectively open set is computable.
This might suggest that computability of the measure is necessary for layerwise computable rates. This is not the case; in fact, convergence is layerwise computable for any interval. The next result also shows that the layerwise computability of the convergence rates for effectively open sets has less to do with the measure than with the packing of the effectively open set.
Theorem 5.1 The convergence of the ergodic averages is layerwise computable (and hence effective) for any interval.
Proof We do the proof for the interval I having left endpoint 0. The general case is similar. Given n and ε = 1/2^k, we must find a stage i such that |A_m(ω) − µ(I)| ≤ 1/2^k for all m ≥ i and all ω in layer n. Divide the unit interval into 2^{k+2} dyadic intervals, each of length 1/2^{k+2}. Since each of these intervals I_k is computable, Proposition 4.6 allows us to find, for each I_k, a stage after which the ergodic average for I_k is close to µ(I_k) for all ω in layer n. Take the maximum of these stages. At this stage the ergodic average for the piece I′ of I which is entirely contained in such intervals is correspondingly close to µ(I′). The remaining piece I′′ of I is also contained in an interval of length 1/2^{k+2}. Hence on this piece, the average can be no less than 0 (clearly) and no more than the average for that containing interval, which is close to 1/2^{k+2}.

George Davie
Since these pieces are disjoint we add the averages to get that |A_m(ω) − µ(I)| ≤ 1/2^k, as required.
Ergodic averages for intervals are effective because of the neat "packing" of the set. It is clear that we can, in particular, transform any effectively open set O into an interval by just adding, for each interval enumerated in O, an interval of equal measure adjacent to our rightmost endpoint at that stage, to form in the limit the interval [0, µ(O)].
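The decomposition used in the proof of Theorem 5.1 can be checked on small examples. The sketch below splits an interval I = [0, b) along the dyadic grid of mesh 2^{-(k+2)} into the part I′ made of whole grid intervals and the leftover part I′′. Exact rational endpoints and the function name are assumptions made for illustration only.

```python
from fractions import Fraction


def dyadic_split(b, k):
    """Split I = [0, b) along the dyadic grid of mesh 2**-(k+2).

    Returns (full, inner, rest): the number of grid intervals lying
    entirely inside I, the measure of their union I', and the measure
    of the leftover piece I''.
    """
    mesh = Fraction(1, 2 ** (k + 2))
    full = int(b // mesh)   # whole grid intervals inside [0, b)
    inner = full * mesh     # measure of I'
    rest = b - inner        # measure of I'', always below the mesh
    return full, inner, rest
```

For b = 1/3 and k = 1 the mesh is 1/8, two grid intervals fit inside I, and the leftover piece has measure 1/12, which is below the mesh as the proof requires.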
Since the "bad" intervals above (for which we are forced to use the entire interval length as approximation) are those intervals that are only partly contained in I, we see that we can approximate the averages effectively as long as there are not too many of these bad intervals. Let J be a set and let k_n be the number of dyadic intervals of length 2^{-n} which only partially overlap the set J. Then the convergence rates are layerwise computable (and hence effective) if k_n × 2^{-n} → 0 in an effective way.
Hence, if, given ε, we can find an n(ε) such that dividing into 2^{n(ε)} intervals leaves bad intervals of total length at most ε, and can then find enough iterations of T to bring the averages on these intervals close to their measures, then the convergence rates of the ergodic averages are layerwise computable and hence effective.
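The quantity k_n × 2^{-n} from the condition above can be computed directly for simple sets. The following sketch assumes J is a finite union of pairwise disjoint half-open intervals with rational endpoints; this representation and the function names are illustrative choices, not the paper's.

```python
from fractions import Fraction


def partial_overlap_count(intervals, n):
    """k_n: number of dyadic intervals of length 2**-n that meet J in
    positive measure without being covered by J.

    intervals: pairs (a, b) of Fractions describing disjoint [a, b).
    """
    mesh = Fraction(1, 2 ** n)
    count = 0
    for j in range(2 ** n):
        lo, hi = j * mesh, (j + 1) * mesh
        overlap = sum(
            max(Fraction(0), min(b, hi) - max(a, lo)) for a, b in intervals
        )
        if 0 < overlap < mesh:
            count += 1
    return count


def bad_mass(intervals, n):
    """Total length k_n * 2**-n of the partially overlapping intervals."""
    return partial_overlap_count(intervals, n) * Fraction(1, 2 ** n)
```

For a single interval k_n is at most 2 for every n, so the bad mass tends to 0 effectively, in line with Theorem 5.1.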
In the following section we will construct an example where the rates are far from computable.

Convergence is not effective for the shift operator
Layerwise computability (equivalently, uniform layerwise convergence) implies effective convergence (Hoyrup and Rojas [9]), so the next theorem, which gives an effectively open set for which the convergence of the ergodic averages under the shift operator is not effective, shows in particular that there are no layerwise computable rates here. The proof uses ideas from the proof of the well-known result of Krengel [13] that there are no general rates of convergence for the ergodic theorems. In particular, we use the idea of Rokhlin's Lemma [16] to "spread" the unit interval out into many small sets A, T^{-1}A, ..., T^{-n}A with large union.
Proof Recall that the shift operator T "shifts" an infinite binary sequence to the left; e.g., T(010111...) = 10111... This is well known to be a measure-preserving ergodic transformation.
Chaitin's halting probability, Ω (see Chaitin [4]), is the measure of an enumeration of halting programs in a dovetailing of the running of all programs on the empty input. A program p that halts (and is then enumerated) is seen as the interval 0.p, and measure 1/2^{|p|} is then added to the total measure so far, which approaches Ω. Since the set of programs is prefix-free, Ω < 1.
To ensure that µ(O) is bounded well below 1, we prepend a fixed initial segment of two zeroes to each halting program, so that the total measure of our effectively open set is less than 1/4. We will use this modified halting probability along with its associated effectively open set as a "measure provider". That is, we will not enumerate the modified interval 0.00p itself into our set O but will add many small intervals, adding up to the same measure 1/2^{|p|+2}.
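The effect of prepending two zeroes can be checked on a toy prefix-free set. The list of "programs" below is a hypothetical stand-in for the halting programs, which cannot of course be listed effectively; the function names are illustrative.

```python
from fractions import Fraction


def is_prefix_free(programs):
    """True if no program is a proper prefix of (or equal to) another."""
    return not any(
        i != j and q.startswith(p)
        for i, p in enumerate(programs)
        for j, q in enumerate(programs)
    )


def total_measure(programs):
    """Sum of 2**-|p| over the programs, i.e. the measure of the union
    of the intervals 0.p when the set is prefix-free."""
    return sum((Fraction(1, 2 ** len(p)) for p in programs), Fraction(0))


def pad(programs):
    """Prepend the fixed segment 00 to every program."""
    return ["00" + p for p in programs]
```

Padding divides the total measure by exactly 4, so a prefix-free set of total measure below 1 yields a padded set of total measure below 1/4.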
(1) Dovetail the running of all programs. Let p be the ith program which halts in the dovetailing.
(2) When p halts, we have measure 1/2^{|p|+2} available to add new intervals to our set O, as follows.
(3) Form a string of 0's as long as the number of steps l run in the dovetailing until p halted. That is, form 0^l.
(4) See this string of 0's as the interval adjacent to 0 in the unit interval. (By ergodicity of the shift operator, this interval will, in the limit, be hit on average its measure, 1/2^l, by almost all binary sequences.)
(5) As the ith part of our set O we now enumerate the interval 0^l and a set of its inverse images under T. The first three intervals to be enumerated are then 0^l, 0(0^l) and 1(0^l). We thus obtain one interval of measure 2^{-l}, two of measure 2^{-l-1}, four of measure 2^{-l-2} and so on. Call the nth set of intervals, consisting of 2^n intervals of total measure 2^{-l}, the set I_n.
(6) Continue to enumerate these inverse images until we have exhausted the measure of 0.00p. If for example |p| = k − 2 then we have measure 1/2^k available, and since each set I_n has measure 2^{-l} we can enumerate in O at least s of the sets I_n, where s × 2^{-l} = 2^{-k}, hence at least 2^{l−k} sets I_n. Note that the set I_n consists of binary strings w for which T^n(w) ∈ 0^l; that is, binary sequences which hit 0^l after n steps.
(7) Make a note of when half of the measure is used up, that is, the I_n we are at when this happens. Note that n ≥ 2^{l−k−1}. Call the union of the I_n's enumerated after this point, that is, intervals from length l + n and up, the set H, with µ(H) = 2^{-k-1}.
Now, for each of the strings w in H it is the case that T^m(w) is in O for 0 ≤ m ≤ 2^{l−k−1}. That is, while µ(O) < 1/4, the ergodic average for at least measure 2^{-k-1} of sequences is 1 for 0 ≤ m ≤ 2^{l−k−1}. Recall that l is the total runtime so far, which makes 2^{l−k−1} a very late stage under T. Now, if there were an algorithm which, given k, could find bounds after which fewer than measure 2^{-k-1} of sequences have ergodic average 1, we would have the runtime l_k for the longest running program p of length k − 2. Being able to do this for each k solves the Halting Problem. This gives us a contradiction.
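The sets I_n from step (5) are easy to generate for small l and n. The following sketch lists the cylinders of I_n, confirms that each level has total measure 2^{-l}, and checks that a string in I_n stays inside the enumerated levels for the first n shifts. The function names are illustrative assumptions; the actual construction chooses l as a runtime, which is far too large to enumerate in practice.

```python
from fractions import Fraction
from itertools import product


def inverse_image_set(l, n):
    """I_n: the 2**n cylinders w + 0**l with |w| = n, i.e. the strings
    that land in the interval 0**l after n shifts."""
    return ["".join(w) + "0" * l for w in product("01", repeat=n)]


def level_measure(l, n):
    """Measure of I_n; its cylinders are disjoint, so measures add."""
    return sum(
        (Fraction(1, 2 ** len(c)) for c in inverse_image_set(l, n)),
        Fraction(0),
    )


def hits_zero_block(w, l, n):
    """True if T**n(w) starts with 0**l."""
    return w[n:].startswith("0" * l)


def stays_in_levels(w, l, n):
    """For w in I_n, check that T**m(w) lies in I_{n-m} for 0 <= m <= n."""
    return all(hits_zero_block(w[m:], l, n - m) for m in range(n + 1))
```

Different levels I_n overlap one another, which is why the text speaks of enumerating at least a certain number of them before the available measure is exhausted.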

Relativised randomness
In the example above we did not have effective convergence, since we were enumerating very long intervals of large total measure, very late. Having access to Ω would have helped of course, since then we could see how close we are to the total measure Ω at each stage of the enumeration. Theorem 4.5 states that access to Ω is enough for the convergence rates to be effective. We would like to prove a layerwise version of this.
To do this we must define relative layerwise computability. This will use the standard notion of relative randomness, with α seen as an oracle.
The resulting theorem (Theorem 7.5) says that, when the measure of the effectively open set is a real α, the convergence rates are computable in α and the layers relative to α. In other words, for any effectively open set, we do indeed have rates for the convergence of each element of a layer to the ergodic average. Only in this case, the layer is relative to the measure of the set we are considering. This allows some standard Martin-Löf random sequences not to have relatively layerwise computable convergence rates (those that are no longer random with respect to α). Note, however, that Theorem 4.3 assures us that these points will be Birkhoff whatever the measure of the set.
So this also means that the only points which have a chance of not being Birkhoff points for these sets are the points which are not random with respect to the measure of the set. That is, for a point not to satisfy the convergence in Birkhoff's theorem it must be intimately related to the measure of the set.

2-random sequences
We have seen that we get layerwise computability in µ(O) for the convergence rates, by Theorem 7.5. We now show that for a set of measure 1, the rates are layerwise computable in Ω, whatever O is.
The set of 2-random binary sequences is the set of binary sequences which are Martin-Löf random with respect to α = Ω. By Theorem 7.5, the rates for the 2-random sequences will be governed by the particular µ(O); but, since the measure of every effectively open set is computable in Ω, the rates will also be layerwise computable in Ω. Hence the convergence rates for the 2-random sequences are layerwise computable in Ω, whatever the effectively open set O is.

Acknowledgement
I would like to thank the referee for remarks and suggestions which improved the paper substantially. I would also like to thank the referees of a previous submission of this paper for helpful remarks.