Differentiating Convex Functions Constructively

In classical analysis, both convex functions and increasing functions [0, 1] → R are differentiable almost everywhere. We will show that, constructively, we can prove this for convex functions but not for increasing ones. In doing so we also show that Rademacher's Theorem and the Alexandrov Theorem are not constructive.

2010 Mathematics Subject Classification 03F60 (primary)

Convex functions feature prominently in many areas of mathematics; convex optimisation alone has too many applications to list. A standard approach to optimising a convex function is gradient descent, an important algorithm which relies on the fact that the derivative of every convex function exists almost everywhere. We give a fully constructive proof of this result for functions on R in the setting of Bishop's constructive mathematics (BISH), which is mathematics with intuitionistic logic and (countable) dependent choice; see Bishop and Bridges [4] for a comprehensive introduction to BISH. In the tradition of Bishop we work informally, but we would also like to stress that there are many formal systems capturing the idea of BISH, including set-theoretic and type-theoretic foundations (Aczel and Rathjen [1], Martin-Löf [9]), and these could be used to formalise our results straightforwardly. Proofs given in BISH have the advantage of being acceptable in classical mathematics (CLASS), Brouwer's intuitionism (INT), and Markov's Russian school of recursive mathematics (RUSS); see Bridges and Richman [5]. Another advantage of proofs using only intuitionistic logic is that one can extract algorithms from them. Convex functions, and their applications to mathematical economics, have recently been investigated in BISH by Berger and Svindland [2,3].
We highlight a few important differences between studying the real numbers with classical logic and with intuitionistic logic (even though we define the reals in the standard way, say as Cauchy sequences of rationals). For real numbers x, y we say that x is apart from y, written x ≠ y, if there exists n ∈ N such that |x − y| > 1/n. In BISH this is a stronger condition than ¬(x = y): the statement ∀x, y ∈ R : (¬(x = y) → x ≠ y), known as Markov's principle, is independent of BISH. Another important proposition that is independent of BISH is the decidability of equality on the reals, known as the weak limited principle of omniscience:

WLPO: For any real number x, either x = 0 or ¬(x = 0).
WLPO is outright false in INT and RUSS, and in many formal systems. As a substitute for the decidability of equality on R (and for the stronger decidability of apartness, known as the limited principle of omniscience (LPO)) we have in BISH that

(1) ∀x, y, z ∈ R : x < y → (x < z ∨ z < y).
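Both apartness and (1) are computational notions. The following minimal Python sketch assumes a representation of our own choosing: a real x is a function n ↦ x(n) returning a rational with |x − x(n)| ≤ 1/n; the function names are hypothetical. `apartness_witness` shows that x ≠ y is certified by a finite search (an unbounded version of the search halts whenever x ≠ y), whereas knowing only ¬(x = y) gives no bound on the search; Markov's principle is the assertion that it halts nonetheless. `split` implements (1): given a witness n for y − x > 1/n, it compares a single approximation of z with an approximate midpoint of x and y.

```python
from fractions import Fraction
from typing import Callable, Optional

# Assumed representation: a real x is a function n -> rational x(n) with |x - x(n)| <= 1/n.
Real = Callable[[int], Fraction]


def apartness_witness(x: Real, y: Real, max_n: int) -> Optional[int]:
    """Bounded search for n with |x - y| > 1/n, i.e. a witness of x ≠ y.

    If |x(n) - y(n)| > 3/n, then |x - y| >= |x(n) - y(n)| - 2/n > 1/n.
    None only means that no witness was found up to max_n.
    """
    for n in range(1, max_n + 1):
        if abs(x(n) - y(n)) > Fraction(3, n):
            return n
    return None


def split(x: Real, y: Real, z: Real, n: int) -> str:
    """Property (1): given n with y - x > 1/n, return a true claim 'x < z' or 'z < y'.

    With k = 4n all three approximations are within 1/(4n). If z(k) lies below
    the approximate midpoint, then z < (x + y)/2 + 1/(2n) < y; otherwise
    z >= (x + y)/2 - 1/(2n) > x. Equality with z is never tested.
    """
    k = 4 * n
    mid = (x(k) + y(k)) / 2
    return "z < y" if z(k) < mid else "x < z"


zero: Real = lambda n: Fraction(0)
one: Real = lambda n: Fraction(1)
half: Real = lambda n: Fraction(1, 2)
zero_noisy: Real = lambda n: Fraction(1, 2 * n)  # another description of 0
```

For example, `apartness_witness(half, zero_noisy, 100)` returns 8, certifying |1/2 − 0| > 1/8. When both x < z and z < y hold, as for z = 1/2 with x = 0 and y = 1, the answer of `split` depends on the approximations supplied for z: the exact description returns "x < z", while z(n) = 1/2 − 1/n yields "z < y". Both answers are correct, which is why (1) is a disjunction rather than a decision procedure for <.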
As a reminder to the reader: a function f : [0, 1] → R is convex if for all x, y ∈ [0, 1] with x ≤ y and any t ∈ [0, 1] we have

f(tx + (1 − t)y) ≤ t f(x) + (1 − t) f(y),

and f is strictly convex if this inequality is strict whenever x < y and t ∈ (0, 1). Many of the familiar properties of convex functions can be established constructively with the standard (classical) proofs. For example:
• the class of convex functions is closed under addition and taking maximum;
• if f, g : R → R are convex and g is non-decreasing, then g ∘ f is convex;
• any local minimum of a convex function is a global minimum;
• the set of (global) minima is convex;
• strictly convex functions have at most one minimum.
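The defining inequality is easy to test numerically. The following Python sketch checks it on a finite grid; the helper `looks_convex`, the grid sizes and the tolerance are our own illustrative choices. Such a check can refute convexity but never establish it, so it only serves as a sanity check of the closure properties listed above.

```python
import math
from typing import Callable, List


def looks_convex(f: Callable[[float], float], points: int = 41, tol: float = 1e-12) -> bool:
    """Test f(t*x + (1-t)*y) <= t*f(x) + (1-t)*f(y) for x <= y on a grid in [0, 1].

    A failed test disproves convexity; a passed test proves nothing.
    """
    grid: List[float] = [i / (points - 1) for i in range(points)]
    ts = [j / 10 for j in range(11)]
    for i, x in enumerate(grid):
        for y in grid[i:]:
            for t in ts:
                if f(t * x + (1 - t) * y) > t * f(x) + (1 - t) * f(y) + tol:
                    return False
    return True


square = lambda x: x * x
kink = lambda x: abs(x - 0.5)
```

For instance, the sum and the maximum of `square` and `kink` pass the test, as does exp ∘ `square` (exp is convex and non-decreasing), while −x² and the concave √x fail it.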
Our main result is a constructive proof of:

Proposition 1 Let f : [0, 1] → R be convex. Then there exists a sequence (ξ_n)_{n≥1} in [0, 1] such that f is differentiable at every y ∈ [0, 1] with y ≠ ξ_n for all n ≥ 1.

The standard classical proof of this goes as follows. For a fixed x ∈ (0, 1) the function

F(h) = (f(h) − f(x)) / (h − x),  h ∈ [0, 1] \ {x},

is increasing, and on [0, x) it is bounded above by (f(1) − f(x))/(1 − x) (both facts follow, for example, from Lemma 3 below). Thus lim_{h→x⁻} F(h), the left derivative f′₋(x) of f at x, exists; moreover f′₋ is increasing. Similarly, the right derivative f′₊ of f exists and is increasing. Since f′₋ and f′₊ are increasing functions on (0, 1), the sets S₋ and S₊ of points at which they are discontinuous are each countable. Then f is differentiable at each x ∈ (0, 1) \ (S₋ ∪ S₊).

It can be shown constructively that an increasing, real-valued function on [0, 1] has at most countably many points of discontinuity (Theorem 5.4 of Diener and Hendtlass [8]). The problem with the above proof from a constructive standpoint comes before this: the assertion that lim_{h→x⁻} F(h) exists when F is increasing and bounded above. Indeed, it cannot be shown constructively that the left and right derivatives of a (strictly) convex function exist. Instead, in the proof of Proposition 1 below we construct the (potential) points of nondifferentiability of f directly.
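The classical argument starts from the monotonicity of the difference quotients F(h). The sketch below uses an illustrative convex function of our own choosing, f(x) = max(0, x − 1/2) + x², and computes F exactly at x = 1/2 with rational arithmetic. The left quotients equal h + 1/2 and increase towards f′₋(1/2) = 1; the right quotients equal h + 3/2 and decrease towards f′₊(1/2) = 2. Here the limits can be read off a closed form, whereas constructively an increasing, bounded sequence of rationals need not have a limit that can be computed.

```python
from fractions import Fraction
from typing import List


def f(x: Fraction) -> Fraction:
    # Illustrative convex function: a kink at 1/2 plus a parabola.
    return max(Fraction(0), x - Fraction(1, 2)) + x * x


def F(x: Fraction, h: Fraction) -> Fraction:
    """Difference quotient (f(h) - f(x)) / (h - x), defined for h != x."""
    return (f(h) - f(x)) / (h - x)


x0 = Fraction(1, 2)
# Approach x0 from the left and from the right along h = x0 -/+ 2^-k, k = 1, ..., 10.
left: List[Fraction] = [F(x0, x0 - Fraction(1, 2**k)) for k in range(1, 11)]
right: List[Fraction] = [F(x0, x0 + Fraction(1, 2**k)) for k in range(1, 11)]
```

The last entries are `left[-1] == 1 − 1/1024` and `right[-1] == 2 + 1/1024`, and every left quotient is below every right quotient, as the monotonicity of F predicts.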
We start by noticing that the approximate derivatives of f are increasing.
Proof Suppose that there exist x < y and x′ < y′ with … Then, by (1), either … or …; without loss of generality, we assume the former. Let … This contradiction to the convexity of f proves the result.

We denote by 2^{<N} the set of finite binary sequences and by 2^N that of infinite binary sequences; for n ∈ N, 2^n denotes the set of binary sequences of length n. The length of a ∈ 2^{<N} is denoted by |a| and the empty sequence by ⟨⟩. For α ∈ 2^N and n ∈ N we write αn for the binary sequence of length n given by restricting α to {0, …, n − 1}. We say that α extends a if αn = a where n = |a|. For finite binary sequences a = (a_0, …, a_{m−1}) and b = (b_0, …, b_{n−1}) of lengths m and n, the concatenation a⌢b of a and b is the finite binary sequence (a_0, …, a_{m−1}, b_0, …, b_{n−1}) of length m + n.
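The notation for binary sequences translates directly into code. In this Python sketch (all names are ours) an element of 2^{<N} is a tuple, an element of 2^N is a function from indices to {0, 1}, and `extends(alpha, a)` tests whether αn = a for n = |a|.

```python
from itertools import product
from typing import Callable, List, Tuple

Finite = Tuple[int, ...]         # an element of 2^{<N}
Infinite = Callable[[int], int]  # an element of 2^N: index -> 0 or 1


def restrict(alpha: Infinite, n: int) -> Finite:
    """The sequence of length n obtained by restricting alpha to {0, ..., n - 1}."""
    return tuple(alpha(i) for i in range(n))


def extends(alpha: Infinite, a: Finite) -> bool:
    """True iff the restriction of alpha to length |a| equals a."""
    return restrict(alpha, len(a)) == a


def concat(a: Finite, b: Finite) -> Finite:
    """Concatenation: (a_0, ..., a_{m-1}, b_0, ..., b_{n-1}), of length m + n."""
    return a + b


def of_length(n: int) -> List[Finite]:
    """All binary sequences of length n (the set written 2^n)."""
    return [tuple(bits) for bits in product((0, 1), repeat=n)]


alternating: Infinite = lambda i: i % 2  # 0, 1, 0, 1, ...
```

Every infinite sequence extends the empty sequence, and `of_length(0)` contains exactly one element, the empty tuple.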
Before stating a proposition in which we construct finitely many points x_1, …, x_n that contain all "ε-jumps" in the derivative of f, we need a small lemma.
Proposition 5 Let f : [0, 1] → R be a convex function and let ε > 0. Then there exist x_1, …, x_n ∈ [0, 1] such that if y ∈ (ε, 1 − ε) is apart from each of x_1, …, x_n, then there exists δ > 0 such that for all δ′ ∈ (0, δ): …

Proof We inductively construct
• a function a ↦ J_a mapping finite binary sequences to subintervals of [ε, 1 − ε],
• functions m_r, m_l : 2^{<N} → R, and
• binary sequences α_1, …, α_n
such that for all N ∈ N and all a, b ∈ 2^{<N} …

Suppose that we have completed the construction and for each i ∈ {1, …, n} let x_i be the unique point in ⋂_{n∈N} J_{α_i n}; such points exist by (2) and (5). Let y ∈ (ε, 1 − ε) with y ≠ x_i for each i and let N ≥ 1 be such that |y − x_i| > (2/3)^N for all i. By (1) there exists a ∈ 2^N such that y lies in the interior of J_a, and by (2) x_i ∉ J_a for each i; it then follows from (5) that |{ i | α_i extends a }| = 0. Finally, it follows from (3), (4) and Lemma 3 that any δ > 0 such that (y − δ, y + δ) ⊂ J_a satisfies the conclusion of the proposition.
It now remains to detail the construction. To begin the induction we set ε and pick n ∈ N such that (n + 1)ε > m_r(⟨⟩) − m_l(⟨⟩); then (1), (2), and (5) hold trivially, while (3) holds by Lemma 3 and (4) holds by our choice of n. Now fix a ∈ 2^{<N} and suppose we have done the construction up to and including a.
We are now in a position to give our proof of Proposition 1.
Proof Let S_1 = {0, 1} and for each n ≥ 2 apply Proposition 5 with ε = 2^{−n} to construct points x_1, …, x_{k_n}, and let S_n = {x_1, …, x_{k_n}}. Let (ξ_n)_{n≥1} be an enumeration of ⋃{ S_n | n ≥ 1 }. Let y ∈ [0, 1] be such that y ≠ ξ_n for all n ≥ 1. Since 0, 1 ∈ S_1, we have y ≠ 0 and y ≠ 1, so there exists N > 0 such that y ∈ (2^{−N}, 1 − 2^{−N}). For all M > N, since y ≠ x for each x ∈ S_M, there exists δ > 0 such that: … Since M > N is arbitrary, it follows from Lemma 3 that f is differentiable at y.

Note that points y as in this proof exist, since R is uncountable in the following constructive sense: if (ξ_n)_{n≥1} is a sequence in R and a < b, then there exists z ∈ (a, b) such that z ≠ ξ_n for all n ≥ 1.
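The closing remark is Bishop's constructive form of Cantor's theorem, and its usual proof is a nested-interval construction that consults (1) at every step instead of deciding equality. A minimal Python sketch, again assuming that reals are given by rational approximations with |x − x(n)| ≤ 1/n; the names and the particular choice of thirds and sixths are ours. At step k we split the current interval at its thirds p < q, use (1) to learn p < ξ_k or ξ_k < q, and keep an end sixth of the interval that lies at distance at least a sixth of its width from ξ_k.

```python
from fractions import Fraction
from math import floor
from typing import Callable, List, Tuple

# Assumed representation: a real x is a function n -> rational x(n) with |x - x(n)| <= 1/n.
Real = Callable[[int], Fraction]


def p_less_than(p: Fraction, q: Fraction, xi: Real) -> bool:
    """For rationals p < q, decide 'p < xi' (True) or 'xi < q' (False), as in (1)."""
    k = floor(2 / (q - p)) + 1  # ensures 1/k < (q - p)/2
    return xi(k) > (p + q) / 2


def avoid(a: Fraction, b: Fraction, xis: List[Real]) -> List[Tuple[Fraction, Fraction]]:
    """Nested closed intervals inside (a, b); the one after step k is apart from xis[k-1].

    The point z they shrink to lies in (a, b), and |z - xi_k| >= w_k/6 > 0,
    where w_k is the width of the interval before step k.
    """
    lo, hi = a + (b - a) / 4, b - (b - a) / 4  # stay strictly inside (a, b)
    intervals = [(lo, hi)]
    for xi in xis:
        w = hi - lo
        p, q = lo + w / 3, lo + 2 * w / 3
        if p_less_than(p, q, xi):
            hi = lo + w / 6  # xi > p: keep the leftmost sixth
        else:
            lo = hi - w / 6  # xi < q: keep the rightmost sixth
        intervals.append((lo, hi))
    return intervals
```

With a = 0, b = 1 and ξ_1, ξ_2, ξ_3 = 1/2, 1/4, 3/4, the final interval avoids all three values, whether the ξ_k are given exactly or with small errors.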
Classically there is no hope of improving upon this result, since one can easily define a convex function [0, 1] → R that is not differentiable on a dense set. For example, if (q_n)_{n≥1} is an enumeration of all rational points in [0, 1], then one can define an increasing function: … Integrating this increasing function we get a convex one, g(x) = ∫_0^x f(t) dt, which is not differentiable at rational points. Constructively we cannot define f, since it is not continuous. This leaves open the possibility that strong continuity principles, such as the principle of continuous choice in INT, rule out this sort of counterexample. However, we are still able to define g directly: … It is easy to see that g is convex, as it is the limit of functions which are themselves convex, being finite sums of convex functions. Further, g is not differentiable at any q_i ∈ Q ∩ (0, 1). We write: … Both g_i and g^i are continuous and g = g_i + g^i. For any 0 < δ < min{q_i, 1 − q_i} we have … with the first part being ≥ 0 since g_i is convex, and the second part being equal to 2^{−i}. If g were differentiable at q_i, this quotient would tend to 0 as δ → 0; since δ can be arbitrarily small, g is not differentiable at q_i.
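The formula for g did not survive in this section. As a hedged illustration we assume g(x) = Σ_{n≥1} 2^{−n} max(0, x − q_n), which is what one obtains by integrating the jump function that increases by 2^{−n} at q_n; it may differ from the authors' formula. For this choice the term at q_i contributes exactly 2^{−i} to the second-difference quotient (g(x + δ) − 2g(x) + g(x − δ))/δ at x = q_i, and every other term contributes a nonnegative amount. The Python sketch (names ours) enumerates Q ∩ (0, 1) by increasing denominator and truncates the series.

```python
from fractions import Fraction
from itertools import islice
from math import gcd
from typing import Iterator, List


def rationals_in_unit_interval() -> Iterator[Fraction]:
    """Enumerate Q ∩ (0, 1) by increasing denominator: 1/2, 1/3, 2/3, 1/4, 3/4, ..."""
    d = 2
    while True:
        for p in range(1, d):
            if gcd(p, d) == 1:
                yield Fraction(p, d)
        d += 1


def g_partial(x: Fraction, qs: List[Fraction]) -> Fraction:
    """Partial sum of the assumed g(x) = sum_{n>=1} 2^{-n} max(0, x - q_n)."""
    return sum((Fraction(1, 2**n) * max(Fraction(0), x - q) for n, q in enumerate(qs, start=1)), Fraction(0))


def second_difference(x: Fraction, delta: Fraction, qs: List[Fraction]) -> Fraction:
    """(g(x + delta) - 2 g(x) + g(x - delta)) / delta; tends to 0 wherever g is differentiable."""
    return (g_partial(x + delta, qs) - 2 * g_partial(x, qs) + g_partial(x - delta, qs)) / delta


qs = list(islice(rationals_in_unit_interval(), 30))
```

At q_i the quotient stays at least 2^{−i} however small δ is, while at a point whose δ-neighbourhood contains none of the listed q_n it is exactly 0, since the truncated sum is linear there.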
We next present a counterexample to a problem that looks very similar to our main result. Theorem 5.4 of Diener and Hendtlass [8] shows that an increasing function is continuous at all but countably many points. In classical analysis it is also provable that an increasing function is differentiable almost everywhere. As the following result shows, there is no hope of proving this constructively, since it implies the sequential version of WLPO, which states that for a binary sequence (λ_n)_{n≥1} we can decide

∀n ∈ N : λ_n = 0 ∨ ¬∀n ∈ N : λ_n = 0.
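It is easy to see that in WLPO we may assume that (λ_n)_{n≥1} is increasing: replacing λ_n by max{λ_1, …, λ_n} yields an increasing binary sequence that is identically 0 exactly when the original one is.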
Our counterexample also functions as a (Brouwerian) counterexample to Rademacher's Theorem [11], which states that if U ⊂ R^n is open and f : U → R^m is Lipschitz continuous, then f is differentiable almost everywhere. Finally, the function F : … We have ∀x ∈ [0, 1] : |x − f_n(x)| ≤ 1/(2n), and therefore f_n → id uniformly. Now let (λ_n)_{n≥1} be an increasing binary sequence and consider: … id if λ_n = 0 …

5 We should point out that measure theory is a constructively problematic topic, and has to be treated much more carefully than in the classical approach [4, Chapter 6]. However, we do assume that any sensible definition of the notion of a property holding almost everywhere on [0, 1] ought to imply that there is at least one point at which it does hold (the approach in [4, Chapter 6] does this).
Using the notation introduced in Diener and Hendtlass [7], g_n is just the sequence λ(f_n), and is therefore Cauchy. It therefore converges uniformly to a limit g. If there exists n such that λ_n = 1, then g = f_m for some m ≤ n, and therefore g is differentiable, with derivative either 0 or 2, at every point apart from those of the form i/(2m). If λ_n = 0 for all n ∈ N, then g = id, which is differentiable everywhere with derivative 1. Thus if g is differentiable at some point, then its derivative cannot be different from 0, 1, or 2. More precisely, by (1) the derivative lies either in (1/3, 5/3) or in (−∞, 2/3) ∪ (4/3, ∞). In the second case we must have ¬∀n ∈ N : λ_n = 0, and in the first case we must have ¬∃n ∈ N : λ_n = 1, which is equivalent to ∀n ∈ N : λ_n = 0.
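The definitions of f_n and g_n did not survive in this section, so the following Python sketch rests on assumptions. We take f_n to be the piecewise-linear function that, on each interval [k/n, (k+1)/n], is constant on the first half and has slope 2 on the second half. This is one choice consistent with the properties used above (|x − f_n(x)| ≤ 1/(2n), kinks at the points i/(2m), slopes 0 and 2), not necessarily the authors' definition. We likewise assume that g_n = id while λ_1 = … = λ_n = 0, and g_n = f_m once λ_m = 1 for the first time, with m ≤ n. Note that the code only ever inspects λ_1, …, λ_n: deciding which case the limit g falls into is exactly the instance of WLPO that cannot be computed.

```python
from fractions import Fraction
from math import floor
from typing import List


def f(n: int, x: Fraction) -> Fraction:
    """Hypothetical f_n on [0, 1]: on [k/n, (k+1)/n], slope 0 on the first half, slope 2 on the second.

    It agrees with id at the points k/n and satisfies |x - f_n(x)| <= 1/(2n).
    """
    k = min(floor(x * n), n - 1)  # interval containing x (x = 1 uses the last one)
    left = Fraction(k, n)
    mid = left + Fraction(1, 2 * n)
    return left if x <= mid else left + 2 * (x - mid)


def g(n: int, lam: List[int], x: Fraction) -> Fraction:
    """Assumed g_n: id while lambda_1, ..., lambda_n are all 0, else f_m for the first m with lambda_m = 1."""
    for m in range(1, n + 1):
        if lam[m - 1] == 1:
            return f(m, x)
    return x
```

For λ = (0, 0, 1, 1, …) every g_n with n ≥ 3 equals f_3, which stays within 1/6 of id; for λ identically 0 every g_n is id.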
Notice that the notion of quasi-convexity, which is classically well studied and features prominently in the aforementioned work of Berger and Svindland [2,3], is implied by being increasing. Thus it is not possible to replace "convexity" by "quasi-convexity" in Corollary 6. By replacing g with g + id in the proof of Proposition 7, we can actually improve that result to strictly increasing functions. Since every strictly increasing function is strictly quasi-convex, this shows that Corollary 6 cannot be proven for strictly quasi-convex functions either.
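The implication from increasing to quasi-convex is immediate: tx + (1 − t)y ≤ max{x, y}, so f(tx + (1 − t)y) ≤ max{f(x), f(y)} for increasing f. The Python sketch below (names and grid are our own choices; a finite grid can only refute a property) contrasts the two notions with exact rational arithmetic on an increasing step function, which is quasi-convex but not convex, and on a concave bump, which is neither.

```python
from fractions import Fraction
from math import floor
from typing import Callable, List

Fn = Callable[[Fraction], Fraction]


def grid(points: int) -> List[Fraction]:
    return [Fraction(i, points - 1) for i in range(points)]


def violates_quasi_convexity(f: Fn, points: int = 21) -> bool:
    """True if f(t*x + (1-t)*y) > max(f(x), f(y)) for some grid points x, y and t."""
    ts = [Fraction(j, 10) for j in range(11)]
    return any(
        f(t * x + (1 - t) * y) > max(f(x), f(y))
        for x in grid(points) for y in grid(points) for t in ts
    )


def violates_convexity(f: Fn, points: int = 21) -> bool:
    """True if f(t*x + (1-t)*y) > t*f(x) + (1-t)*f(y) for some grid points x, y and t."""
    ts = [Fraction(j, 10) for j in range(11)]
    return any(
        f(t * x + (1 - t) * y) > t * f(x) + (1 - t) * f(y)
        for x in grid(points) for y in grid(points) for t in ts
    )


staircase: Fn = lambda x: Fraction(floor(4 * x), 4)  # increasing, but not convex
bump: Fn = lambda x: x * (1 - x)                     # neither convex nor quasi-convex
```

For the staircase, x = 1/5, y = 3/10 and t = 1/2 give f(1/4) = 1/4 > 1/8 = (f(x) + f(y))/2, so convexity fails, while no violation of quasi-convexity exists.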
We would like to finish this paper with the following thought. Viewed from a classical point of view, there is, of course, a difference between our two problems, that is, between a convex and an increasing function. While the first is differentiable everywhere except at countably many points, the second is only differentiable almost everywhere. Since the well-known Cantor function (also known as the devil's staircase) is continuous and increasing but not differentiable on Cantor's middle-third set, which is uncountable, one cannot prove that an increasing function is differentiable everywhere except at countably many points. So even classically there is an appreciable difference between the two problems.
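For concreteness, here is a short Python sketch of the Cantor function using the standard ternary-digit construction; `depth` is our own truncation parameter, and rational arithmetic keeps the digit extraction exact. The function is constant on every removed middle third, so it has derivative 0 off the Cantor set, and all of its increase happens on that set.

```python
from fractions import Fraction
from math import floor


def cantor(x: Fraction, depth: int = 60) -> Fraction:
    """Cantor function on [0, 1], read off from up to `depth` ternary digits of x.

    Ternary digits 0 and 2 become binary digits 0 and 1. At the first digit 1,
    x lies in a removed middle third (or at its left end), where the function
    is constant, so we can stop there.
    """
    if x >= 1:
        return Fraction(1)
    result, scale = Fraction(0), Fraction(1, 2)
    for _ in range(depth):
        x *= 3
        digit = floor(x)
        x -= digit
        if digit == 1:
            return result + scale
        if digit == 2:
            result += scale
        scale /= 2
    return result
```

At x = 0, which lies in the Cantor set, the right difference quotients cantor(3^{−k})/3^{−k} = (3/2)^k are unbounded, so the function has no finite derivative there.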