vEnhance's avatar

Oct 28, 2016

🖉 A Sketchy Overview of Green-Tao

These are the notes of my last lecture in the 18.099 discrete analysis seminar. It is a very high-level overview of the Green-Tao theorem. It is a subset of this paper.

1. Synopsis

This post as in overview of the proof of:

Theorem 1 (Green-Tao)

The prime numbers contain arbitrarily long arithmetic progressions.

Here, Szemerédi’s theorem isn’t strong enough, because the primes have density approaching zero. Instead, one can instead try to prove the following “relativity” result.

Theorem (Relative Szemerédi)

Let SS be a sparse “pseudorandom” set of integers. Then subsets of AA with positive density in SS have arbitrarily long arithmetic progressions.

In order to do this, we have to accomplish the following.

  • Make precise the notion of “pseudorandom”.
  • Prove the Relative Szemerédi theorem, and then
  • Exhibit a “pseudorandom” set SS which subsumes the prime numbers.

This post will use the graph-theoretic approach to Szemerédi as in the exposition of David Conlon, Jacob Fox, and Yufei Zhao. In order to motivate the notion of pseudorandom, we return to the graph-theoretic approach of Roth’s theorem, i.e. the case k=3k=3 of Szemerédi’s theorem.

2. Defining the linear forms condition

2.1. Review of Roth theorem

Roth’s theorem can be phrased in two ways. The first is the “set-theoretic” formulation:

Theorem 2 (Roth, set version)

If AZ/NA \subseteq \mathbb Z/N is 3-AP-free, then A=o(N)|A| = o(N).

The second is a “weighted” version

Theorem 3 (Roth, weighted version)

Fix δ>0\delta > 0. Let f:Z/N[0,1]f : \mathbb Z/N \rightarrow [0,1] with Efδ\mathbf E f \ge \delta. Then Λ3(f,f,f)Ωδ(1).\Lambda_3(f,f,f) \ge \Omega_\delta(1).

We sketch the idea of a graph-theoretic proof of the first theorem. We construct a tripartite graph GAG_A on vertices XYZX \sqcup Y \sqcup Z, where X=Y=Z=Z/NX = Y = Z = \mathbb Z/N. Then one creates the edges

  • (x,y)(x,y) if 2x+yA2x+ y \in A,
  • (x,z)(x,z) if xzAx-z \in A, and
  • (y,z)(y,z) if y2zA-y-2z \in A.

This construction is selected so that arithmetic progressions in AA correspond to triangles in the graph GAG_A. As a result, if AA has no 3-AP’s (except trivial ones, where x+y+z=0x+y+z=0), the graph GAG_A has exactly one triangle for every edge. Then, we can use the theorem of Ruzsa-Szemerédi, which states that this graph GAG_A has o(n2)o(n^2) edges.

2.2. The measure ν\nu

Now for the generalized version, we start with the second version of Roth’s theorem. Instead of a set SS, we consider a function ν:Z/NR0\nu : \mathbb Z/N \rightarrow \mathbb R_{\ge 0} which we call a majorizing measure. Since we are now dealing with AA of low density, we normalize ν\nu so that E[ν]=1+o(1).\mathbf E[\nu] = 1 + o(1). Our goal is to now show a result of the form:

Theorem (Relative Roth, informally, weighted version)

If 0fν0 \le f \le \nu, Efδ\mathbf E f \ge \delta, and ν\nu satisfies a “pseudorandom” condition, then Λ3(f,f,f)Ωδ(1)\Lambda_3(f,f,f) \ge \Omega_{\delta}(1).

The prototypical example of course is that if ASZNA \subset S \subset \mathbb Z_N, then we let ν(x)=NS1S(x)\nu(x) = \frac{N}{|S|} 1_S(x).

2.3. Pseudorandomness for k=3k=3

So, how can we put the pseudorandom condition? Initially, consider GSG_S the tripartite graph defined earlier, and let p=S/Np = |S| / N; since SS is sparse we expect pp small. The main idea that turns out to be correct is: The number of embeddings of K2,2,2K_{2,2,2} in SS is “as expected”, namely (1+o(1))p12N6(1+o(1)) p^{12} N^6. Here K2,2,2K_{2,2,2} is actually the 22-blow-up of a triangle. This condition thus gives us control over the distribution of triangles in the sparse graph GSG_S: knowing that we have approximately the correct count for K2,2,2K_{2,2,2} is enough to control distribution of triangles.

For technical reasons, in fact we want this to be true not only for K2,2,2K_{2,2,2} but all of its subgraphs HH.

Now, let’s move on to the weighted version. Let’s consider a tripartite graph GG, which we can think of as a collection of three functions

μz:X×YRμy:X×ZRμx:Y×ZR. \begin{aligned} \mu_{-z} &: X \times Y \rightarrow \mathbb R \\ \mu_{-y} &: X \times Z \rightarrow \mathbb R \\ \mu_{-x} &: Y \times Z \rightarrow \mathbb R. \end{aligned}

We think of μ\mu as normalized so that E[μx]=E[μy]=E[μz]=1\mathbf E[\mu_{-x}] = \mathbf E[\mu_{-y}] = \mathbf E[\mu_{-z}] = 1. Then we can define

Definition 4. A weighted tripartite graph μ=(μx,μy,μz)\mu = (\mu_{-x}, \mu_{-y}, \mu_{-z}) satisfies the 33-linear forms condition if

Ex0,x1,y0,y1,z0,z1[μx(y0,z0)μx(y0,z1)μx(y1,z0)μx(y1,z1)μy(x0,z0)μy(x0,z1)μy(x1,z0)μy(x1,z1)μz(x0,y0)μz(x0,y1)μz(x1,y0)μz(x1,y1)]=1+o(1) \begin{aligned} \mathbf E_{x^0,x^1,y^0,y^1,z^0,z^1} \Big[ &\mu_{-x}(y^0,z^0) \mu_{-x}(y^0,z^1) \mu_{-x}(y^1,z^0) \mu_{-x}(y^1,z^1) \\ &\mu_{-y}(x^0,z^0) \mu_{-y}(x^0,z^1) \mu_{-y}(x^1,z^0) \mu_{-y}(x^1,z^1) \\ &\mu_{-z}(x^0,y^0) \mu_{-z}(x^0,y^1) \mu_{-z}(x^1,y^0) \mu_{-z}(x^1,y^1) \Big] \\ &= 1 + o(1) \end{aligned}

and similarly if any of the twelve factors are deleted.

Then the pseudorandomness condition is according to the graph we defined above:

Definition 5. A function ν:Z/NZ\nu : \mathbb Z / N \rightarrow \mathbb Z is satisfies the 33-linear forms condition if E[ν]=1+o(1)\mathbf E[\nu] = 1 + o(1), and the tripartite graph μ=(μx,μy,μz)\mu = (\mu_{-x}, \mu_{-y}, \mu_{-z}) defined by

μz=ν(2x+y)μy=ν(xz)μx=ν(y2z) \begin{aligned} \mu_{-z} &= \nu(2x+y) \\ \mu_{-y} &= \nu(x-z) \\ \mu_{-x} &= \nu(-y-2z) \end{aligned}

satisfies the 33-linear forms condition.

Finally, the relative version of Roth’s theorem which we seek is:

Theorem 6 (Relative Roth)

Suppose ν:Z/NR0\nu : \mathbb Z/N \rightarrow \mathbb R_{\ge 0} satisfies the 33-linear forms condition. Then for any f:Z/NR0f : \mathbb Z/N \rightarrow \mathbb R_{\ge 0} bounded above by ν\nu and satisfying E[f]δ>0\mathbf E[f] \ge \delta > 0, we have Λ3(f,f,f)Ωδ(1).\Lambda_3(f,f,f) \ge \Omega_{\delta}(1).

2.4. Relative Szemerédi

We of course have:

Theorem 7 (Szemerédi)

Suppose k3k \ge 3, and f:Z/n[0,1]f : \mathbb Z/n \rightarrow [0,1] with E[f]δ\mathbf E[f] \ge \delta. Then Λk(f,,f)Ωδ(1).\Lambda_k(f, \dots, f) \ge \Omega_{\delta}(1).

For k>3k > 3, rather than considering weighted tripartite graphs, we consider a (k1)(k-1)-uniform kk-partite hypergraph. For example, given ν\nu with E[ν]=1+o(1)\mathbf E[\nu] = 1 + o(1) and k=4k=4, we use the construction

μz(w,x,y)=ν(3w+2x+y)μy(w,x,z)=ν(2w+xz)μx(w,y,z)=ν(wy2z)μw(x,y,z)=ν(x2y3z). \begin{aligned} \mu_{-z}(w,x,y) &= \nu(3w+2x+y) \\ \mu_{-y}(w,x,z) &= \nu(2w+x-z) \\ \mu_{-x}(w,y,z) &= \nu(w-y-2z) \\ \mu_{-w}(x,y,z) &= \nu(-x-2y-3z). \end{aligned}

Thus 4-AP’s correspond to the simplex K4(3)K_4^{(3)} (i.e. a tetrahedron). We then consider the two-blow-up of the simplex, and require the same uniformity on subgraphs of HH.

Here is the compiled version:

Definition 8. A (k1)(k-1)-uniform kk-partite weighted hypergraph μ=(μi)i=1k\mu = (\mu_{-i})_{i=1}^k satisfies the kk-linear forms condition if

Ex10,x11,,xk0,xk1[j=1kω{0,1}[k]{j}μj(x1ω1,,xj1ωj1,xj+1ωj+1,,xkωk)nj,ω]=1+o(1) \mathbf E_{x_1^0, x_1^1, \dots, x_k^0, x_k^1} \left[ \prod_{j=1}^k \prod_{\omega \in \{0,1\}^{[k] \setminus \{j\}}} \mu_{-j}\left( x_1^{\omega_1}, \dots, x_{j-1}^{\omega_{j-1}}, x_{j+1}^{\omega_{j+1}}, \dots, x_k^{\omega_k} \right)^{n_{j,\omega}} \right] = 1 + o(1)

for all exponents nj,w{0,1}n_{j,w} \in \{0,1\}.

Definition 9. A function ν:Z/NR0\nu : \mathbb Z/N \rightarrow \mathbb R_{\ge 0} satisfies the kk-linear forms condition if E[ν]=1+o(1)\mathbf E[\nu] = 1 + o(1), and

Ex10,x11,,xk0,xk1[j=1kω{0,1}[k]{j}ν(i=1k(ji)xi(ωi))nj,ω]=1+o(1) \mathbf E_{x_1^0, x_1^1, \dots, x_k^0, x_k^1} \left[ \prod_{j=1}^k \prod_{\omega \in \{0,1\}^{[k] \setminus \{j\}}} \nu\left( \sum_{i=1}^k (j-i)x_i^{(\omega_i)} \right)^{n_{j,\omega}} \right] = 1 + o(1)

for all exponents nj,w{0,1}n_{j,w} \in \{0,1\}.

This is just the previous condition with the natural μ\mu induced by ν\nu.

The natural generalization of relative Szemerédi is then:

Theorem 10 (Relative Szemerédi)

Suppose k3k \ge 3, and ν:Z/nR0\nu : \mathbb Z/n \rightarrow \mathbb R_{\ge 0} satisfies the kk-linear forms condition. Let f:Z/NtoR0f : \mathbb Z/N to \mathbb R_{\ge 0} with E[f]δ\mathbf E[f] \ge \delta, fνf \le \nu. Then Λk(f,,f)Ωδ(1).\Lambda_k(f, \dots, f) \ge \Omega_{\delta}(1).

3. Outline of proof of Relative Szemerédi

The proof of Relative Szeremédi uses two key facts. First, one replaces ff with a bounded f~\widetilde f which is near it:

Theorem 11 (Dense model)

Let ε>0\varepsilon > 0. There exists ε>0\varepsilon' > 0 such that if:

  • ν:Z/NR0\nu : \mathbb Z/N \rightarrow \mathbb R_{\ge 0} satisfies ν1rε\left\lVert \nu-1 \right\rVert^{\square}_r \le \varepsilon', and
  • f:Z/NR0f : \mathbb Z/N \rightarrow \mathbb R_{\ge 0}, fνf \le \nu

then there exists a function f~:Z/N[0,1]\widetilde f : \mathbb Z/N \rightarrow [0,1] such that ff~rε\left\lVert f - \widetilde f \right\rVert^{\square}_r \le \varepsilon.

Here we have a new norm, called the cut norm, defined by

fr=supAi(Z/N)r1Ex1,,xrf(x1++xr)1A1(x1)1Ar(xr). \left\lVert f \right\rVert^{\square}_r = \sup_{A_i \subseteq (\mathbb Z/N)^{r-1}} \left\lvert \mathbf E_{x_1, \dots, x_r} f(x_1 + \dots + x_r) 1_{A_1}(x_{-1}) \dots 1_{A_r}(x_{-r}) \right\rvert.

This is actually an extension of the cut norm defined on a rr-uniform rr-partite hypergraph (not (r1)(r-1)-uniform like before!): if g:X1××XrRg : X_1 \times \dots \times X_r \rightarrow \mathbb R is such a graph, we let

gr,r=supAiXig(x1,,xr)1A1(x1)1Ar(xr). \left\lVert g \right\rVert^{\square}_{r,r} = \sup_{A_i \subseteq X_{-i}} \left\lvert g(x_1, \dots, x_r) 1_{A_1}(x_{-1}) \dots 1_{A_r}(x_{-r}) \right\rvert.

Taking g(x1,,xr)=f(x1++xr)g(x_1, \dots, x_r) = f(x_1 + \dots + x_r), X1==Xr=Z/NX_1 = \dots = X_r = \mathbb Z/N gives the analogy.

For the second theorem, we define the norm

gk1,k=maxi=1,,k(gik1,k1). \left\lVert g \right\rVert^{\square}_{k-1,k} = \max_{i=1,\dots,k} \left( \left\lVert g_{-i} \right\rVert^{\square}_{k-1, k-1} \right).

Theorem 12 (Relative simplex counting lemma)

Let μ\mu, gg, g~\widetilde g be weighted (k1)(k-1)-uniform kk-partite weighted hypergraphs on X1XkX_1 \cup \dots \cup X_k. Assume that μ\mu satisfies the kk-linear forms condition, and 0giμi0 \le g_{-i} \le \mu_{-i} for all ii, 0g~10 \le \widetilde g \le 1. If gg~k1,k=o(1)\left\lVert g-\widetilde g \right\rVert^{\square}_{k-1,k} = o(1) then

Ex1,,xk[g(x1)g(xk)g~(x1)g~(xk)]=o(1). \mathbf E_{x_1, \dots, x_k} \left[ g(x_{-1}) \dots g(x_{-k}) - \widetilde g(x_{-1}) \dots \widetilde g(x_{-k}) \right] = o(1).

One then combines these two results to prove Szemerédi, as follows. Start with ff and ν\nu in the theorem. The kk-linear forms condition turns out to imply ν1k1=o(1)\left\lVert \nu-1 \right\rVert^{\square}_{k-1} = o(1). So we can find a nearby f~\widetilde f by the dense model theorem. Then, we induce ν\nu, gg, g~\widetilde g from μ\mu, ff, f~\widetilde f respectively. The counting lemma then reduce the bounding of Λk(f,,f)\Lambda_k(f, \dots, f) to the bounding of Λk(f~,,f~)\Lambda_k(\widetilde f, \dots, \widetilde f), which is Ωδ(1)\Omega_\delta(1) by the usual Szemerédi theorem.

4. Arithmetic progressions in primes

We now sketch how to obtain Green-Tao from Relative Szemerédi. As expected, we need to us the von Mangoldt function Λ\Lambda.

Unfortunately, Λ\Lambda is biased (e.g. “all decent primes are odd”). To get around this, we let w=w(N)w = w(N) tend to infinity slowly with NN, and define W=pwp.W = \prod_{p \le w} p. In the WW-trick we consider only primes 1(modW)1 \pmod W. The modified von Mangoldt function then is defined by

Λ~(n)={φ(W)Wlog(Wn+1)Wn+1 prime0else. \widetilde \Lambda(n) = \begin{cases} \frac{\varphi(W)}{W} \log (Wn+1) & Wn+1 \text{ prime} \\ 0 & \text{else}. \end{cases}

In accordance with Dirichlet, we have nNΛ~(n)=N+o(N)\sum_{n \le N} \widetilde \Lambda(n) = N + o(N).

So, we need to show now that

Proposition 13. Fix k3k \ge 3. We can find δ=δ(k)>0\delta = \delta(k) > 0 such that for N1N \gg 1 prime, we can find ν:Z/NR0\nu : \mathbb Z/N \rightarrow \mathbb R_{\ge 0} which satisfies the kk-linear forms condition as well as ν(n)δΛ~(n)\nu(n) \ge \delta \widetilde \Lambda(n) for N/2n<NN/2 \le n < N.

In that case, we can let f(n)={δΛ~(n)N/2n<N0else.f(n) = \begin{cases} \delta \widetilde\Lambda(n) & N/2 \le n < N \\ 0 & \text{else}. \end{cases} Then 0fν0 \le f \le \nu. The presence of N/2n<NN/2 \le n < N allows us to avoid “wrap-around issues” that arise from using Z/N\mathbb Z/N instead of Z\mathbb Z. Relative Szemerédi then yields the result.

For completeness, we state the construction. Let χ:R[0,1]\chi : \mathbb R \rightarrow [0,1] be supported on [1,1][-1,1] with χ(0)=1\chi(0) = 1, and define a normalizing constant cχ=0χ(x)2dxc_\chi = \int_0^\infty \left\lvert \chi'(x) \right\rvert^2 dx. Inspired by Λ(n)=dnμ(d)log(n/d)\Lambda(n) = \sum_{d \mid n} \mu(d) \log(n/d), we define a truncated Λ\Lambda by Λχ,R(n)=logRdnμ(d)χ(logdlogR).\Lambda_{\chi, R}(n) = \log R \sum_{d \mid n} \mu(d) \chi\left( \frac{\log d}{\log R} \right). Let k3k \ge 3, R=Nk12k3R = N^{k^{-1} 2^{-k-3}}. Now, we define ν\nu by

ν(n)={φ(W)WΛχ,R(Wn+1)2cχlogRN/2n<N0else. \nu(n) = \begin{cases} \dfrac{\varphi(W)}{W} \dfrac{\Lambda_{\chi,R}(Wn+1)^2}{c_\chi \log R} & N/2 \le n < N \\ 0 & \text{else}. \end{cases}

This turns out to work, provided ww grows sufficiently slowly in NN.