In this post I will sketch a proof of Dirichlet’s Theorem in the following form:
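Theorem 1 (Dirichlet’s Theorem on Arithmetic Progressions)
Let
$$\psi(x;q,a) = \sum_{\substack{n \le x \\ n \equiv a \pmod q}} \Lambda(n).$$
Let $N$ be a positive constant.
Then for some constant $C(N) > 0$ depending on $N$, we have, for any $q \le (\log x)^N$ and any $a$ with $\gcd(a,q) = 1$,
$$\psi(x;q,a) = \frac{1}{\phi(q)}x + O\left(x\exp\left(-C(N)\sqrt{\log x}\right)\right)$$
uniformly in $q$.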
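The following Python sketch gives a feel for what Theorem 1 predicts. It is my own illustration and plays no part in the proof. It tabulates $\Lambda(n)$ with a sieve and compares $\psi(x;q,a)$ with $x/\phi(q)$. The choices $q = 5$ and $x = 10^5$ are arbitrary, and the helper names `von_mangoldt_table`, `psi_progression` and `euler_phi` are hypothetical.

```python
import math

def von_mangoldt_table(n_max):
    """Return lam with lam[n] = Lambda(n) for 0 <= n <= n_max (sieve of Eratosthenes)."""
    lam = [0.0] * (n_max + 1)
    composite = [False] * (n_max + 1)
    for p in range(2, n_max + 1):
        if composite[p]:
            continue
        for m in range(p * p, n_max + 1, p):
            composite[m] = True
        pk = p
        while pk <= n_max:  # Lambda(p^k) = log p for every k >= 1
            lam[pk] = math.log(p)
            pk *= p
    return lam

def psi_progression(x, q, a, lam):
    """psi(x; q, a): sum of Lambda(n) over 1 <= n <= x with n = a (mod q)."""
    return sum(lam[n] for n in range(1, x + 1) if n % q == a % q)

def euler_phi(q):
    return sum(1 for k in range(1, q + 1) if math.gcd(k, q) == 1)

x, q = 10**5, 5
lam = von_mangoldt_table(x)
for a in range(q):
    if math.gcd(a, q) == 1:
        ratio = psi_progression(x, q, a, lam) / (x / euler_phi(q))
        print(a, round(ratio, 4))  # each ratio should be close to 1
```

The class $a = 0$ is left out on purpose. It only picks up the powers of $5$, which shows why the theorem needs $\gcd(a,q) = 1$.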
Warning: I really don’t understand what I am saying.
It is at least 50% likely that this post contains a major error,
and 90% likely that there will be multiple minor errors.
Please kindly point out any screw-ups of mine; thanks!
Throughout this post: $s = \sigma + it$ and $\rho = \beta + i\gamma$, as always.
All $O$-estimates have absolute constants unless noted otherwise, and $A \ll B$ means $A = O(B)$,
while $A \asymp B$ means $A \ll B \ll A$.
By abuse of notation, $\mathcal L$ will be short for either
$\log q(|t|+2)$ or
$\log q(|T|+2)$, depending on context.
1. Outline
Here are the main steps:
We introduce Dirichlet characters $\chi : \mathbb N \to \mathbb C$, which will serve as a roots of unity filter,
extracting terms $\equiv a \pmod q$.
We will see that this reduces the problem to estimating the function
$\psi(x,\chi) = \sum_{n \le x} \chi(n)\Lambda(n)$.
Introduce the $L$-function $L(s,\chi)$, the generalization of $\zeta$ to arithmetic progressions.
Establish a functional equation in terms of $\xi(s,\chi)$, much like with $\zeta$,
and use it to extend $L(s,\chi)$ to a meromorphic function on the entire complex plane.
We will use a variation on Perron’s formula in order to transform
this sum into an integral involving the $L$-function $L(s,\chi)$.
We truncate this integral to $[c-iT, c+iT]$;
this introduces an error $E_{\text{truncate}}$ that can be computed immediately,
though in this presentation we delay its computation until later.
We shift the contour as in the proof of the Prime Number Theorem in order to
estimate the above integral in terms of the zeros of $L(s,\chi)$.
The main term emerges as a residue,
so we want to show that the integral $E_{\text{contour}}$ along the rest of the contour is small.
Moreover, we get some residues $\sum_\rho \frac{x^\rho}{\rho}$ related to the zeros of the $L$-function.
By using Hadamard’s Theorem on $\xi(s,\chi)$, which is entire,
we can write $\frac{L'}{L}(s,\chi)$ in terms of its zeros. This has three consequences:
We can use this expansion to get bounds on $\frac{L'}{L}(s,\chi)$.
Using a 3-4-1 trick, this gives us information on the horizontal distribution of $\rho$;
the dreaded Siegel zeros appear here.
We can get an expression which lets us estimate the vertical distribution
of the zeros in the critical strip (specifically the number of zeros with $\gamma \in [T-1, T+1]$).
The first and third points let us compute $E_{\text{contour}}$.
The horizontal zero-free region gives us an estimate of $\sum_\rho \frac{x^\rho}{\rho}$,
which along with $E_{\text{contour}}$ and $E_{\text{truncate}}$ gives us the value of $\psi(x,\chi)$.
We use Siegel’s Theorem to handle the potential Siegel zero that might arise.
Possibly helpful diagram:
[Figure: the contour integral for the zeta function.]
The pink dots denote zeros. We think the nontrivial ones all lie on the
critical line $\sigma = \frac12$ by the Generalized Riemann Hypothesis, but they could actually be anywhere in the green strip.
2. Dirichlet Characters
2.1. Definitions
Recall that a Dirichlet character $\chi$ modulo $q$ is a completely
multiplicative function $\chi : \mathbb N \to \mathbb C$ which is also periodic modulo $q$,
and vanishes for all $n$ with $\gcd(n,q) > 1$.
The trivial character (denoted $\chi_0$) is defined by $\chi_0(n) = 1$ when
$\gcd(n,q) = 1$ and $\chi_0(n) = 0$ otherwise.
In particular, $\chi(1) = 1$. Thus each nonzero value of $\chi$ is a $\phi(q)$-th root of unity
(not necessarily primitive), since $\chi(n)^{\phi(q)} = \chi(n^{\phi(q)}) = \chi(1) = 1$ by Euler’s theorem.
There are also exactly $\phi(q)$ Dirichlet characters modulo $q$.
Observe that $\chi(-1)^2 = \chi(1) = 1$, so $\chi(-1) = \pm 1$.
We shall call $\chi$ even if $\chi(-1) = +1$ and odd otherwise.
If $\tilde q \mid q$, then a character $\tilde\chi$ modulo $\tilde q$ induces a character $\chi$ modulo $q$ in a natural way.
Let $\chi = \tilde\chi$ except at the points where $\gcd(n,q) > 1$ but $\gcd(n,\tilde q) = 1$,
and let $\chi$ be zero at these points instead.
(In effect, we are throwing away information about $\tilde\chi$.) A character
$\chi$ not induced by any character of smaller modulus is called primitive.
2.2. Orthogonality
The key fact about Dirichlet characters which will enable us to prove the theorem is the following trick:
Theorem 2 (Orthogonality of Dirichlet Characters)
We have
$$\sum_{\chi \bmod q} \chi(a)\overline{\chi}(b) = \begin{cases} \phi(q) & \text{if } a \equiv b \pmod q \text{ and } \gcd(a,q) = 1, \\ 0 & \text{otherwise.} \end{cases}$$
(Here $\overline{\chi}$ is the complex conjugate of $\chi$, which is essentially a multiplicative inverse.)
This is in some sense a slightly fancier form of the old roots of unity
filter.
Specifically, it is not too hard to show that $\sum_\chi \chi(n)$ vanishes
for $n \not\equiv 1 \pmod q$, while it is equal to $\phi(q)$ for $n \equiv 1 \pmod q$.
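Here is a quick numerical check of Theorem 2 in Python. It is only a sketch and assumes the modulus is an odd prime $p$. Then $(\mathbb Z/p\mathbb Z)^\times$ is cyclic, and each character is determined by its value at a primitive root $g$, namely $\chi_k(g^j) = e^{2\pi i jk/(p-1)}$. The function names are hypothetical.

```python
import cmath
import math

def primitive_root(p):
    """Smallest generator of the cyclic group (Z/pZ)^*, for an odd prime p (brute force)."""
    for g in range(2, p):
        if len({pow(g, k, p) for k in range(1, p)}) == p - 1:
            return g
    raise ValueError("p should be an odd prime")

def characters_mod_prime(p):
    """All phi(p) = p - 1 Dirichlet characters modulo the odd prime p."""
    g = primitive_root(p)
    dlog = {pow(g, j, p): j for j in range(p - 1)}  # discrete logarithm base g

    def make_chi(k):
        def chi(n):
            n %= p
            if n == 0:
                return 0  # chi vanishes when gcd(n, p) > 1
            return cmath.exp(2j * math.pi * k * dlog[n] / (p - 1))
        return chi

    return [make_chi(k) for k in range(p - 1)]

def orthogonality_sum(p, a, b):
    """Sum over all chi mod p of chi(a) * conj(chi(b))."""
    return sum(chi(a) * complex(chi(b)).conjugate() for chi in characters_mod_prime(p))

chars = characters_mod_prime(7)
print(len(chars))                            # phi(7) = 6 characters
print(abs(orthogonality_sum(7, 3, 10) - 6))  # a = b (mod 7): the sum is phi(7), up to rounding
print(abs(orthogonality_sum(7, 3, 4)))       # a != b (mod 7): the sum vanishes, up to rounding
```

The same characters can be used to check that $\chi(-1) = \pm 1$: exactly half of them are even.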
2.3. Dirichlet L-Functions
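Now we can define the associated $L$-function by
$$L(s,\chi) = \sum_{n \ge 1} \chi(n)n^{-s} = \prod_p \left(1 - \chi(p)p^{-s}\right)^{-1},$$
where both the series and the product converge absolutely for $\sigma > 1$.
The basic properties of these $L$-functions are as follows.
Theorem 3. Let $\chi$ be a Dirichlet character modulo $q$. Then
If $\chi \ne \chi_0$, $L(s,\chi)$ can be extended to a holomorphic function on $\sigma > 0$.
If $\chi = \chi_0$, $L(s,\chi)$ can be extended to a meromorphic function on $\sigma > 0$,
with a single simple pole at $s = 1$ of residue $\phi(q)/q$.
The proof is pretty much the same as for zeta.
Observe that if $q = 1$, then $L(s,\chi) = \zeta(s)$.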
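To see that the two expressions for $L(s,\chi)$ agree numerically, here is a Python sketch for the nontrivial character $\chi_4$ modulo $4$ at $s = 2$. There the common value is Catalan's constant $G \approx 0.9159656$. The truncation lengths and the function names are my own arbitrary choices.

```python
import math

CATALAN = 0.915965594177219  # L(2, chi_4), Catalan's constant

def chi4(n):
    """The nontrivial character mod 4: values 0, 1, 0, -1 on n = 0, 1, 2, 3 (mod 4)."""
    return (0, 1, 0, -1)[n % 4]

def primes_up_to(n):
    sieve = bytearray([1]) * (n + 1)
    sieve[0:2] = b"\x00\x00"
    for p in range(2, math.isqrt(n) + 1):
        if sieve[p]:
            sieve[p * p :: p] = bytearray(len(range(p * p, n + 1, p)))
    return [p for p in range(n + 1) if sieve[p]]

def L_series(s, terms):
    """Partial sum of chi4(n) n^{-s} over n <= terms."""
    return sum(chi4(n) * n ** (-s) for n in range(1, terms + 1))

def L_euler(s, prime_bound):
    """Partial Euler product of (1 - chi4(p) p^{-s})^{-1} over p <= prime_bound."""
    prod = 1.0
    for p in primes_up_to(prime_bound):
        prod /= 1 - chi4(p) * p ** (-s)
    return prod

print(L_series(2, 200_000), L_euler(2, 100_000), CATALAN)
```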
2.4. The Functional Equation for Dirichlet L-Functions
While I won’t prove it here, one can show the following analog of the functional
equation for Dirichlet L-functions.
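Theorem 4 (The Functional Equation of Dirichlet L-Functions)
Assume that $\chi$ is a character modulo $q$, possibly trivial or imprimitive.
Let $a = 0$ if $\chi$ is even and $a = 1$ if $\chi$ is odd. Let
$$\xi(s,\chi) = q^{\frac12(s+a)}\gamma(s,\chi)L(s,\chi)\left[s(1-s)\right]^{\delta(\chi)}$$
where
$$\gamma(s,\chi) = \pi^{-\frac12(s+a)}\Gamma\left(\frac{s+a}{2}\right)$$
and $\delta(\chi) = 1$ if $\chi = \chi_0$ and zero otherwise. Then
$\xi$ is entire.
If $\chi$ is primitive, then $\xi(s,\chi) = W(\chi)\xi(1-s,\overline{\chi})$ for some complex number $W(\chi)$ with $|W(\chi)| = 1$.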
Unlike the $\zeta$ case, $W(\chi)$ is nastier to describe;
computing it involves some Gauss sums that would be too involved for this post.
However, I should point out that it is the Gauss sum here that requires $\chi$ to be primitive.
As before, $\xi$ gives us a meromorphic continuation of $L(s,\chi)$ to the entire complex plane.
We obtain trivial zeros of $L(s,\chi)$ (for $\chi$ primitive) as follows:
For $\chi$ odd, we get zeros at $-1$, $-3$, $-5$ and so on.
For $\chi = \chi_0$, we get zeros at $-2$, $-4$, $-6$ and so on.
For $\chi \ne \chi_0$ even, we get zeros at $0$, $-2$, $-4$,
$-6$ and so on (since the pole of $\Gamma(\frac12 s)$ at $s = 0$ is no longer canceled by the factor $s(1-s)$).
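By orthogonality (Theorem 2), for $\gcd(a,q) = 1$ we have $\psi(x;q,a) = \frac{1}{\phi(q)}\sum_{\chi \bmod q} \overline{\chi}(a)\psi(x,\chi)$. So to prove Theorem 1 we have to estimate the sum $\sum_{n \le x} \chi(n)\Lambda(n)$ for each $\chi$.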
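The reduction $\psi(x;q,a) = \frac{1}{\phi(q)}\sum_\chi \overline{\chi}(a)\psi(x,\chi)$, valid for $\gcd(a,q) = 1$, can be checked numerically. Below is a Python sketch. It assumes a prime modulus ($p = 7$) so that the characters can be built from a primitive root, and $x = 2000$ is arbitrary. All function names are hypothetical.

```python
import cmath
import math

def von_mangoldt_table(n_max):
    """lam[n] = Lambda(n) for 0 <= n <= n_max."""
    lam = [0.0] * (n_max + 1)
    composite = [False] * (n_max + 1)
    for p in range(2, n_max + 1):
        if composite[p]:
            continue
        for m in range(p * p, n_max + 1, p):
            composite[m] = True
        pk = p
        while pk <= n_max:
            lam[pk] = math.log(p)
            pk *= p
    return lam

def characters_mod_prime(p):
    """All p - 1 characters mod an odd prime p, built from a primitive root g."""
    g = next(g for g in range(2, p) if len({pow(g, k, p) for k in range(1, p)}) == p - 1)
    dlog = {pow(g, j, p): j for j in range(p - 1)}
    return [
        (lambda n, k=k: 0 if n % p == 0 else cmath.exp(2j * math.pi * k * dlog[n % p] / (p - 1)))
        for k in range(p - 1)
    ]

def psi_chi(x, chi, lam):
    """psi(x, chi) = sum over n <= x of chi(n) Lambda(n)."""
    return sum(chi(n) * lam[n] for n in range(1, x + 1))

def psi_via_characters(x, p, a, lam):
    """(1/phi(p)) * sum over chi of conj(chi(a)) psi(x, chi); meaningful when gcd(a, p) = 1."""
    chars = characters_mod_prime(p)
    return sum(complex(chi(a)).conjugate() * psi_chi(x, chi, lam) for chi in chars) / (p - 1)

def psi_direct(x, q, a, lam):
    return sum(lam[n] for n in range(1, x + 1) if n % q == a % q)

x, p = 2000, 7
lam = von_mangoldt_table(x)
for a in range(1, p):
    print(a, psi_direct(x, p, a, lam), psi_via_characters(x, p, a, lam).real)
```

For $a \equiv 0$ the character side is identically zero, while the direct sum still counts the powers of $7$. This is the $\gcd(a,q) = 1$ hypothesis at work.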
3.2. Introducing the Logarithmic Derivative of the L-Function
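First, we realize $\chi(n)\Lambda(n)$ as the coefficients of a Dirichlet series.
Recall that last time we saw that $-\frac{\zeta'}{\zeta}$ has $\Lambda$ as its coefficients.
We can do the same thing with $L$-functions: for $\sigma > 1$, put
$$\log L(s,\chi) = -\sum_p \log\left(1 - \chi(p)p^{-s}\right).$$
Taking the derivative, we obtain
Theorem 5. For any $\chi$ (possibly trivial or imprimitive) and $\sigma > 1$ we have
$$-\frac{L'}{L}(s,\chi) = \sum_{n \ge 1} \Lambda(n)\chi(n)n^{-s}.$$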
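Theorem 5 can be sanity-checked numerically with a Python sketch, again for the character $\chi_4$ modulo $4$ (my choice of example). The sketch computes $-L'/L(s,\chi_4)$ from truncated series for $L$ and $L' = -\sum \chi(n)(\log n)n^{-s}$, then compares it with the truncated sum $\sum \Lambda(n)\chi_4(n)n^{-s}$. The truncation point $10^5$ and the names are arbitrary.

```python
import math

def chi4(n):
    """The nontrivial character mod 4."""
    return (0, 1, 0, -1)[n % 4]

def von_mangoldt_table(n_max):
    lam = [0.0] * (n_max + 1)
    composite = [False] * (n_max + 1)
    for p in range(2, n_max + 1):
        if composite[p]:
            continue
        for m in range(p * p, n_max + 1, p):
            composite[m] = True
        pk = p
        while pk <= n_max:
            lam[pk] = math.log(p)
            pk *= p
    return lam

def log_derivative_two_ways(s, n_max=100_000):
    """Return (-L'/L(s, chi4) from truncated series, truncated sum of Lambda(n) chi4(n) n^{-s})."""
    L = sum(chi4(n) * n ** (-s) for n in range(1, n_max + 1))
    # Differentiating term by term: L'(s) = -sum chi(n) log(n) n^{-s}.
    L_prime = -sum(chi4(n) * math.log(n) * n ** (-s) for n in range(1, n_max + 1))
    lam = von_mangoldt_table(n_max)
    dirichlet = sum(lam[n] * chi4(n) * n ** (-s) for n in range(1, n_max + 1))
    return -L_prime / L, dirichlet

print(log_derivative_two_ways(2))
```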
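Now, we unveil the trick at the heart of the proof of Perron’s Formula in the last post.
I will give a more precise statement this time, by stating where this integral comes from:
Lemma 6 (Truncated Version of Perron’s Lemma)
For any $c, y, T > 0$ define
$$I(y,T) = \frac{1}{2\pi i}\int_{c-iT}^{c+iT} \frac{y^s}{s}\,ds.$$
Then $I(y,T) = \delta(y) + E(y,T)$, where $\delta(y)$ is the indicator function defined by
$$\delta(y) = \begin{cases} 0 & 0 < y < 1 \\ \frac12 & y = 1 \\ 1 & y > 1 \end{cases}$$
and the error term $E(y,T)$ satisfies
$$|E(y,T)| < \begin{cases} y^c\min\left\{1, \frac{1}{T|\log y|}\right\} & y \ne 1 \\ cT^{-1} & y = 1. \end{cases}$$
In particular, $I(y,\infty) = \delta(y)$ (interpreting $I(y,\infty)$ as $\lim_{T \to \infty} I(y,T)$).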
In effect, the integral from $c-iT$ to $c+iT$ is intended to mimic an indicator function.
We can use it to extract the terms of the Dirichlet series of
$-\frac{L'}{L}(s,\chi)$ which happen to have $n \le x$, by simply appealing to $\delta(x/n)$.
Unfortunately, we cannot take $T = \infty$, because later on this would introduce
a sum which is not absolutely convergent.
This means we will have to live with the error term introduced by picking a particular finite value of $T$.
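Lemma 6 can be checked numerically. The Python sketch below approximates $I(y,T)$ by the trapezoid rule along $s = c + it$ and checks $|I(y,T) - \delta(y)|$ against the stated bound. The values $c = 1$ and $T = 50$, the step count, and the function names are my own assumptions, chosen for illustration.

```python
import cmath
import math

def perron_integral(y, T, c=1.0, steps=20_000):
    """Trapezoid-rule approximation of I(y, T) = (1/2 pi i) * integral_{c-iT}^{c+iT} y^s / s ds.

    With s = c + it we have ds = i dt, so I(y, T) = (1/2 pi) * integral_{-T}^{T} y^(c+it) / (c+it) dt.
    """
    h = 2 * T / steps
    log_y = math.log(y)
    total = 0j
    for k in range(steps + 1):
        s = complex(c, -T + k * h)
        weight = 0.5 if k in (0, steps) else 1.0
        total += weight * cmath.exp(s * log_y) / s
    # For real y > 0 the integrand at -t is the conjugate of that at t, so I(y, T) is real.
    return (total * h / (2 * math.pi)).real

def delta(y):
    return 0.0 if y < 1 else (0.5 if y == 1 else 1.0)

def error_bound(y, T, c=1.0):
    """The bound on |E(y, T)| from Lemma 6."""
    if y == 1:
        return c / T
    return y ** c * min(1.0, 1.0 / (T * abs(math.log(y))))

for y in (0.5, 1.0, 2.0, 3.0):
    I = perron_integral(y, T=50)
    print(y, I, abs(I - delta(y)) < error_bound(y, T=50))
```

For $y = 1$ there is a closed form: $I(1,T) = \frac{1}{\pi}\arctan\frac{T}{c} = \frac12 - \frac1\pi\arctan\frac{c}{T}$. This shows where the $cT^{-1}$ bound comes from.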
3.4. Applying the Truncation
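Let’s do so: define
$$\psi(x;\chi) = \sum_{n \ge 1} \delta(x/n)\Lambda(n)\chi(n),$$
which is almost the same as $\sum_{n \le x} \Lambda(n)\chi(n)$,
except that if $x$ is actually an integer then $\Lambda(x)\chi(x)$ should be
halved (since $\delta(1) = \frac12$).
Now, taking $c > 1$ so that we may swap the sum and the integral, we can substitute in our integral representation, and obtain
$$\psi(x;\chi) = \frac{1}{2\pi i}\int_{c-iT}^{c+iT} -\frac{L'}{L}(s,\chi)\frac{x^s}{s}\,ds - E_{\text{truncate}},$$
where
$$E_{\text{truncate}} = \sum_{n \ge 1} \Lambda(n)\chi(n)\cdot E(x/n,T).$$
Estimating this is quite ugly, so we defer it to later.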
4. Applying the Residue Theorem
4.1. Primitive Characters
Exactly like before, we are going to use a contour to estimate the value of
$$\frac{1}{2\pi i}\int_{c-iT}^{c+iT} -\frac{L'}{L}(s,\chi)\frac{x^s}{s}\,ds.$$
Let $U$ be a large half-integer (so there are no zeros of $L(s,\chi)$ with
$\operatorname{Re} s = -U$).
We then re-route the integration path along the contour
$c - iT \to -U - iT \to -U + iT \to c + iT$.
During this process we pick up residues, which are the interesting terms.
First, assume that $\chi$ is primitive,
so the functional equation applies and we get the information we want about zeros.
If $\chi = \chi_0$, then we pick up a residue of $+x$ at $s = 1$. Since $\frac{L'}{L}$ has residue $-1$ at the simple pole $s = 1$, this residue is
$(-1)\cdot(-1)\cdot\frac{x^1}{1} = +x$.
This is the “main term”. Out of laziness, we write it as $\delta(\chi)x$.
Depending on whether $\chi$ is odd or even, we detect the trivial zeros, which we can express succinctly by
$$\sum_{m \ge 1} \frac{x^{a-2m}}{2m-a}.$$
Actually, I really ought to truncate this at $U$,
but since I’m going to let $U \to \infty$ in a moment I really don’t
want to take the time to do so; the difference is negligible.
At $s = 0$ we pick up a residue of $-b(\chi)$, where $b(\chi)$ denotes the constant term of the Laurent expansion of $\frac{L'}{L}(s,\chi)$ at $s = 0$.
Observe that if $\chi \ne \chi_0$ is even,
$b(\chi)$ really is just a constant term (the whole function $\frac{L'}{L}$
has a pole at $s = 0$);
otherwise $b(\chi) = \frac{L'}{L}(0,\chi)$ straight out.
If $\chi \ne \chi_0$ is even, then $L(s,\chi)$ itself has a zero at $s = 0$, so we are in worse shape. We recall that
$$\frac{L'}{L}(s,\chi) = \frac1s + b(\chi) + \dots$$
and notice that
$$\frac{x^s}{s} = \frac1s + \log x + \dots$$
so we pick up an extra residue of $-\log x$. So, call this a bonus of $-(1-a)\log x$ (which vanishes for odd $\chi$).
Finally, we have the hard-to-understand zeros in the strip $0 < \sigma < 1$.
If $\rho = \beta + i\gamma$ is a zero, then it contributes a residue of $-\frac{x^\rho}{\rho}$.
We only pick up the zeros with $|\gamma| < T$ in our rectangle, so we get a term
$$-\sum_{\rho,\,|\gamma| < T} \frac{x^\rho}{\rho},$$
at least for primitive characters.
Note that the sum over the zeros is not absolutely convergent without the
restriction to $|\gamma| < T$ (with it, the sum becomes a finite one).
4.2. Transition to nonprimitive characters
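The next step is to notice that if $\chi$ modulo $q$ happens to be not primitive,
and is induced by $\tilde\chi$ with modulus $\tilde q$,
then actually $\psi(x,\chi)$ and $\psi(x,\tilde\chi)$ are not so different.
Specifically, they can only differ at prime powers $p^k \le x$ with $p \mid q$, so they differ by at most
$$\sum_{p \mid q} \log p \left\lfloor \frac{\log x}{\log p} \right\rfloor \le \sum_{p \mid q} \log x \ll (\log q)(\log x).$$
So our above formula in fact holds for any character $\chi$,
if we are willing to add an error term of $O\left((\log q)(\log x)\right)$.
This works even if $\chi$ is trivial, and also $\tilde q \le q$,
so we will just simplify notation by omitting the tildes.
Anyway, $(\log q)(\log x)$ is piddling compared to all the other error terms in the problem,
and we can swallow a lot of the boring residues into a new term, say
$$E_{\text{tiny}} = O\left((\log q + 1)\log x\right).$$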
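Thus we have
$$\psi(x,\chi) = \delta(\chi)x - \sum_{\rho,\,|\gamma| < T} \frac{x^\rho}{\rho} - b(\chi) + E_{\text{tiny}} + E_{\text{contour}} - E_{\text{truncate}}.$$
Unfortunately, the constant $b(\chi)$ depends on $\chi$ and cannot be absorbed.
We will also estimate $E_{\text{contour}}$ in the error term party.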
5. Distribution of Zeros
In order to estimate
$$\sum_{\rho,\,|\gamma| < T} \frac{x^\rho}{\rho}$$
we will need information on both the vertical and horizontal distribution of the zeros.
Also, it turns out this will help us compute $E_{\text{contour}}$.
5.1. Applying Hadamard’s Theorem
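Let $\chi$ be primitive modulo $q$. As we saw,
$$\xi(s,\chi) = (q/\pi)^{\frac12 s + \frac12 a}\,\Gamma\left(\frac{s+a}{2}\right)L(s,\chi)\left(s(1-s)\right)^{\delta(\chi)}$$
is entire. It is also easily seen to have order 1,
since no term grows much more than exponentially in $s$ (using Stirling to handle the $\Gamma$ factor).
Thus by Hadamard, we may put
$$\xi(s,\chi) = e^{A(\chi) + B(\chi)s}\prod_\rho \left(1 - \frac{s}{\rho}\right)e^{s/\rho}.$$
Taking a logarithmic derivative and cleaning up, we derive the following lemma.
Lemma 7 (Hadamard Expansion of Logarithmic Derivative)
For any primitive character $\chi$ (possibly trivial) we have
$$\frac{L'}{L}(s,\chi) = -\frac12\log\frac{q}{\pi} - \frac12\frac{\Gamma'}{\Gamma}\left(\frac{s+a}{2}\right) + B(\chi) + \sum_\rho\left(\frac{1}{s-\rho} + \frac{1}{\rho}\right) - \delta(\chi)\left(\frac1s + \frac{1}{s-1}\right).$$
This will be useful in controlling things later.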
The constant $B(\chi)$ turns out to be surprisingly annoying;
it is tied to $b(\chi)$ from the contour, so we will need to deal with it.
5.2. A Bound on the Logarithmic Derivative
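Frequently we will take the real part of this. Using Stirling, the short version of this is:
Lemma 8 (Logarithmic Derivative Bound)
Let $\sigma \ge 1$ and $\chi$ be primitive (possibly trivial). Then
$$-\operatorname{Re}\frac{L'}{L}(s,\chi) = O(\mathcal L) - \sum_\rho \operatorname{Re}\frac{1}{s-\rho} + \operatorname{Re}\frac{\delta(\chi)}{s-1},$$
and if moreover $|t| \ge 2$, the last term can be absorbed into the $O(\mathcal L)$.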
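Proof: The claim is obvious for $\sigma \ge 2$,
since we can then bound the quantity by
$\left|\frac{\zeta'}{\zeta}(\sigma)\right| \le \left|\frac{\zeta'}{\zeta}(2)\right|$, due to the
fact that the series representation is valid in that range.
The second part with $|t| \ge 2$ follows from the first line,
by noting that $\operatorname{Re}\frac{1}{s-1} < 1$. So it just suffices to show that
$$-\operatorname{Re}\frac{L'}{L}(s,\chi) = O(\mathcal L) - \operatorname{Re}\sum_\rho \frac{1}{s-\rho} + \operatorname{Re}\frac{\delta(\chi)}{s-1}$$
where $1 \le \sigma \le 2$ and $\chi$ is primitive.
First, we claim that $\operatorname{Re} B(\chi) = -\operatorname{Re}\sum_\rho \frac{1}{\rho}$. We use the following trick:
and the first two terms contribute $\log q$ and $\log|t|$, respectively;
meanwhile the term $\frac{\delta(\chi)}{s}$ is at most 1, so it is absorbed. □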
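Short version: our functional equation lets us relate $L(s,\chi)$ to
$L(1-s,\overline{\chi})$, and for $\sigma \le 0$ this is all we have!
So this gives the following corresponding estimate:
Lemma 9 (Far-Left Estimate of Log Derivative)
If $\sigma \le -1$ and $|t| \ge 2$ we have
$$\frac{L'(s,\chi)}{L(s,\chi)} = O(\log q|s|).$$
Proof: We have
$$L(1-s,\overline{\chi}) = W(\overline{\chi})\,2^{1-s}\pi^{-s}q^{s-\frac12}\cos\left(\tfrac12\pi(s-a)\right)\Gamma(s)L(s,\chi)$$
(the unsymmetric functional equation, which can be obtained from Legendre’s duplication formula).
Taking a logarithmic derivative yields
$$-\frac{L'}{L}(1-s,\overline{\chi}) = \log\frac{q}{2\pi} - \frac{\pi}{2}\tan\left(\tfrac12\pi(s-a)\right) + \frac{\Gamma'}{\Gamma}(s) + \frac{L'}{L}(s,\chi).$$
Because we assumed $|t| \ge 2$,
the tangent function is bounded, as $s$ is sufficiently far from any of its poles along the real axis.
Also, $\operatorname{Re}(1-s) \ge 2$ implies the $\frac{L'}{L}(1-s,\overline{\chi})$ term is bounded.
Finally, the logarithmic derivative of $\Gamma$ contributes
$O(\log|s|)$ according to Stirling.
So, the total error is $O(\log q) + O(1) + O(\log|s|) + O(1)$, and this gives the conclusion.
□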
5.3. Horizontal Distribution
I claim that:
Theorem 10 (Horizontal Distribution Bound)
Let $\chi$ be a character, possibly trivial or imprimitive.
There exists an absolute constant $c_1$ with the following properties:
If $\chi$ is complex, then no zeros are in the region $\sigma \ge 1 - \frac{c_1}{\mathcal L}$.
If $\chi$ is real, there are no zeros in the region $\sigma \ge 1 - \frac{c_1}{\mathcal L}$,
with at most one exception; this zero must be real and simple.
Such bad zeros are called Siegel zeros, and I will denote them $\beta_S$.
The important part about this estimate is that it does not depend on $\chi$ but only on $q$ (through $\mathcal L$).
We need the relaxation to non-primitive characters, since we will use them in the proof of Landau’s Theorem.
Proof: First, assume χ is both primitive and nontrivial.
where we have dropped all but one term for the second line, and all terms for the third line.
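If $\chi^2$ is not primitive but at least is not $\chi_0$,
then we can replace $\chi^2$ with the inducing character $\widetilde{\chi^2}$ for a penalty of at most
$$\left|\operatorname{Re}\frac{L'}{L}(s,\chi^2) - \operatorname{Re}\frac{L'}{L}(s,\widetilde{\chi^2})\right| \le \sum_{p \mid q}\sum_{k \ge 1}\frac{\log p}{p^{k\sigma}} = \sum_{p \mid q}\frac{\log p}{p^\sigma - 1} \le \sum_{p \mid q}\log p \le \log q,$$
just like earlier: $\Lambda$ is usually zero, so we just look at the differing terms!
The Dirichlet series really are practically the same.
(Here we have also used the fact that $\sigma > 1$ and $p \ge 2$.)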
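Consequently, we derive using 3-4-1 that
$$\frac{3}{\sigma-1} - \operatorname{Re}\frac{4}{s-\rho} + O(\mathcal L) \ge 0.$$
Selecting $s = \sigma + i\gamma$ so that $s - \rho = \sigma - \beta$, we thus obtain
$$\frac{4}{\sigma-\beta} \le \frac{3}{\sigma-1} + O(\mathcal L).$$
If we select $\sigma = 1 + \frac{\varepsilon}{\mathcal L}$ for a small absolute constant $\varepsilon > 0$, we get
$$\frac{4}{1 + \frac{\varepsilon}{\mathcal L} - \beta} \le \frac{3\mathcal L}{\varepsilon} + O(\mathcal L) = O(\mathcal L),$$
so
$$\beta < 1 - \frac{c_2}{\mathcal L}$$
for some constant $c_2$, initially only for primitive $\chi$.
But the Euler product of the $L$-function of an imprimitive character
differs from that of its primitive inducing character only by finitely many factors $\left(1 - \tilde\chi(p)p^{-s}\right)$ with $p \mid q$. Their zeros all lie on
the line $\sigma = 0$, so the bound holds for all nontrivial complex characters.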
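The 3-4-1 step rests on the elementary identity $3 + 4\cos\theta + \cos 2\theta = 2(1+\cos\theta)^2 \ge 0$. It is applied termwise to $\Lambda(n)n^{-\sigma}\operatorname{Re}\left(3\chi_0(n) + 4\chi(n)n^{-it} + \chi(n)^2n^{-2it}\right)$ with $\theta = \arg\chi(n) - t\log n$. This gives $3\left(-\frac{L'}{L}(\sigma,\chi_0)\right) + 4\operatorname{Re}\left(-\frac{L'}{L}(\sigma+it,\chi)\right) + \operatorname{Re}\left(-\frac{L'}{L}(\sigma+2it,\chi^2)\right) \ge 0$. The Python sketch below checks this termwise nonnegativity for one complex character modulo $5$. The character was chosen by me for illustration, and `CHI5` and `bracket` are hypothetical names.

```python
import cmath
import math

def three_four_one(theta):
    """3 + 4 cos(theta) + cos(2 theta); equal to 2 (1 + cos(theta))^2, hence >= 0."""
    return 3 + 4 * math.cos(theta) + math.cos(2 * theta)

# A complex character mod 5, determined by chi(2) = i (2 generates (Z/5Z)^*).
CHI5 = {0: 0, 1: 1, 2: 1j, 4: -1, 3: -1j}

def bracket(n, t):
    """Re(3 chi0(n) + 4 chi(n) n^{-it} + chi(n)^2 n^{-2it}) for the character CHI5."""
    chi_n = CHI5[n % 5]
    if chi_n == 0:
        return 0.0  # chi0(n) = chi(n) = 0 when 5 divides n
    z = chi_n * cmath.exp(-1j * t * math.log(n))  # a point on the unit circle
    return (3 + 4 * z + z * z).real

# Every term is >= 0, so the weighted sum against Lambda(n) n^{-sigma} is >= 0 too.
worst = min(bracket(n, t) for n in range(1, 1000) for t in (0.0, 0.3, 1.0, 14.1))
print(worst >= -1e-12)
```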
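Unfortunately, if we are unlucky enough that $\widetilde{\chi^2}$ is trivial,
then replacing $\chi^2$ causes all hell to break loose.
(In particular, $\chi$ is real in this case!) The problem comes in that our new
penalty has an extra $\frac{1}{s-1}$, so
$$\operatorname{Re}\frac{L'}{L}(s,\chi^2) - \operatorname{Re}\frac{\zeta'}{\zeta}(s) < \operatorname{Re}\frac{1}{s-1} + \log q.$$
Applied at the point $\sigma + 2it$ (where the $\chi^2$ term is evaluated), we get the weaker
$$\frac{3}{\sigma-1} - \operatorname{Re}\frac{4}{s-\rho} + O(\mathcal L) + \operatorname{Re}\frac{1}{\sigma-1+2it} \ge 0.$$
If $|t| > \frac{\delta}{\log q}$ for some $\delta > 0$, then
the $\frac{1}{\sigma-1+2it}$ term will be at most
$\frac{\log q}{\delta} = O(\mathcal L)$, and we live to see another day.
In other words, we have unconditionally established a zero-free region of the form
$$\sigma > 1 - \frac{c(\delta)}{\mathcal L} \quad\text{and}\quad |t| > \frac{\delta}{\log q}$$
for any $\delta > 0$.
Now let’s examine $|t| < \frac{\delta}{\log q}$.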
We don’t have the facilities to prove that there are no bad zeros,
but let’s at least prove that such a zero must be simple and real. By Hadamard at $t = 0$, we have
$$-\frac{L'(\sigma,\chi)}{L(\sigma,\chi)} < O(\mathcal L) - \sum_\rho \frac{1}{\sigma-\rho},$$
where we no longer need the real parts since $\chi$ is real,
and in particular the roots of $L(s,\chi)$ come in conjugate pairs.
The left-hand side can be stupidly bounded below by
$$-\frac{L'}{L}(\sigma,\chi) = \sum_{n \ge 1}\Lambda(n)\chi(n)n^{-\sigma} \ge -\sum_{n \ge 1}\Lambda(n)n^{-\sigma} = \frac{\zeta'}{\zeta}(\sigma) = -\frac{1}{\sigma-1} + O(1).$$
The rest is arithmetic; basically one finds that there can be at most one Siegel zero.
In particular, since complex zeros come in conjugate pairs, that zero must be real.
It remains to handle the case that $\chi = \chi_0$ is trivial.
For this, we observe that (after passing to the inducing character, which is the constant function $1$) the $L$-function in question is just $\zeta$.
Thus, we can decrease the constant $c_2$ to some $c_1$ in such a way that the result holds true for $\zeta$,
which completes the proof. □
5.4. Vertical Distribution
We have the following lemma:
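Lemma 11 (Sum of Zeros Lemma)
For all real $t$ and primitive characters $\chi$ (possibly trivial), we have
$$\sum_\rho \frac{1}{4 + (t-\gamma)^2} = O(\mathcal L).$$
Proof: We already have that
$$\operatorname{Re}\left(-\frac{L'}{L}(s,\chi)\right) = O(\mathcal L) - \sum_\rho \operatorname{Re}\frac{1}{s-\rho}$$
and we take $s = 2 + it$, noting that the left-hand side is bounded in absolute value by the
constant $\left|\frac{\zeta'}{\zeta}(2)\right| = 0.569961\ldots$.
On the other hand,
$$\operatorname{Re}\frac{1}{2+it-\rho} = \frac{\operatorname{Re}(2+it-\rho)}{|(2-\beta)+(t-\gamma)i|^2} = \frac{2-\beta}{(2-\beta)^2+(t-\gamma)^2}$$
and, since $0 \le \beta \le 1$,
$$\frac{1}{4+(t-\gamma)^2} \le \frac{2-\beta}{(2-\beta)^2+(t-\gamma)^2} \le \frac{2}{1+(t-\gamma)^2}$$
as needed. □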
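Both numerical ingredients of this proof can be spot-checked with a short Python sketch. The first is the two-sided inequality for $0 \le \beta \le 1$ (on a grid of my choosing). The second is the value $-\frac{\zeta'}{\zeta}(2) = \sum_n \Lambda(n)n^{-2} \approx 0.569961$, approximated by truncating at $10^5$. The function names are hypothetical.

```python
import math

def lower(d):
    return 1 / (4 + d * d)

def middle(beta, d):
    """Re 1/(2 + it - rho) for rho = beta + i gamma, with d = t - gamma."""
    return (2 - beta) / ((2 - beta) ** 2 + d * d)

def upper(d):
    return 2 / (1 + d * d)

def minus_zeta_log_derivative_at_2(n_max=100_000):
    """Truncation of -zeta'/zeta(2) = sum over n of Lambda(n) / n^2."""
    composite = [False] * (n_max + 1)
    total = 0.0
    for p in range(2, n_max + 1):
        if composite[p]:
            continue
        for m in range(p * p, n_max + 1, p):
            composite[m] = True
        pk = p
        while pk <= n_max:
            total += math.log(p) / pk ** 2
            pk *= p
    return total

ok = all(
    lower(d) <= middle(beta, d) <= upper(d)
    for beta in [k / 20 for k in range(21)]    # 0 <= beta <= 1
    for d in [k / 4 - 25 for k in range(201)]  # -25 <= t - gamma <= 25
)
print(ok, minus_zeta_log_derivative_at_2())
```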
From this we may deduce that
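Lemma 12 (Number of Zeros Nearby T)
For all real $t$ and primitive characters $\chi$ (possibly trivial),
the number of zeros $\rho$ with $\gamma \in [t-1, t+1]$ is $O(\mathcal L)$ (each such zero contributes at least $\frac15$ to the sum in Lemma 11).
In particular, we may perturb any given $T$ by at most $2$ so that the distance
between it and the ordinate $\gamma$ of the nearest zero is at least $c_0\mathcal L^{-1}$, for some absolute constant $c_0$.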
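The perturbation of $T$ is a pigeonhole argument, and the Python sketch below writes it out as an algorithm. It uses my own simplifying assumption that $T$ is only moved upward within a window of width $1$. If $m = O(\mathcal L)$ ordinates lie in the window, some gap has length at least $1/(m+1)$, so its midpoint is at distance $\gg \mathcal L^{-1}$ from every zero. The function `perturb_height` is hypothetical, and the listed ordinates are the first few zeros of $\zeta$, rounded, used only as sample input.

```python
def perturb_height(T, ordinates, width=1.0):
    """Pick T' in [T, T + width] as far as possible from the given zero ordinates.

    If m ordinates lie in the window, the m + 1 gaps cover length `width`, so the largest
    has length >= width / (m + 1). Its midpoint is at distance >= width / (2 (m + 1)) from
    every ordinate (ordinates outside the window are even farther away).
    """
    inside = sorted(g for g in ordinates if T <= g <= T + width)
    points = [T] + inside + [T + width]
    a, b = max(zip(points, points[1:]), key=lambda gap: gap[1] - gap[0])
    return (a + b) / 2

ZETA_ORDINATES = [14.134725, 21.022040, 25.010858, 30.424876, 32.935062]
T_new = perturb_height(21.0, ZETA_ORDINATES)
print(T_new, min(abs(T_new - g) for g in ZETA_ORDINATES))
```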
From this, using the argument principle, we can actually also obtain the following.
For a real number $T > 0$,
we have $N(T,\chi) = \frac{T}{\pi}\log\frac{qT}{2\pi e} + O(\mathcal L)$, where $N(T,\chi)$
is the number of zeros of $L(s,\chi)$ in the critical strip with imaginary part $\gamma \in [-T, T]$.
However, we will not need this fact.
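As a sanity check of the main term, the Python sketch below compares it with the case $q = 1$, where $L(s,\chi) = \zeta(s)$. It uses the standard tabulated fact that $\zeta$ has $29$ zeros with $0 < \gamma < 100$, hence $58$ with $|\gamma| < 100$. The function name is hypothetical, and the $O(\mathcal L)$ error is of course not computed.

```python
import math

def zero_count_main_term(T, q):
    """Main term (T / pi) log(qT / (2 pi e)) of N(T, chi); the O(L) error is not included."""
    return T / math.pi * math.log(q * T / (2 * math.pi * math.e))

# For q = 1 (zeta) there are 29 zeros with 0 < gamma < 100, hence 58 with |gamma| < 100.
print(zero_count_main_term(100, 1), 58)
```

The discrepancy here is under $2$, which is consistent with an $O(\mathcal L)$ error term.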
6. Error Term Party
Up to now, $c$ has been arbitrary. Assume now $x \ge 6$; thus we can now follow the tradition
$$c = 1 + \frac{1}{\log x} < 2,$$
so $c$ is just to the right of the line $\sigma = 1$. This causes $x^c = ex$.
We assume also for convenience that $T \ge 2$.
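The identity $x^c = ex$ holds because $x^{1/\log x} = e^{\log x/\log x} = e$. The short Python sketch below checks it numerically, together with $c < 2$ for $x \ge 6$. The helper name is hypothetical.

```python
import math

def perron_abscissa(x):
    """c = 1 + 1/log x, the line of integration used in the error estimates."""
    return 1 + 1 / math.log(x)

for x in (6.0, 1e3, 1.2e5):
    c = perron_abscissa(x)
    print(x, c, x ** c / (math.e * x))  # c < 2, and x^c / (e x) = 1 up to rounding
```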
If $\frac34x \le n \le \frac54x$, the factor $|\log(x/n)|$ in the bound for $E(x/n,T)$ is small, and this is bad.
We have to split into three cases: $\frac34x \le n < x$, $n = x$, and $x < n \le \frac54x$.
This is necessary because in the event that $\Lambda(x) \ne 0$ ($x$ is a prime power),
the term $E(x/n,T) = E(1,T)$ needs to be handled differently.
We let $x_{\text{left}}$ and $x_{\text{right}}$ be the nearest prime powers to $x$ other than $x$ itself.
Thus this breaks our region to conquer into
$$\frac34x \le x_{\text{left}} < x < x_{\text{right}} \le \frac54x.$$
So we have possibly a center term (if $x$ is a prime power, we have a term $n = x$),
plus the far left interval and the far right interval.
Let $d = \min\{x - x_{\text{left}},\, x_{\text{right}} - x\}$ for convenience.
In the easy case, if $n = x$ we have a contribution of $E(1,T)\log x < \frac{c}{T}\log x$,
which is piddling (less than $\log x$).
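Suppose $\frac34x \le n \le x_{\text{left}} - 1$.
If $n = x_{\text{left}} - a$ for some integer $1 \le a \le \frac14x$, then
(Recall $\zeta'/\zeta$ had a simple pole at $s = 1$, so near $s = 1$ the function $-\zeta'/\zeta$ behaves like $\frac{1}{s-1}$.)
The sum of everything is $\le \frac{3.8x(\log x)^2 + 14x\log x}{T} + \frac{32}{9}\log x\min\left\{1, \frac{x}{Td}\right\}$. Hence, the grand total across all these terms is the horrible
$$|E_{\text{truncate}}| \le \frac{5x(\log x)^2}{T} + 3.6\log x\min\left\{1, \frac{x}{Td}\right\}$$
provided $x \ge 1.2 \cdot 10^5$.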
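The two constants in the final bound come from routine arithmetic. The Python lines below make that arithmetic explicit; they are my reading of how the displayed bounds combine, not the author's computation. The term $\frac{14x\log x}{T}$ is absorbed into $\frac{1.2x(\log x)^2}{T}$ exactly when $\log x \ge 14/1.2$, and $\frac{32}{9} = 3.55\ldots$ is rounded up to $3.6$. The helper name `absorbed` is hypothetical.

```python
import math

def absorbed(x):
    """True when 3.8 (log x)^2 + 14 log x <= 5 (log x)^2, i.e. log x >= 14 / 1.2."""
    L = math.log(x)
    return 3.8 * L * L + 14 * L <= 5 * L * L

threshold = math.exp(14 / 1.2)  # roughly 1.17e5, so x >= 1.2e5 suffices
print(threshold, absorbed(1.2e5), 32 / 9 <= 3.6)
```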
6.2. Estimating the Contour Error
We now need to measure the error along the contour, letting $U \to \infty$.
Throughout, assume $U \ge 3$. Naturally, to estimate the integral, we seek good estimates on
$\frac{L'}{L}(s,\chi)$.
For this we appeal to the Hadamard expansion. We break into a couple of cases.
First, let’s look at the integral when $-1 \le \sigma \le 2$, so $s = \sigma \pm iT$ with $T$ large.
We bound the horizontal integral along these regions; by symmetry let’s consider just the top
$$\int_{-1+iT}^{c+iT} -\frac{L'}{L}(s,\chi)\frac{x^s}{s}\,ds.$$
Thus we want an estimate of $-\frac{L'}{L}$.
Lemma 13.
Let $s$ be such that $-1 \le \sigma \le 2$ and $|t| \ge 2$.
Assume $\chi$ is primitive (possibly trivial),
and that $t$ is not within $c_0\mathcal L^{-1}$ of the ordinate $\gamma$ of any zero of $L(s,\chi)$. Then
$$\frac{L'(s,\chi)}{L(s,\chi)} = O(\mathcal L^2).$$
Proof: Since we assumed that $|t| \ge 2$, we need not worry about
$\frac{\delta(\chi)}{s-1}$, and so we obtain
Now by the assumption that $|\gamma - t| \ge c_0\mathcal L^{-1}$,
the terms of the sum are all at most $O(\mathcal L)$.
Also, there are $O(\mathcal L)$ zeros with imaginary part in that range.
Finally, we recall that $\frac{L'(2+it,\chi)}{L(2+it,\chi)}$ is bounded.
We can write it using its (convergent) Dirichlet series and then note that in absolute value it is at
most $\sum_{n \ge 1}\Lambda(n)n^{-2} = \left|\frac{\zeta'}{\zeta}(2)\right|$. □
At this point, we perturb $T$ as described in the section on vertical distribution so that the lemma applies,
and we can then compute
Then at $s = 0$ (eliminating the poles), we have
$$\frac{L'}{L}(s,\chi) = O(1) - \sum_\rho \left(\frac{1}{\rho} + \frac{1}{2-\rho}\right)$$
where the O(1) is LL′(2,χ)+2r+γ+ΓΓ′(1)
if a=0 and LL′(2,χ)−2r−ΓΓ′(21)+ΓΓ′(23) for a=1.
Furthermore,
which is $O(\log q)$ by our vertical distribution results, and similarly
$$\sum_{\rho,\,|\gamma| < 1} \frac{1}{2-\rho} = O(\log q).$$
This completes the proof. □
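Let $\beta_S$ be a Siegel zero, if any; for all the other zeros,
we have $\left|\frac1\rho\right| = \frac{1}{\sqrt{\beta^2+\gamma^2}} \le \frac1\beta$. We now have two cases.
Case $\chi \ne \overline{\chi}$.
Then $\chi$ is complex and thus has no exceptional zeros;
hence each of its zeros has $\beta < 1 - \frac{c}{\log q}$.
Since $\rho$ is a zero of $L(s,\chi)$ if and only if $1 - \rho$ is a zero of $L(s,\overline{\chi})$ (and $\overline{\chi}$ is complex too),
it follows that all zeros of $L(s,\chi)$ have $\beta > \frac{c}{\log q}$, and hence $\left|\frac1\rho\right| = O(\log q)$.
Moreover, in the range $\gamma \in [-1, 1]$ there are $O(\log q)$ zeros
(putting $t = 0$ in our earlier lemma on vertical distribution).
Thus, the total contribution of the sum is $O((\log q)^2)$.
Case $\chi = \overline{\chi}$. Then $\chi$ is real.
The above argument goes through, except that we may have an extra Siegel zero at $\beta_S$;
hence there will also be a special zero at $1 - \beta_S$. We pull these terms out separately.
Consequently,
$$b(\chi) = O((\log q)^2) - \frac{1}{\beta_S} - \frac{1}{1-\beta_S}.$$
By adjusting the constant, we may assume $\beta_S > \frac{2014}{2015}$ if it exists.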
7. Computing ψ(x,χ) and ψ(x;q,a)
7.1. Summing the Error Terms
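We now have, for any $T \ge 2$, $x \ge 6$, and $\chi$ modulo $q$ (possibly imprimitive or trivial), the equality
If we select
$$T = \exp\left(c_3\sqrt{\log x}\right)$$
for some constant $c_3$, and moreover assume $q \le T$, then we obtain
$$\psi(x,\chi) = \delta(\chi)x - \frac{x^{\beta_S}}{\beta_S} + O\left(x\exp\left(-c_4\sqrt{\log x}\right)\right),$$
where the middle term is present only when $\chi$ has a Siegel zero $\beta_S$.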
7.3. Summing Up
We would like to sum over all characters $\chi$.
However, we’re worried that there might be lots of Siegel zeros across characters.
A result of Landau tells us this is not the case:
Theorem 15 (Landau)
If $\chi_1$ and $\chi_2$ are distinct real nontrivial primitive characters modulo $q_1$ and $q_2$,
then for any real zeros $\beta_1$ of $L(s,\chi_1)$ and $\beta_2$ of $L(s,\chi_2)$ we have
$$\min\{\beta_1, \beta_2\} < 1 - \frac{c_5}{\log q_1q_2}$$
for some fixed absolute $c_5$.
In particular, for any fixed $q$, there is at most one $\chi \bmod q$ with a Siegel zero.
Proof: The character $\chi_1\chi_2$ is not trivial, so we can put