## Tuesday, August 01, 2006

### Prime distribution--new approach?

The idea I presented in my previous post for looking at the twin primes probability might seem to easily apply with the prime distribution itself giving the odds that a natural number x is prime as

ProbPrime(x) = ((pj - 1)/pj)* ((pj-1 - 1)/pj-1)*...*1/2

where j is the number of primes less than sqrt(x), and pj is the jth prime.

But there's a problem with that approach as the naturals are NOT random. Let's see how well it does and then I'll explain the correct equation.

For instance, the primes up to sqrt(100) are 2, 3, 5, and 7 so I have

ProbPrime(100) = (6/7)*(4/5)*(2/3)*(1/2) = 0.2285 to four significant digits.

or 22.85%, which is nicely close to 1/(ln x), as 1/(ln 100) = 0.2171 to four significant digits.

So it is close but gets progressively off until it is about 12.2% larger than the correct count of primes, but why?

Consider that given a natural number x, roughly 1/3 of numbers up to and including x will have 3 as a factor, so x/3 is about that number, but exactly floor(x/3) will have 3 as a factor, where the floor() function just clips off the remainder, so for instance floor(10/3) = 3, exactly, and notice there are 3 numbers up to and including 10 that have 3 as a factor--3, 6 and 9.

So this idea can be looked at by finding exactly how many numbers have a particular prime as a factor, and subtracting that total from the total of numbers minus 1 as 1 is not prime to get the count of primes, and then dividing by the number.

For example, up to 10, you have

10 - floor(10/2) - floor(10/3) + floor(10/6) + 2 - 1 = 4 primes

which is explained as, subtract the evens from 10, and subtract the numbers divisible by 3, but then you are also subtracting some evens that have 3 as a factor so add back in those that have both as a factor, add 2 for the primes 2 and 3 being subtracted and subtract 1, since 1 is not prime. Divide 4 by 10 and you have 25% probability of primeness.

That way to count primes is called Legendre's Formula and it's not complicated, but what does that have to do with my probability idea?

Well, looking at 2/3 as the odds a number does NOT have 3 as a factor and 1/2 as the odds it does not have 2 as a factor, I have the equivalent of

10 - 10/2 - 10/3 + 10/6 = 10(1 - 1/2 - 1/3 + 1/6) = 10*(1/3)

which shows that this idea is really just like going through the same process as above, but without using floor() so there are remainders floating around, and without correcting for 2 and 3 being prime, as well as 1 not being prime.

But why does that work and not the other approach? Simple answer is, the naturals are NOT random! The list of counting numbers is very ordered as it is, of course, just 1, 2, 3, 4, 5 and so on out to infinity, where you advance by 1. Very ordered, not random, so an equation for random is not correct as rather than say about 2/3 of the numbers up to x have 3 as a factor you can find that exactly x - floor(x/3) do.

So the key difference is use of floor() above versus not using it, and that offers a way of writing out the Legendre's approach that is similar to what we've seen before!

x*(1 - floor(/p)) = x - floor(x/p)

where the numerator is left off to show another step is needed. And you have the rule that

floor(/p1)*floor(/p2) = floor(/p1p2)

which I will immediately show.

So that with 2 and 3 you have:

x*(1 - floor(/2)*(1 - floor(/3) = x - floor(x/2) - floor(x/3) + floor(x/6)

showing the rule that floor(/2)*floor(/3) = floor(/6)).

Notice then correct prime probability using exact numbers is

probPrime(x) = (x*(1 - floor(/pj))* (1 - floor(/pj-1)*...*(1 - floor(/2)) + pi(sqrt(x) - 1)/x

where again pj is the largest prime up to and including sqrt(x), and here I've also used the traditional prime counting function notation versus using my own with my prime counting function, adding back in the primes subtracted out as explained above, and subtracting for 1 as it's not prime as explained above, and then dividing the full count by x to get the ratio.

Notice that is very close to what we had before, except the floor functions clip off certain terms, to give exact values which can be determined because the list of naturals is NOT random, but well-ordered.

In contrast, with the twin primes probability as shown in my previous post, there is no similar way to getting the exact count, as the primes are random with respect to each other.

So there is a simple way to explain how the probability of primeness differs from the random by using the floor() function.

James Harris