Thomas P. Ogden

Sieving for Primes

A common task in number theory problems is to find all the primes up to a number $n$. Project Euler is a great resource for problems like this. A simple method of finding primes, the Sieve of Eratosthenes, was known in Ancient Greece. You start with 2 and discard as non-prime all multiples of 2 up to $n$. You then move onto the next number you haven’t discarded, 3, and mark as non-prime all multiples of 3 up to $n$. We’ve discarded 4, so we move on to 5 and continue. We can stop when we get to $\sqrt n$ as we’ll have discarded all the multiples by then.

Here’s what the algorithm looks like for $n = 400$.

When should we use the Sieve? In this notebook I’ll compare it to a naive iterative search by writing example algorithms in Python.

Our first approach is to iteratively build a list of primes up to a limit $n$.

When checking a number $i$ for primality, we only need to check prime factors, so we can check the list as we’re building it. Also, in checking if a number $i$ is prime, we only need to check for factors up to $\sqrt{i}$.

We’ll write a method that appends the next prime to an existing list of primes.

So for example, if we have the list [2, 3, 5] we expect to return [2, 3, 5, 7].

Now we repeat this until we get to $n$.

We’ll compare the methods later by counting the number of primes up to $n$ each returns.

Solution B: Sieve of Eratosthenes

For the Sieve of Eratosthenes algorithm, we’ll use some helper functions. First we want a method to sieve out the multiples of a factor $f$.

Note that we only need to sieve for factors greater than $f^2$, as the smaller multiples will already have been discarded.

Next a couple of simple helper methods. One to get us the next True value in a boolean list.

Another to return all the indexes of all remaining True values, shifted by 1. Once we’ve performed the sieve, this will represent the reminaing primes.

Now we’re ready to perform the sieve.

Again we’ll write a count method to compare.

Compare Counts

To check the result of the methods against each other, we’ll try a few values of $n$. The methods could both be wrong of course.

Complexity Analysis

Finally, we’ll compare timings by solving the problem over a range of $n$ values.

In figure 1 we compare the best-of-three timings on my laptop of the two methods over a range of $n$ values. We see that the iterative search is quicker for $n = 10$, but for $n = 20$ and beyond, the Eratosthenes method is quicker. Over this range, both are polynomial below quadratic.

In figure 2 we show the speedup of the Eratosthenes method over the iterative search. The speedup peaks at about 7$\times$ faster at $n = 10^4$. Beyond this Eratosthenes method is still significantly faster, but the speedup is coming down. This is likely due to the fact that the Sieve of Eratosthenes for high $n$ becomes memory intensive, as we need to store all of the numbers up to $n$, whereas with the more computationally intensive iterative search, we only need to store the primes we’ve already calculated.

The runtimes at $n = 10^7$ are in the order of minutes and already long enough that we’d want to look at further tuning of the algorithm beyond this.