Postponed Prime Sieve in Swift

Question 1

Motivated by this question, I looked out for other "infinite prime generators", i.e. functions which produce the list of prime numbers in increasing order and do not have an a-priori upper limit (such as the standard implementation of the Sieve of Eratosthenes). Repeated calling should produce "all" prime numbers (restricted only by time, available memory, and the limited range of the integer data type).

I found this answer (which was re-written for Python 3 later) to this SO question and have implemented that "postponed sieve" algorithm in Swift (Swift 3, Xcode 8 beta 3 required):

public final class PostponedPrimeIterator: IteratorProtocol {
 private var basePrimes: PostponedPrimeIterator!
 private var basePrime = 0
 private var basePrimeSquared = 0
 private var sieve: [Int: Int] = [:]
 private var initialPrimes = [2, 3, 5, 7]
 private var c = 9 // First candidate after fixed list
 public func next() -> Int? {
 if !initialPrimes.isEmpty {
 return initialPrimes.removeFirst()
 }
 if c == 9 {
 // Create the base prime generator and fetch the first odd prime:
 basePrimes = PostponedPrimeIterator()
 _ = basePrimes.next()
 basePrime = basePrimes.next()!
 basePrimeSquared = basePrime * basePrime
 assert(c == basePrimeSquared)
 }
 while true {
 defer { c += 2 }
 let factor: Int
 if let f = sieve.removeValue(forKey: c) {
 // `c` is divisible by `f`
 factor = f
 } else if c == basePrimeSquared {
 // `c` is the square of `p`
 factor = basePrime
 basePrime = basePrimes.next()!
 basePrimeSquared = basePrime * basePrime
 } else {
 // `c` is a prime number
 assert(c < basePrimeSquared)
 return c
 }
 // Insert next odd number which is divisiby by `factor` but not present in the sieve:
 var j = c + 2 * factor
 while sieve[j] != nil {
 j += 2 * factor
 }
 sieve[j] = factor
 }
 }
}
public struct PostponedPrimeSequence: Sequence {
 public func makeIterator() -> PostponedPrimeIterator {
 return PostponedPrimeIterator()
 }
}

The interesting thing about that algorithm is that the prime generator needs a list of "base primes" which are created using another instance of itself. This works because

the base prime generator is created "delayed", after producing some fixed primes, and
in order to generate primes up to \$ N \,ドル base primes are needed only up to \$ \sqrt N \,ドル

so that the creation of nested prime generators terminates eventually. (The number of nested prime generators is \$O(\log \log N) \,ドル thanks to Will Ness for pointing that out.)

The sieving is done using a dictionary which holds the "next composite numbers" each of which is a multiple of some prime among the base primes (i.e. such that are not bigger than the square root of the current candidate).

As explained in the above-mentioned answer, the required memory to produce \$ n \$ primes is \$ O(\sqrt n) \$.

Example usage (first 20 primes):

let p20 = Array(PostponedPrimeSequence().prefix(20))
print(p20) // [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71]

Benchmarking (measure the time to compute the 1,000,000th prime number, on a 1,2 GHz Intel Core m5 MacBook):

let TERMS = 1_000_000
let startTime = Date()
var gen = PostponedPrimeIterator()
for _ in 1 ..< TERMS { _ = gen.next() }
let p = gen.next()!
let endTime = Date()
let time = endTime.timeIntervalSince(startTime)
print(p, String(format: "%.02f", endTime.timeIntervalSince(startTime)), "sec")
// 15485863 1.26 sec

All kinds of feedback is welcome, including, but of course not limit to:

Better type/property/variable names? I chose PostponedPrimeIterator because Generator was renamed to Iterator in Swift 3.
As far as I have seen, all iterators in the Swift standard library are structs, i.e. value types, and Apple recommends to work with value types whenever possible. I needed to make it a class because of the self-referencing property
```
private var basePrimes: PostponedPrimeIterator!
```
Is there a better solution?
Swift does not have a yield statement as in Python, which makes some kind of state machine necessary: The first primes are served from an array, then the base prime generator is created, and from then on the primes are produced by the sieving algorithm. Is there a more elegant way to manage these different states? Can making basePrimes an implicitly unwrapped optional be avoided?
defer { c += 2 } is called in the main loop to increment the variable even if the loop was exited via return c. Would you consider that a reasonable use? (This seems to consider it "harmful".)

Question 2

Related. Will Ness later brought his code here for (Python) optimization.

Question 3

As this was the next logical step from my code, I will definitely have some things to say. I've just got to do a bit of digging through WWDC Swift3 talks to source my claims ;D

Question 4

@CAD97: Thanks for the link, I hadn't seen that. Python is not my first language so it will take me a while to digest that :) – Your feedback is most welcome!

Question 5

I ended up remaking this in Kotlin, if you're interested. (Who knew that generating primes is a great way to take a language through the paces! :P)

Question 6

@CAD97: Unfortunately I don't know anything about Kotlin :) But I must admit that I am a bit proud of my explanation of the algorithm. Perhaps it is trivial for the people working in that field, but I found some answers on SO and CR quite terse. Thanks for quoting it with kind words!

Question 7

Let me prefix this answer by a couple things:

I only about 90% understand the algorithm in full. I know enough to tinker with it without breaking it, but I couldn't explain it to someone with pen and paper.
As Swift 3 is still beta, some of the style guides are still up in the air. For the most part, I've tried to source purely stylistic comments from Apple's examples and documentation, but those might change and the community might end up with different ideas than Apple preached.

Now, addressing your points:

Better type/property/variable names?

I fully agree on PostponedPrimeIterator. It agrees with prexisting types, such as ClosedRangeIterator, DictionaryIterator, EnumeratedIterator, etc. The only variable I would potentially rename is private var c, to a more descriptive name of candidate (as your comment clarifies).

As far as I have seen, all iterators in the Swift standard library are structs.

As far as the reference page goes, it seems that the only classes in the standard library are Managed or non-@objc for the type system.

Is there a [way to make PostponedPrimeIterator a struct]?

I won't comment on whether this Iterator would be better as a struct or class: that's a bit philosophical and comes down to preference and your own benchmarks. However, if you do want to make it a struct, it is possible:

The reason that you cannot just make this a struct is that the compiler spits out the error error: value type 'Type' cannot have a stored property that references itself. The solution is to not directly recurse types, but to store a type-erased iterator.

The changes to use AnyIterator and structify PostponedPrimeIterator are these lines:

private struct PostponedPrimeIterator: IteratorProtocol {
 private var basePrimes: AnyIterator<Int>!
...
public mutating func next() -> Int? {
 ...
 if c == 9 {
 basePrimes = AnyIterator(PostponedPrimeIterator())
 ...
 }
 ...
}

In this manner you can get value semantics for your iterator. I made a micro benchmark to test the cost of wrapping a simple Array's IndexingIterator in an AnyIterator, which came out to about 0.006s per 10,000 items on my machine, so the cost is small but not negligible.

Again, I leave it up to you to decide if this is worth it: to me it feels like we are working against the compiler. If I am not mistaken, the reason for the compiler prohibiting simple property type recursion is that it can often lead to infinite space as each copy of the struct stores its own copy of the struct which stores its own copy of the struct which stores its own <snip> which leads me to my next point:

Can making basePrimes an implicitly unwrapped optional be avoided?

No, and the reason is as the above recursion tells us. If basePrimes is a non-optional type, then it has to be initialized at the initialization step. And since this is a copy of PostponedPrimeIterator in PostponedPrimeIterator, creating a value at initialization time would lead to infinite recursion and infinite space cost. Unfortunately as I cannot find the Apple reference on implicitly unwrapped optionals at this point, I will have to link a StackOverflow answer about them, which states:

Implicitly-unwrapped optionals are quite useful in cases where the value may not be present at initialization, but are set early and unlikely to become nil again.

This is the exact use-case that we have here, and as such I would say the implicitly unwrapped optional is the perfect tool to use.

defer { c += 2 }

This is a matter of style and your own tastes. What I find amusing is that this effectively is a return of C-style for loops, which were removed. For example, consider these two snippets of code:

Java:

int c = 3;
for ( /* no initialization */ ; /* no end case */ ; c += 2) {
 if ( isPrime(c) ) return c;
}

Swift:

var c = 3
while true {
 defer { c += 2 }
 if isPrime(c) {
 return c
 }
}

There is a slight difference, as in the Swift snippet the defer block runs after the return and it does not in the Java, but it would be trivial to use while { defer {} } to emulate a traditional C-style loop.

The point of showing that was to say that my first thought was a C-style for loop. This turns out to be correct, but I could have as easily intuited that the defer block only ran after a return and not just after the while block ended.

I feel the NSHipster article goes a little far in declaring that the defer {} return is in itself dangerous. I do agree, however, that when the defer's purpose is not immediately obvious from its context (such as when used for cleanup), it can be confusing. However, (and I'm sure an Apple example exists which I cannot find,) I think the use of defer right before a return is made clear in context. It is a bit different, but you can always see exactly when the code is going to run -- directly after the following return.

To draw an actual guideline from the above paragraph: use defer immediately following allocation for cleanup, or immediately before a return for code that needs to run after the return value is determined. As such, your loop changes in the following ways:

remove defer { c += 2 } from the first line
add defer { c += 2 } immediately before the only return
add c += 2 at the end of the loop

The key point here is the single exit point that the defer will be run on. The point is to keep defer located next to why it exists, which is in this case to run after the return.

I will reiterate that this particular guideline is mostly my opinion, and the opinion of the Swift community and Apple may at some point change (as in fact, mine will likely as well).

And now a few miscellaneous bits:

removeFirst and removeLast perform similarly. I performed a micro benchmark of removeFirst versus removeLast and removeLast was faster by about 1 millionth of a second on my machine. As such, you can use either. It might be negligibly faster to reverse basePrimes and use removeLast, however.

j at the bottom of your loop could have a descriptive name rather than a single letter. However, I do not have any suggestions, thus this being located here instead of at the top.

let time = endTime.timeIntervalSince(startTime) in your benchmarking code is an unused assignment. Additionally, it is usually the short name for iterators, rather than gen, as they are no longer generators.

These tweaks are unlikely to (noticeably) change execution time, but as always, if you are concerned about performance, test every change. The next step in optimization would be to add a wheel. But that's for another question.

My benchmarks were done on a Mac Mini (mid 2011) running macOS Sierra developer beta 3 with 8GB RAM, and as such my results are probably more dramatic than should be expected on modern machines.

Question 8

Some quick replies: Good point about AnyIterator, that should work but might decrease the performance due to the additional level of indirection. I will test it later. – Your suggestion about defer is exactly what I had in a previous version, before I decided to remove all duplicates of "c += 2", so I can agree with that. – removeFirst is a protocol extension method of Collection and has complexity O(1). – time was meant to be used in the print statement, somehow that got duplicated.

Question 9

(cont.) Thanks for the review, I will read it in more detail during the weekend.

Question 10

@MartinR removeFirst() is indeed in a protocol extension of Collection, but with the constraint that Self == SubSequence. Array has implemented its own removeFirst() which runs in O(n) instead of O(1).

Question 11

@TimVermeulen: You are right, I somehow arrived at the wrong definition. I did not make a measurable difference in my test code, that that's because the array is very short.

Question 12

@TimVermeulen: So you could define it as a slice: var initialPrimes = ArraySlice([2, 3, 5, 7]), then removeFirst() becomes O(1).

Question 13

@CAD97 already wrote an excellent review, I just have a couple things to add:

I made basePrimes a normal optional instead of an implicitly unwrapped optional. Instead of initialising it when c == 9, we can unwrap it using guard, and initialise it if it doesn't have a value.
I also made basePrime an optional because the default value of 0 was pretty arbitrary.
basePrimeSquared seemed more trouble than it's worth, so I got rid of it.
In order to reduce the number of force unwraps (basePrimes.next()!), I added a private nextPrime() function that returns a non-optional Int, and next() simply calls nextPrime().
I made initialPrimes an ArraySlice instead of an Array as you suggested in the comments, to make removeFirst() an O(1) operation instead of O(n).
Your while sieve[j] != nil loop can (partly) be replaced by sequence(start:next:), which allows us to get rid of var j.
I added some minor tweaks, like changing an if into a guard and renaming c to candidate.

Here's the result:

public final class PostponedPrimeIterator: IteratorProtocol {
 private var basePrimes: PostponedPrimeIterator?
 private var basePrime: Int?
 private var sieve: [Int: Int] = [:]
 private var initialPrimes: ArraySlice = [2, 3, 5, 7]
 private var candidate = 9
 public func next() -> Int? {
 return nextPrime()
 }
 private func nextPrime() -> Int {
 guard initialPrimes.isEmpty else {
 return initialPrimes.removeFirst()
 }
 guard let basePrimes = basePrimes else {
 // Create the base prime generator:
 let basePrimes = PostponedPrimeIterator()
 self.basePrimes = basePrimes
 // Fetch the first odd prime:
 _ = basePrimes.next()
 basePrime = basePrimes.next()
 return nextPrime()
 }
 while true {
 defer { candidate += 2 }
 let factor: Int
 if let f = sieve.removeValue(forKey: candidate) {
 // `candidate` is divisible by `f`
 factor = f
 } else if let basePrime = basePrime, candidate == basePrime * basePrime {
 // `candidate` is the square of `basePrime`
 factor = basePrime
 self.basePrime = basePrimes.nextPrime()
 } else {
 // `candidate` is a prime number
 return candidate
 }
 // Insert next odd number which is divisiby by `factor` but not present in the sieve:
 let nextOddMultiples = sequence(first: candidate + 2 * factor, next: { 0ドル + 2 * factor })
 for nextOddMultiple in nextOddMultiples where sieve[nextOddMultiple] == nil {
 sieve[nextOddMultiple] = factor
 break
 }
 }
 }
}

My benchmarks indicate no significant performance difference between this code and your original code, but I got rid of all the !s, so I think it's an improvement :)

I'm still not really happy about the while true loop, but I didn't manage to get rid of it without sacrificing performance.

Question 14

I am not a fan of "guard" as a general replacement for "if not", as in guard initialPrimes.isEmpty (but that is just my opinion). – basePrime has a value once the "main loop" is entered, so checking it with if let basePrime = basePrime seems misleading to me. – I think I tried a nextPrime() method with a next() wrapper and it was slower, but I am not sure anymore. – guard let basePrimes = basePrimes (and the recursive call of nextPrime()) is a good idea.

Question 15

The first guard is indeed a matter of opinion. I like to generally use guard if it always exits the scope, but if works just fine. I agree that if let basePrime = basePrime is misleading, but it can't be unwrapped at the same time as basePrimes because it might change in the while loop. I guess there's no perfect solution.

CAD97 CAD97 1,9441 gold badge13 silver badges34 bronze badges · Accepted Answer · 2016-07-29 20:43:34Z

Let me prefix this answer by a couple things:

I only about 90% understand the algorithm in full. I know enough to tinker with it without breaking it, but I couldn't explain it to someone with pen and paper.
As Swift 3 is still beta, some of the style guides are still up in the air. For the most part, I've tried to source purely stylistic comments from Apple's examples and documentation, but those might change and the community might end up with different ideas than Apple preached.

Now, addressing your points:

Better type/property/variable names?

I fully agree on PostponedPrimeIterator. It agrees with prexisting types, such as ClosedRangeIterator, DictionaryIterator, EnumeratedIterator, etc. The only variable I would potentially rename is private var c, to a more descriptive name of candidate (as your comment clarifies).

As far as I have seen, all iterators in the Swift standard library are structs.

As far as the reference page goes, it seems that the only classes in the standard library are Managed or non-@objc for the type system.

Is there a [way to make PostponedPrimeIterator a struct]?

I won't comment on whether this Iterator would be better as a struct or class: that's a bit philosophical and comes down to preference and your own benchmarks. However, if you do want to make it a struct, it is possible:

The reason that you cannot just make this a struct is that the compiler spits out the error error: value type 'Type' cannot have a stored property that references itself. The solution is to not directly recurse types, but to store a type-erased iterator.

The changes to use AnyIterator and structify PostponedPrimeIterator are these lines:

private struct PostponedPrimeIterator: IteratorProtocol {
 private var basePrimes: AnyIterator<Int>!
...
public mutating func next() -> Int? {
 ...
 if c == 9 {
 basePrimes = AnyIterator(PostponedPrimeIterator())
 ...
 }
 ...
}

In this manner you can get value semantics for your iterator. I made a micro benchmark to test the cost of wrapping a simple Array's IndexingIterator in an AnyIterator, which came out to about 0.006s per 10,000 items on my machine, so the cost is small but not negligible.

Again, I leave it up to you to decide if this is worth it: to me it feels like we are working against the compiler. If I am not mistaken, the reason for the compiler prohibiting simple property type recursion is that it can often lead to infinite space as each copy of the struct stores its own copy of the struct which stores its own copy of the struct which stores its own <snip> which leads me to my next point:

Can making basePrimes an implicitly unwrapped optional be avoided?

No, and the reason is as the above recursion tells us. If basePrimes is a non-optional type, then it has to be initialized at the initialization step. And since this is a copy of PostponedPrimeIterator in PostponedPrimeIterator, creating a value at initialization time would lead to infinite recursion and infinite space cost. Unfortunately as I cannot find the Apple reference on implicitly unwrapped optionals at this point, I will have to link a StackOverflow answer about them, which states:

Implicitly-unwrapped optionals are quite useful in cases where the value may not be present at initialization, but are set early and unlikely to become nil again.

This is the exact use-case that we have here, and as such I would say the implicitly unwrapped optional is the perfect tool to use.

defer { c += 2 }

This is a matter of style and your own tastes. What I find amusing is that this effectively is a return of C-style for loops, which were removed. For example, consider these two snippets of code:

Java:

int c = 3;
for ( /* no initialization */ ; /* no end case */ ; c += 2) {
 if ( isPrime(c) ) return c;
}

Swift:

var c = 3
while true {
 defer { c += 2 }
 if isPrime(c) {
 return c
 }
}

There is a slight difference, as in the Swift snippet the defer block runs after the return and it does not in the Java, but it would be trivial to use while { defer {} } to emulate a traditional C-style loop.

The point of showing that was to say that my first thought was a C-style for loop. This turns out to be correct, but I could have as easily intuited that the defer block only ran after a return and not just after the while block ended.

I feel the NSHipster article goes a little far in declaring that the defer {} return is in itself dangerous. I do agree, however, that when the defer's purpose is not immediately obvious from its context (such as when used for cleanup), it can be confusing. However, (and I'm sure an Apple example exists which I cannot find,) I think the use of defer right before a return is made clear in context. It is a bit different, but you can always see exactly when the code is going to run -- directly after the following return.

To draw an actual guideline from the above paragraph: use defer immediately following allocation for cleanup, or immediately before a return for code that needs to run after the return value is determined. As such, your loop changes in the following ways:

remove defer { c += 2 } from the first line
add defer { c += 2 } immediately before the only return
add c += 2 at the end of the loop

The key point here is the single exit point that the defer will be run on. The point is to keep defer located next to why it exists, which is in this case to run after the return.

I will reiterate that this particular guideline is mostly my opinion, and the opinion of the Swift community and Apple may at some point change (as in fact, mine will likely as well).

And now a few miscellaneous bits:

removeFirst and removeLast perform similarly. I performed a micro benchmark of removeFirst versus removeLast and removeLast was faster by about 1 millionth of a second on my machine. As such, you can use either. It might be negligibly faster to reverse basePrimes and use removeLast, however.

j at the bottom of your loop could have a descriptive name rather than a single letter. However, I do not have any suggestions, thus this being located here instead of at the top.

let time = endTime.timeIntervalSince(startTime) in your benchmarking code is an unused assignment. Additionally, it is usually the short name for iterators, rather than gen, as they are no longer generators.

These tweaks are unlikely to (noticeably) change execution time, but as always, if you are concerned about performance, test every change. The next step in optimization would be to add a wheel. But that's for another question.

My benchmarks were done on a Mac Mini (mid 2011) running macOS Sierra developer beta 3 with 8GB RAM, and as such my results are probably more dramatic than should be expected on modern machines.

Some quick replies: Good point about AnyIterator, that should work but might decrease the performance due to the additional level of indirection. I will test it later. – Your suggestion about defer is exactly what I had in a previous version, before I decided to remove all duplicates of "c += 2", so I can agree with that. – removeFirst is a protocol extension method of Collection and has complexity O(1). – time was meant to be used in the print statement, somehow that got duplicated.
(cont.) Thanks for the review, I will read it in more detail during the weekend.
@MartinR removeFirst() is indeed in a protocol extension of Collection, but with the constraint that Self == SubSequence. Array has implemented its own removeFirst() which runs in O(n) instead of O(1).
@TimVermeulen: You are right, I somehow arrived at the wrong definition. I did not make a measurable difference in my test code, that that's because the array is very short.
@TimVermeulen: So you could define it as a slice: var initialPrimes = ArraySlice([2, 3, 5, 7]), then removeFirst() becomes O(1).

Stack Exchange Network

Postponed Prime Sieve in Swift

2 Answers 2

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Hot Network Questions