Motivated by this question, I looked out for other "infinite prime generators", i.e. functions which produce the list of prime numbers in increasing order and do not have an a-priori upper limit (such as the standard implementation of the Sieve of Eratosthenes). Repeated calling should produce "all" prime numbers (restricted only by time, available memory, and the limited range of the integer data type).
I found this answer (which was re-written for Python 3 later) to this SO question and have implemented that "postponed sieve" algorithm in Swift (Swift 3, Xcode 8 beta 3 required):
public final class PostponedPrimeIterator: IteratorProtocol {
private var basePrimes: PostponedPrimeIterator!
private var basePrime = 0
private var basePrimeSquared = 0
private var sieve: [Int: Int] = [:]
private var initialPrimes = [2, 3, 5, 7]
private var c = 9 // First candidate after fixed list
public func next() -> Int? {
if !initialPrimes.isEmpty {
return initialPrimes.removeFirst()
}
if c == 9 {
// Create the base prime generator and fetch the first odd prime:
basePrimes = PostponedPrimeIterator()
_ = basePrimes.next()
basePrime = basePrimes.next()!
basePrimeSquared = basePrime * basePrime
assert(c == basePrimeSquared)
}
while true {
defer { c += 2 }
let factor: Int
if let f = sieve.removeValue(forKey: c) {
// `c` is divisible by `f`
factor = f
} else if c == basePrimeSquared {
// `c` is the square of `p`
factor = basePrime
basePrime = basePrimes.next()!
basePrimeSquared = basePrime * basePrime
} else {
// `c` is a prime number
assert(c < basePrimeSquared)
return c
}
// Insert next odd number which is divisiby by `factor` but not present in the sieve:
var j = c + 2 * factor
while sieve[j] != nil {
j += 2 * factor
}
sieve[j] = factor
}
}
}
public struct PostponedPrimeSequence: Sequence {
public func makeIterator() -> PostponedPrimeIterator {
return PostponedPrimeIterator()
}
}
The interesting thing about that algorithm is that the prime generator needs a list of "base primes" which are created using another instance of itself. This works because
- the base prime generator is created "delayed", after producing some fixed primes, and
- in order to generate primes up to \$ N \,ドル base primes are needed only up to \$ \sqrt N \,ドル
so that the creation of nested prime generators terminates eventually. (The number of nested prime generators is \$O(\log \log N) \,ドル thanks to Will Ness for pointing that out.)
The sieving is done using a dictionary which holds the "next composite numbers" each of which is a multiple of some prime among the base primes (i.e. such that are not bigger than the square root of the current candidate).
As explained in the above-mentioned answer, the required memory to produce \$ n \$ primes is \$ O(\sqrt n) \$.
Example usage (first 20 primes):
let p20 = Array(PostponedPrimeSequence().prefix(20))
print(p20) // [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71]
Benchmarking (measure the time to compute the 1,000,000th prime number, on a 1,2 GHz Intel Core m5 MacBook):
let TERMS = 1_000_000
let startTime = Date()
var gen = PostponedPrimeIterator()
for _ in 1 ..< TERMS { _ = gen.next() }
let p = gen.next()!
let endTime = Date()
let time = endTime.timeIntervalSince(startTime)
print(p, String(format: "%.02f", endTime.timeIntervalSince(startTime)), "sec")
// 15485863 1.26 sec
All kinds of feedback is welcome, including, but of course not limit to:
- Better type/property/variable names? I chose
PostponedPrimeIterator
becauseGenerator
was renamed toIterator
in Swift 3. As far as I have seen, all iterators in the Swift standard library are
struct
s, i.e. value types, and Apple recommends to work with value types whenever possible. I needed to make it aclass
because of the self-referencing propertyprivate var basePrimes: PostponedPrimeIterator!
Is there a better solution?
- Swift does not have a
yield
statement as in Python, which makes some kind of state machine necessary: The first primes are served from an array, then the base prime generator is created, and from then on the primes are produced by the sieving algorithm. Is there a more elegant way to manage these different states? Can makingbasePrimes
an implicitly unwrapped optional be avoided? defer { c += 2 }
is called in the main loop to increment the variable even if the loop was exited viareturn c
. Would you consider that a reasonable use? (This seems to consider it "harmful".)
-
1\$\begingroup\$ Related. Will Ness later brought his code here for (Python) optimization. \$\endgroup\$CAD97– CAD972016年07月29日 16:16:53 +00:00Commented Jul 29, 2016 at 16:16
-
\$\begingroup\$ As this was the next logical step from my code, I will definitely have some things to say. I've just got to do a bit of digging through WWDC Swift3 talks to source my claims ;D \$\endgroup\$CAD97– CAD972016年07月29日 16:56:53 +00:00Commented Jul 29, 2016 at 16:56
-
\$\begingroup\$ @CAD97: Thanks for the link, I hadn't seen that. Python is not my first language so it will take me a while to digest that :) – Your feedback is most welcome! \$\endgroup\$Martin R– Martin R2016年07月29日 19:42:06 +00:00Commented Jul 29, 2016 at 19:42
-
1\$\begingroup\$ I ended up remaking this in Kotlin, if you're interested. (Who knew that generating primes is a great way to take a language through the paces! :P) \$\endgroup\$CAD97– CAD972016年08月25日 16:38:02 +00:00Commented Aug 25, 2016 at 16:38
-
1\$\begingroup\$ @CAD97: Unfortunately I don't know anything about Kotlin :) But I must admit that I am a bit proud of my explanation of the algorithm. Perhaps it is trivial for the people working in that field, but I found some answers on SO and CR quite terse. Thanks for quoting it with kind words! \$\endgroup\$Martin R– Martin R2016年08月25日 18:20:49 +00:00Commented Aug 25, 2016 at 18:20
2 Answers 2
Let me prefix this answer by a couple things:
- I only about 90% understand the algorithm in full. I know enough to tinker with it without breaking it, but I couldn't explain it to someone with pen and paper.
- As Swift 3 is still beta, some of the style guides are still up in the air. For the most part, I've tried to source purely stylistic comments from Apple's examples and documentation, but those might change and the community might end up with different ideas than Apple preached.
Now, addressing your points:
Better type/property/variable names?
I fully agree on PostponedPrimeIterator
. It agrees with prexisting types, such as ClosedRangeIterator
, DictionaryIterator
, EnumeratedIterator
, etc. The only variable I would potentially rename is private var c
, to a more descriptive name of candidate
(as your comment clarifies).
As far as I have seen, all iterators in the Swift standard library are
struct
s.
As far as the reference page goes, it seems that the only class
es in the standard library are Managed
or non-@objc
for the type system.
Is there a [way to make PostponedPrimeIterator a
struct
]?
I won't comment on whether this Iterator
would be better as a struct
or class
: that's a bit philosophical and comes down to preference and your own benchmarks. However, if you do want to make it a struct
, it is possible:
The reason that you cannot just make this a struct
is that the compiler spits out the error error: value type 'Type' cannot have a stored property that references itself
. The solution is to not directly recurse types, but to store a type-erased iterator.
The changes to use AnyIterator
and struct
ify PostponedPrimeIterator
are these lines:
private struct PostponedPrimeIterator: IteratorProtocol {
private var basePrimes: AnyIterator<Int>!
...
public mutating func next() -> Int? {
...
if c == 9 {
basePrimes = AnyIterator(PostponedPrimeIterator())
...
}
...
}
In this manner you can get value semantics for your iterator. I made a micro benchmark to test the cost of wrapping a simple Array
's IndexingIterator
in an AnyIterator
, which came out to about 0.006s per 10,000 items on my machine, so the cost is small but not negligible.
Again, I leave it up to you to decide if this is worth it: to me it feels like we are working against the compiler. If I am not mistaken, the reason for the compiler prohibiting simple property type recursion is that it can often lead to infinite space as each copy of the struct
stores its own copy of the struct
which stores its own copy of the struct
which stores its own <snip> which leads me to my next point:
Can making
basePrimes
an implicitly unwrapped optional be avoided?
No, and the reason is as the above recursion tells us. If basePrimes
is a non-optional type, then it has to be initialized at the initialization step. And since this is a copy of PostponedPrimeIterator
in PostponedPrimeIterator
, creating a value at initialization time would lead to infinite recursion and infinite space cost. Unfortunately as I cannot find the Apple reference on implicitly unwrapped optionals at this point, I will have to link a StackOverflow answer about them, which states:
Implicitly-unwrapped optionals are quite useful in cases where the value may not be present at initialization, but are set early and unlikely to become nil again.
This is the exact use-case that we have here, and as such I would say the implicitly unwrapped optional is the perfect tool to use.
defer { c += 2 }
This is a matter of style and your own tastes. What I find amusing is that this effectively is a return of C-style for loops, which were removed. For example, consider these two snippets of code:
Java:
int c = 3;
for ( /* no initialization */ ; /* no end case */ ; c += 2) {
if ( isPrime(c) ) return c;
}
Swift:
var c = 3
while true {
defer { c += 2 }
if isPrime(c) {
return c
}
}
There is a slight difference, as in the Swift snippet the defer
block runs after the return
and it does not in the Java, but it would be trivial to use while { defer {} }
to emulate a traditional C-style loop.
The point of showing that was to say that my first thought was a C-style for loop. This turns out to be correct, but I could have as easily intuited that the defer
block only ran after a return
and not just after the while
block ended.
I feel the NSHipster article goes a little far in declaring that the defer {} return
is in itself dangerous. I do agree, however, that when the defer
's purpose is not immediately obvious from its context (such as when used for cleanup), it can be confusing. However, (and I'm sure an Apple example exists which I cannot find,) I think the use of defer
right before a return
is made clear in context. It is a bit different, but you can always see exactly when the code is going to run -- directly after the following return
.
To draw an actual guideline from the above paragraph: use defer
immediately following allocation for cleanup, or immediately before a return
for code that needs to run after the return
value is determined. As such, your loop changes in the following ways:
- remove
defer { c += 2 }
from the first line - add
defer { c += 2 }
immediately before the onlyreturn
- add
c += 2
at the end of the loop
The key point here is the single exit point that the defer will be run on. The point is to keep defer
located next to why it exists, which is in this case to run after the return
.
I will reiterate that this particular guideline is mostly my opinion, and the opinion of the Swift community and Apple may at some point change (as in fact, mine will likely as well).
And now a few miscellaneous bits:
removeFirst
and removeLast
perform similarly. I performed a micro benchmark of removeFirst
versus removeLast
and removeLast
was faster by about 1 millionth of a second on my machine. As such, you can use either. It might be negligibly faster to reverse basePrimes
and use removeLast
, however.
j
at the bottom of your loop could have a descriptive name rather than a single letter. However, I do not have any suggestions, thus this being located here instead of at the top.
let time = endTime.timeIntervalSince(startTime)
in your benchmarking code is an unused assignment. Additionally, it
is usually the short name for iterators, rather than gen
, as they are no longer generators.
These tweaks are unlikely to (noticeably) change execution time, but as always, if you are concerned about performance, test every change. The next step in optimization would be to add a wheel. But that's for another question.
My benchmarks were done on a Mac Mini (mid 2011) running macOS Sierra developer beta 3 with 8GB RAM, and as such my results are probably more dramatic than should be expected on modern machines.
-
\$\begingroup\$ Some quick replies: Good point about AnyIterator, that should work but might decrease the performance due to the additional level of indirection. I will test it later. – Your suggestion about defer is exactly what I had in a previous version, before I decided to remove all duplicates of "c += 2", so I can agree with that. – removeFirst is a protocol extension method of Collection and has complexity O(1). –
time
was meant to be used in the print statement, somehow that got duplicated. \$\endgroup\$Martin R– Martin R2016年07月29日 20:58:43 +00:00Commented Jul 29, 2016 at 20:58 -
\$\begingroup\$ (cont.) Thanks for the review, I will read it in more detail during the weekend. \$\endgroup\$Martin R– Martin R2016年07月29日 20:59:58 +00:00Commented Jul 29, 2016 at 20:59
-
1\$\begingroup\$ @MartinR
removeFirst()
is indeed in a protocol extension ofCollection
, but with the constraint thatSelf == SubSequence
.Array
has implemented its ownremoveFirst()
which runs in O(n) instead of O(1). \$\endgroup\$Tim Vermeulen– Tim Vermeulen2016年08月01日 15:15:46 +00:00Commented Aug 1, 2016 at 15:15 -
\$\begingroup\$ @TimVermeulen: You are right, I somehow arrived at the wrong definition. I did not make a measurable difference in my test code, that that's because the array is very short. \$\endgroup\$Martin R– Martin R2016年08月01日 16:35:02 +00:00Commented Aug 1, 2016 at 16:35
-
2\$\begingroup\$ @TimVermeulen: So you could define it as a slice:
var initialPrimes = ArraySlice([2, 3, 5, 7])
, then removeFirst() becomes O(1). \$\endgroup\$Martin R– Martin R2016年08月01日 18:38:21 +00:00Commented Aug 1, 2016 at 18:38
@CAD97 already wrote an excellent review, I just have a couple things to add:
- I made
basePrimes
a normal optional instead of an implicitly unwrapped optional. Instead of initialising it whenc == 9
, we can unwrap it usingguard
, and initialise it if it doesn't have a value. - I also made
basePrime
an optional because the default value of0
was pretty arbitrary. basePrimeSquared
seemed more trouble than it's worth, so I got rid of it.- In order to reduce the number of force unwraps (
basePrimes.next()!
), I added a privatenextPrime()
function that returns a non-optionalInt
, andnext()
simply callsnextPrime()
. - I made
initialPrimes
anArraySlice
instead of anArray
as you suggested in the comments, to makeremoveFirst()
an O(1) operation instead of O(n). - Your
while sieve[j] != nil
loop can (partly) be replaced bysequence(start:next:)
, which allows us to get rid ofvar j
. - I added some minor tweaks, like changing an
if
into aguard
and renamingc
tocandidate
.
Here's the result:
public final class PostponedPrimeIterator: IteratorProtocol {
private var basePrimes: PostponedPrimeIterator?
private var basePrime: Int?
private var sieve: [Int: Int] = [:]
private var initialPrimes: ArraySlice = [2, 3, 5, 7]
private var candidate = 9
public func next() -> Int? {
return nextPrime()
}
private func nextPrime() -> Int {
guard initialPrimes.isEmpty else {
return initialPrimes.removeFirst()
}
guard let basePrimes = basePrimes else {
// Create the base prime generator:
let basePrimes = PostponedPrimeIterator()
self.basePrimes = basePrimes
// Fetch the first odd prime:
_ = basePrimes.next()
basePrime = basePrimes.next()
return nextPrime()
}
while true {
defer { candidate += 2 }
let factor: Int
if let f = sieve.removeValue(forKey: candidate) {
// `candidate` is divisible by `f`
factor = f
} else if let basePrime = basePrime, candidate == basePrime * basePrime {
// `candidate` is the square of `basePrime`
factor = basePrime
self.basePrime = basePrimes.nextPrime()
} else {
// `candidate` is a prime number
return candidate
}
// Insert next odd number which is divisiby by `factor` but not present in the sieve:
let nextOddMultiples = sequence(first: candidate + 2 * factor, next: { 0ドル + 2 * factor })
for nextOddMultiple in nextOddMultiples where sieve[nextOddMultiple] == nil {
sieve[nextOddMultiple] = factor
break
}
}
}
}
My benchmarks indicate no significant performance difference between this code and your original code, but I got rid of all the !
s, so I think it's an improvement :)
I'm still not really happy about the while true
loop, but I didn't manage to get rid of it without sacrificing performance.
-
\$\begingroup\$ I am not a fan of "guard" as a general replacement for "if not", as in
guard initialPrimes.isEmpty
(but that is just my opinion). –basePrime
has a value once the "main loop" is entered, so checking it withif let basePrime = basePrime
seems misleading to me. – I think I tried anextPrime()
method with anext()
wrapper and it was slower, but I am not sure anymore. –guard let basePrimes = basePrimes
(and the recursive call ofnextPrime()
) is a good idea. \$\endgroup\$Martin R– Martin R2016年08月01日 21:28:37 +00:00Commented Aug 1, 2016 at 21:28 -
\$\begingroup\$ The first guard is indeed a matter of opinion. I like to generally use
guard
if it always exits the scope, butif
works just fine. I agree thatif let basePrime = basePrime
is misleading, but it can't be unwrapped at the same time asbasePrimes
because it might change in thewhile
loop. I guess there's no perfect solution. \$\endgroup\$Tim Vermeulen– Tim Vermeulen2016年08月01日 22:07:08 +00:00Commented Aug 1, 2016 at 22:07