lua-users home
lua-l archive

Re: Tips and questions concerning a prime sieve in Lua

[Date Prev][Date Next][Thread Prev][Thread Next] [Date Index] [Thread Index]


Hello Jairo,
On Sat, Dec 05, 2020 at 03:47:05AM -0500, Jairo A. del Rio wrote:
> ...
Thank you for this opportunity to indulge in prime hunting, which, as we
all know, is one of the most healthy and useful activities.
The Xuedong Luo method and your implementation are markedly tight, with
priority to performance in the details. In contrast, I have attempted a
more relaxed implementation of Eratosthenes' sieve (a module Sieve1.lua)
to make it easier to understand what goes on and experiment with
variations. That implementation, in spite of this more relaxed approach,
is, in some cases, faster than yours.
Working on a description of Sieve1.lua, I found, however, that simpler
implementations, developed to support the description, also had merit
and actually led to some significant performance improvements of your
implementation of the Xuedong Luo method. So that's where I'll start.
Before doing that, however, I'll briefly summarize the adjustments to
your luo.lua in the shape of a diff between luo2.lua (an instrumented
version of your luo.lua) and luo4.lua that includes the adjustments:
 $ diff luo2.lua luo4.lua
 24c24
 < step = step == 2 and 4 or 2
 ---
 > step = 6 - step
 33c33
 < j = c
 ---
 > local j = c
 36,40c36,42
 < while j <= M do
 < S[j] = 0
 < j = j + ij
 < ij = t - ij
 < count = count + 1
 ---
 > if S[i] ~= 0 then
 > while j <= M do
 > S[j] = 0
 > j = j + ij
 > ij = t - ij
 > count = count + 1
 > end
 46,48c48,50
 < for _, v in next, S do
 < if v > 0 then
 < result[#result + 1] = v
 ---
 > for i = 1, #S do
 > if S[i] > 0 then
 > result[#result + 1] = S[i]
 $
These adjustments cut roughly a third of the run time required for
calculating the primes below 100000. Details follow.
The basic mechanism, attributed to Eratosthenes (about -276 to -195),
for finding primes is the following (see
https://en.wikipedia.org/wiki/Sieve_of_Eratosthenes):
Start by listing all positive integers to some desired limit, here 25:
 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
The number 1 is not considered prime (and if used for sieving would
create havoc), so take the next number, 2, in the list, that's the first
prime. Now, since any number divisible by a prime is composite (unless
it is the prime itself), we can remove as composite all multiples of 2
from the list, excepting 2, leaving us with:
 1 2 3 5 7 9 11 13 15 17 19 21 23 25
The next number following 2 in the list is 3, so that's the second
prime. As before, we remove multiples of that prime from the list,
excepting the prime itself, to get rid of additional composites, leaving
us with:
 1 2 3 5 7 11 13 17 19 23 25
The next number in the list is 5, the third prime. We haven't said that
before, but it actually suffices, when removing multiples of some prime
p, to start at p^2 (= p*p, p squared). The reason is that composite
numbers below p^2 are divisible by a prime strictly less than p and so,
would have been removed earlier. For example, for p=5, 10=2*5 were
removed when removing multiples of 2. And 15=3*5 were removed when
removing multiples of 3.
The only multiple of 5 in the list, excluding 5 itself, is 25. After
removing 25 from the list, we are left with:
 1 2 3 5 7 11 13 17 19 23
Argueing as before, 7 is the fourth prime and 7*7 = 49 is the next
number targeted for removal. But 49 is beyond the list, so we are done
and all numbers in the list (except 1) are prime.
In Lua, this becomes Eratos1.lua:
 $ cat Eratos1.lua
 -- Eratos1.lua: Inspired by "Jairo A. del Rio" <jairoadelrio6@gmail.com>: Tips and questions concerning a prime sieve in Lua, simple sieve
 -- 2021-Apr-01 18.56 / TN
 print( "Eratos1.lua: 2021-Apr-28 21.21 / TN" )
 -- for k, v in pairs( arg ) do
 -- print( "arg[" .. k .. "] = <" .. v .. ">" )
 -- end
 local print = print
 local tonumber = tonumber
 local arg = arg
 _ENV = nil
 local function Eratos1( l )
 local list = {}
 for i = 1, l do list[i] = i end
 local x = 1
 local count = 0
 while true do
 local p
 repeat
 x = x + 1
 p = list[x]
 until p ~= 0
 if not p or p*p > l then
 break
 end
 -- print( p )
 for i = p*p, l, p do
 list[i] = 0
 -- count = count + 1
 end
 end
 local prime = {}
 -- do return prime end
 for i = 2, l do
 if list[i] ~= 0 then
 -- print( "prime[" .. i .. "] = " .. list[i] )
 prime[#prime+1] = list[i]
 end
 end
 return prime, count
 end
 local l = tonumber( arg[1] )
 local c = tonumber( arg[2] or 1 )
 local result
 local count
 for i = 1, c do
 result, count = Eratos1( l )
 end
 print( "pi(" .. l .. ") = " .. #result .. ", count=" .. count .. " (calculated " .. c .. " times)" )
 $
(Much code quoted is work in progress, so please apologize for
occasional rough edges, such as remains of debugging activities.)
Briefly, list[i] represents the number i and is initialized with i
itself for a start. Subsequently, when sieving establishes that some i
is composite, list[i] is zeroed. Finally, the list is scanned for
non-zero numbers: These are all the primes and hence transferred to the
result prime list.
Establishing a suitable baseline:
 $ uname -a
 CYGWIN_NT-6.1-WOW Annette-Pc2 3.1.7(0.340/5/3) 2020年08月22日 19:03 i686 Cygwin
 $ time lua-5.4.3 Eratos1.lua 1000000 100
 Eratos1.lua: 2021-Apr-28 21.21 / TN
 pi(1000000) = 78498, count=0 (calculated 100 times)
 real 0m27.616s
 user 0m24.585s
 sys 0m2.870s
 $
The calculation is summarized by printing pi(1000000), the number of
primes below the limit 1000000. Specific values of the prime-counting
function pi(x) can be found at
 https://en.wikipedia.org/wiki/Prime-counting_function.
Eratos1.lua with l=1000000 and c=100 performs the same calculation as
your (Jairo's) luo.lua:
 $ time lua-5.4.3 luo.lua
 real 0m21.828s
 user 0m20.373s
 sys 0m1.388s
 $
The situation is thus not entirely hopeless: Your program is faster,
yes, but not dramatically so.
So how can we improve this? A common idea used in the Xuedong Luo method
is making a special case of the smallest primes. To illustrate this,
let's make a special case of 2. This means that we can avoid handling
numbers divisible by 2, so some operations need only be carried out half
the number of times. This is Eratos2.lua:
 $ cat Eratos2.lua
 -- Eratos2.lua: Inspired by "Jairo A. del Rio" <jairoadelrio6@gmail.com>: Tips and questions concerning a prime sieve in Lua, special case 2 sieve
 -- 2021-Apr-01 18.58 / TN
 print( "Eratos2.lua: 2021-Apr-09 19.50 / TN" )
 local print = print
 local tonumber = tonumber
 local arg = arg
 local os = os
 _ENV = nil
 local function Eratos2( l, clockAccum )
 local clockStart
 local clockEnd
 clockStart = os.clock()
 local l0 = ( l + 1 ) // 2
 local list = {}
 for i = 1, l0 do list[i] = 2*i - 1 end
 local x = 1
 local count = 0
 clockEnd = os.clock()
 clockAccum.Init = (clockAccum.Init or 0) + clockEnd - clockStart
 clockStart = clockEnd
 while true do
 local p
 repeat
 x = x + 1
 p = list[x]
 until p ~= 0
 if not p or p*p > l then
 break
 end
 -- print( p )
 for i = ( p*p + 1 ) // 2, l0, p do
 list[i] = 0
 count = count + 1
 end
 end
 clockEnd = os.clock()
 clockAccum.Sieve = (clockAccum.Sieve or 0) + clockEnd - clockStart
 clockStart = clockEnd
 local prime = {}
 -- do return prime end
 if 2 <= l then
 prime[#prime+1] = 2
 end
 for i = 2, l0 do
 if list[i] ~= 0 then
 -- print( "prime[" .. i .. "] = " .. list[i] )
 prime[#prime+1] = list[i]
 end
 end
 clockEnd = os.clock()
 clockAccum.Pack = (clockAccum.Pack or 0) + clockEnd - clockStart
 return prime, count
 end
 local l = tonumber( arg[1] )
 local c = tonumber( arg[2] or 1 )
 local result
 local count
 local clockStart = os.clock()
 local clockAccum = {}
 for i = 1, c do
 result, count = Eratos2( l, clockAccum )
 end
 local clockEnd = os.clock()
 print( "pi(" .. l .. ") = " .. #result .. ", count=" .. count .. " (calculated " .. c .. " times)" )
 print( "clockAccum.Init=" .. clockAccum.Init )
 print( "clockAccum.Sieve=" .. clockAccum.Sieve )
 print( "clockAccum.Pack=" .. clockAccum.Pack )
 print( "clockTotal=" .. clockEnd - clockStart )
 $
In Eratos2.lua, the list only holds the odd numbers, so list[1]=1,
list[2]=3, list[3]=5, and so on. The number of elements of the list is
calculated as l0 = (l+1)//2 for the limit l. This ensures that the list
is able to hold the largest odd number less than or equal to l.
The i'th odd number is easily calculated as 2*i-1.
Selecting the next prime, p, for sieving can be done in the same manner
as in Eratos1.lua, since the list still contains actual numbers.
To remove (set to zero) the numbers p*p, p*p+p, p*p+p+p, ... from the
list, we need to find the indexes of these numbers. Or some of them, at
least, since we are only dealing with odd numbers now.
In general, to find the index of an odd number, k, we calculate
(k+1)//2. And since p is odd, p*p is likewise odd and (p*p+1)//2 is the
index of p*p in the list. This explains the low limit of the loop that
iterates over the multiples of p.
It is clear that every second number in p*p, p*p+p, p*p+p+p, ... is even
and should not be used. And a bit of consideration shows that the
indexes of two neighboring odd numbers in the list p*p, p*p+p, p*p+p+p,
... differ by exactly p. So p is used as the step in the sieve loop.
And running this:
 $ time lua-5.4.3 Eratos2.lua 1000000 100
 Eratos2.lua: 2021-Apr-09 19.50 / TN
 pi(1000000) = 78498, count=811068 (calculated 100 times)
 clockAccum.Init=5.577
 clockAccum.Sieve=6.092
 clockAccum.Pack=3.089
 clockTotal=14.758
 real 0m14.846s
 user 0m13.104s
 sys 0m1.684s
 $
This is really interesting, since it beats luo.lua by a significant
margin. I'll deal with this unexpected fact later. There is also some
instrumentation (the count and clock output) intended to help analyze
performance. Before doing that, however, let's take matters a step
further and specialize with respect to the prime 3, as well as the prime
2. This is the same level of specialization used in the Xuedong Luo
method. So here is Eratos3.lua doing that:
 $ cat Eratos3.lua
 -- Eratos3.lua: Inspired by "Jairo A. del Rio" <jairoadelrio6@gmail.com>: Tips and questions concerning a prime sieve in Lua, special case 2, 3 sieve
 -- 2021-Apr-01 18.59 / TN
 print( "Eratos3.lua: 2021-Apr-18 19.25 / TN" )
 local print = print
 local tonumber = tonumber
 local arg = arg
 local os = os
 local error = error
 _ENV = nil
 local function Eratos3( l, clockAccum )
 local clockStart
 local clockEnd
 clockStart = os.clock()
 local l0 = ( l + l%2 + 1 ) // 3
 local list = {}
 local u = 1
 local v = 4
 for i = 1, l0 do
 -- list[i] = 3*(i-1) + (i-1)%2 + 1
 list[i] = u
 u = u + v
 v = 6 - v
 -- print( "list[" .. i .. "] = " .. list[i] )
 end
 local x = 1
 local count = 0
 clockEnd = os.clock()
 clockAccum.Init = (clockAccum.Init or 0) + clockEnd - clockStart
 clockStart = clockEnd
 while true do
 local p
 repeat
 x = x + 1
 p = list[x]
 until p ~= 0
 if not p or p*p > l then
 break
 end
 local i = p*p // 3 + 1
 local q = p // 6
 local d
 if p % 6 == 1 then
 d = 8 * q + 1
 else
 d = 4 * q + 3
 end
 -- print( p )
 local z = 2 * p
 -- local c = 0
 while i <= l0 do
 list[i] = 0
 i = i + d
 d = z - d
 -- c = c + 1
 count = count + 1
 end
 -- print( "p=" .. p .. ", c=" .. c )
 end
 clockEnd = os.clock()
 clockAccum.Sieve = (clockAccum.Sieve or 0) + clockEnd - clockStart
 clockStart = clockEnd
 local prime = {}
 -- do return prime end
 for i = 2, 3 do
 if i <= l then prime[#prime+1] = i end
 end
 for i = 2, l0 do
 if list[i] ~= 0 then
 -- print( "prime[" .. i .. "] = " .. list[i] )
 prime[#prime+1] = list[i]
 end
 end
 clockEnd = os.clock()
 clockAccum.Pack = (clockAccum.Pack or 0) + clockEnd - clockStart
 return prime, count
 end
 local l = tonumber( arg[1] )
 local c = tonumber( arg[2] or 1 )
 local result
 local count
 local clockStart = os.clock()
 local clockAccum = {}
 for i = 1, c do
 result, count = Eratos3( l, clockAccum )
 end
 local clockEnd = os.clock()
 print( "pi(" .. l .. ") = " .. #result .. ", count=" .. count .. " (calculated " .. c .. " times)" )
 print( "clockAccum.Init=" .. clockAccum.Init )
 print( "clockAccum.Sieve=" .. clockAccum.Sieve )
 print( "clockAccum.Pack=" .. clockAccum.Pack )
 print( "clockTotal=" .. clockEnd - clockStart )
 $
The list should hold the numbers that are neither divisible by 2 nor 3,
so list[1]=1, list[2]=5, list[3]=7, list[4]=11, list[5]=13, list[6]=17,
list[7]=19, and so on. These are all the numbers of the form 6*k+1 and
6*k+5. And note that the difference between neighboring numbers
alternates between 4 and 2.
The number of elements in the list is calculated as
 l0 = ( l + l%2 + 1 ) // 3.
A bit of thought and experimenting serve to convince ourselves that the
list thereby becomes able to hold the largest number of the form 6*k+1
or 6*k+5 that are less than or equal to l.
The list itself is initialized by the values of u, initially 1, that are
formed by repeatedly adding v which alternates between 4 and 2. This is
similar to what happens in luo.lua and generates 1, 5, 7, 11, 13 ...
The sieve loop is more intricate: We need to map from the values that
should be excluded (p*p, p*p+p, ...) to their indexes in order to zero
the correct numbers in the list.
Observe initially that for the numbers in the list of the form 6*k+1,
the index is 2*k+1. And for the numbers of the form 6*k+5, the index is
2*k+2. For example, the index of 7=6*1+1 where k=1 is 2*k+1=3. And the
index of 17=6*2+5 with k=2 is 2*k+2=6.
If a number, n, is of one of the forms 6*k+1 or 6*k+5, we can convince
ourselves that its index in the list is n//3+1: If n = 6*k+1, n//3 = 2*k
and n//3+1 is the desired index value 2*k+1. And if n = 6*k+5, n//3 =
2*k+1 and again n//3+1 is the desired index value which is 2*k+2.
The index of p*p, the initial value of i, is therefore calculated as
p*p//3+1.
Now we have to zero the numbers with indexes that correspond to the
subset of the numbers p*p, p*p+p, p*p+2*p, ... which are not divisible
by 2 or 3. Let's begin by considering an example, p=5: For this p, the
numbers we wish to exclude are
 25 30 35 40 45 50 55 60 65 70 75 80 85 ...
>From this list, we should exclude numbers divisible by 2 or 3, that is,
numbers that are not of the form 6*k+1 or 6*k+5. These numbers are
already excluded from consideration because we have specialized with
respect to 2 and 3. Excluding numbers divisible by 2 or 3 leaves us with
 25 35 55 65 85 ...
The corresponding list of indexes (n//3+1 for each n in the list) is
 9 12 19 22 29 ...
So adding 3 and 7 alternatingly generates these indexes.
Another example, p=7: Excluding numbers divisible by 2 or 3 from
 49 56 63 70 77 84 91 98 105 112 119 126 133 ...
leaves
 49 77 91 119 133 ...
so the list of indexes to be zeroed is
 17 26 31 40 45 ...
Similar pattern as for p=5: Adding 9 and 5 alternatingly generates these
indexes.
In general, for a p of the form 6*k+1, adding d = 8*(p//6)+1 and 2*p-d
alternatingly generates the desired indexes. And for a p of the form
6*k+5, adding d = 4*(p//6)+3 and 2*p-d alternatingly does the job.
That's what happens in Eratos3.lua.
Running Eratos3.lua:
 $ time lua-5.4.3 Eratos3.lua 1000000 100
 Eratos3.lua: 2021-Apr-18 19.25 / TN
 pi(1000000) = 78498, count=429626 (calculated 100 times)
 clockAccum.Init=4.588
 clockAccum.Sieve=5.815
 clockAccum.Pack=2.561
 clockTotal=12.964
 real 0m13.024s
 user 0m11.575s
 sys 0m1.435s
 $
So Eratos3.lua is slightly faster than Eratos2.lua: The clockTotal for
Eratos2.lua reported above is 14.586.
Getting back to your (Jairo's) luo.lua, to enable a more detailed
comparison with Eratos3.lua, I have instrumented luo.lua in the same way
as Eratos3.lua. The difference between your original luo.lua and the
instrumented luo2.lua is:
 $ diff luo.lua luo2.lua
 4c4,7
 < function primesieve(N)
 ---
 > function primesieve(N,clockAccum)
 > local clockStart
 > local clockEnd
 > clockStart = os.clock()
 22a26,29
 > local count = 0
 > clockEnd = os.clock()
 > clockAccum.Init = (clockAccum.Init or 0) + clockEnd - clockStart
 > clockStart = clockEnd
 32a40
 > count = count + 1
 34a43,45
 > clockEnd = os.clock()
 > clockAccum.Sieve = (clockAccum.Sieve or 0) + clockEnd - clockStart
 > clockStart = clockEnd
 40c51,53
 < return result
 ---
 > clockEnd = os.clock()
 > clockAccum.Pack = (clockAccum.Pack or 0) + clockEnd - clockStart
 > return result, count
 42a56,58
 > local count
 > local clockStart = os.clock()
 > local clockAccum = {}
 44c60
 < X = #primesieve(1000000)
 ---
 > X, count = primesieve(1000000,clockAccum)
 45a62,67
 > local clockEnd = os.clock()
 > print( "pi(" .. 1000000 .. ") = " .. #X .. ", count=" .. count .. " (calculated " .. 100 .. " times)" )
 > print( "clockAccum.Init=" .. clockAccum.Init )
 > print( "clockAccum.Sieve=" .. clockAccum.Sieve )
 > print( "clockAccum.Pack=" .. clockAccum.Pack )
 > print( "clockTotal=" .. clockEnd - clockStart )
 $
And running luo2.lua:
 $ time lua-5.4.3 luo2.lua
 pi(1000000) = 78498, count=580993 (calculated 100 times)
 clockAccum.Init=6.386
 clockAccum.Sieve=10.53
 clockAccum.Pack=4.955
 clockTotal=21.871
 real 0m22.000s
 user 0m20.326s
 sys 0m1.575s
 $
The first thing to notice here is that the figures reported by the time
command are not significantly larger than the corresponding figures
reported for the original luo.lua, repeated here:
 $ time lua-5.4.3 luo.lua
 real 0m21.828s
 user 0m20.373s
 sys 0m1.388s
 $
If there is a small difference, this is most likely caused primarily by
the count = count + 1 which has been added to the inner sieving loop.
Before continuing, some words about the instrumentation used: Resource
usage is measured by the Lua os.clock() function which is based on the
standard C library clock() function. To measure resource usage of some
section of the program, take the difference between the values of
os.clock() before and after that program section executes.
It is usually important to avoid too many calls to os.clock(), since
those calls and the accompanying calculations take time and may disturb
the measurements significantly. In any case, it is preferable to measure
fairly long-running program sections, since the clock() may "tick" with
quite large intervals, that is, be relatively inaccurate.
For both Eratos3.lua and luo2.lua, three os.clock() based measurements
are taken, one for each of the main sections: Initialization, sieving,
and packing of the result. And because the base test consists of 100
repetitions of the calculation of the primes below 1000000, we need to
accumulate the measurements for each section. This is handled, a bit
cumbersome, using the clockAccum table parameter which has been added to
the main sieve function.
In addition to the os.clock() measurements, the number of times the
inner sieve loop is executed is counted. There is no need for
accumulation here, so that value, the count, is simply returned as an
additional result of the main sieve function.
As a final check, the executions are performed under the control of the
time command which provides a simple sanity check for the figures
reported.
Now compare the measurements for luo2.lua and Eratos3.lua:
 $ time lua-5.4.3 luo2.lua
 pi(1000000) = 78498, count=580993 (calculated 100 times)
 clockAccum.Init=6.386
 clockAccum.Sieve=10.53
 clockAccum.Pack=4.955
 clockTotal=21.871
 real 0m22.000s
 user 0m20.326s
 sys 0m1.575s
 $
 $ time lua-5.4.3 Eratos3.lua 1000000 100
 Eratos3.lua: 2021-Apr-18 19.25 / TN
 pi(1000000) = 78498, count=429626 (calculated 100 times)
 clockAccum.Init=4.588
 clockAccum.Sieve=5.815
 clockAccum.Pack=2.561
 clockTotal=12.964
 real 0m13.024s
 user 0m11.575s
 sys 0m1.435s
 $
We notice several things: The initialization by luo2.lua is a couple of
seconds slower than the initialization by Eratos3.lua. Comparing the
code, luo2.lua has
 local step = 2
 local first = 5
 while first <= N do
 S[#S+1] = first
 first = first + step
 step = step == 2 and 4 or 2
 end
whereas Eratos3.lua uses:
 local u = 1
 local v = 4
 for i = 1, l0 do
 list[i] = u
 u = u + v
 v = 6 - v
 end
Adjusting luo2.lua to use step = 6-step should save a little.
Also the packing of final result, the list of primes, is slower in
luo2.lua that uses:
 for _, v in next, S do
 if v > 0 then
 result[#result + 1] = v
 end
 end
Where Eratos3.lua uses:
 for i = 2, l0 do
 if list[i] ~= 0 then
 -- print( "prime[" .. i .. "] = " .. list[i] )
 prime[#prime+1] = list[i]
 end
 end
Actually, the luo2.lua code is not only slower, but actually risks
generating a list of primes out of order: There is no guarantee that the
table keys are processed in numerical order when using next.
More mysterious initially is the fact that luo2.lua uses 580993
iterations of the inner sieve loop where Eratos3.lua can make do with
only 429626. The reason turns out to be that luo2.lua spends time
eliminating multiples of composite numbers: For example, multiples of 25
are eliminated, but this is unnecessary, since these multiples have all
been eliminated when multiples of 5 were eliminated. Figuring out what
to do about this takes some thinking, but fortunately the fix is easy:
Simply refrain from executing the inner sieve loop if S[i] == 0: S[i] is
the value whose multiples are being zeroed, so if S[i] is zero, it is
composite and, hence, its multiples have already been excluded. (This
improvement is relevant also for the original Xuedon Luo method.)
Similar logic is found in Eratos3.lua in the form of the following loop:
 repeat
 x = x + 1
 p = list[x]
 until p ~= 0
Accordingly, the following adjustments to luo2.lua are made in luo3.lua
to improve performance:
 $ diff luo2.lua luo3.lua
 24c24
 < step = step == 2 and 4 or 2
 ---
 > step = 6 - step
 36,40c36,42
 < while j <= M do
 < S[j] = 0
 < j = j + ij
 < ij = t - ij
 < count = count + 1
 ---
 > if S[i] ~= 0 then
 > while j <= M do
 > S[j] = 0
 > j = j + ij
 > ij = t - ij
 > count = count + 1
 > end
 46,48c48,50
 < for _, v in next, S do
 < if v > 0 then
 < result[#result + 1] = v
 ---
 > for i = 1, #S do
 > if S[i] > 0 then
 > result[#result + 1] = S[i]
 $
Then:
 $ time lua-5.4.3 luo3.lua
 pi(1000000) = 78498, count=429627 (calculated 100 times)
 clockAccum.Init=5.828
 clockAccum.Sieve=9.941
 clockAccum.Pack=2.405
 clockTotal=18.174
 real 0m18.227s
 user 0m16.629s
 sys 0m1.575s
 $
We notice that all sections have been speeded up: The luo3.lua
initialization is still a bit slower than the Eratos3.lua
initialization: Presumably, this is explained by the different loop
contructs used. And the packing is performing similarly in luo3.lua and
Eroatos3.lua and understandably: The packing loops are equivalent.
The sieving section has been speeded up, but there is still some way to
go, which is a bit strange, considering that the luo3.lua sieve loop
 while j <= M do
 S[j] = 0
 j = j + ij
 ij = t - ij
 count = count + 1
 end
is statement for statement equivalent to the Eratos3.lua sieve loop:
 while i <= l0 do
 list[i] = 0
 i = i + d
 d = z - d
 -- c = c + 1
 count = count + 1
 end
The reason turns out to be that the luo.lua variable j is global. Hence,
we declare j local like this in luo4.lua:
 $ diff luo3.lua luo4.lua
 33c33
 < j = c
 ---
 > local j = c
 $
Resulting in:
 $ time lua-5.4.3 luo4.lua
 pi(1000000) = 78498, count=429627 (calculated 100 times)
 clockAccum.Init=6.005
 clockAccum.Sieve=5.818
 clockAccum.Pack=2.513
 clockTotal=14.336
 real 0m14.471s
 user 0m12.667s
 sys 0m1.715s
 $
Overall, this essentially eliminates the difference between the
performance of luo2.lua and Eratos3.lua, except for the initialization,
as noted.
Finally, one detail stands out: The count reported by both luo3.lua and
luo4.lua is one greater than the count reported by Eratos3.lua. Looking
into this, it turns out that the original Xuedong Luo method presupposes
that the N parameter is of the form 3*M+2 with odd M. So the odd M is =
2*k+1 for some k, so N = 3*(2*k+1)+2 = 6*k+5, and the N = 1000000 used
in the test is not of this form. As a result, the inner sieving loop of
luo.lua in rare cases sets a S[j] = 0 outside the intended range,
explaining the difference in the counts.
So, with these initial considerations out of the way, the original plan
was to present the very different module that I mentioned earlier,
Sieve1.lua. However, this mail is already too long, so I'll limit
myself to simply attaching the code (Sieve1_20210507_1322.zip containing
Sieve1.lua and required utility modules Integer1.lua and List1.lua) and
just comment briefly.
To run the code, unpack the files into some directory and:
 $ time lua-5.4.3 -e'local s=require "Sieve1"; for n=1,100 do ps=s.primesTo( 1000000 ) end print( "moduleCount=" .. s.moduleCount .. ", #ps=" .. #ps )'
 moduleCount=3, #ps=78498
 real 0m11.119s
 user 0m10.872s
 sys 0m0.249s
 $
This mimics the luo.lua calculation by calculating the primes to 1000000
a hundred times. (And note: This is slightly faster than both luo4.lua
and Eratos3.lua.)
Sieve1.lua is different from Eratos3.lua in (at least) two significant
ways:
1. The number of primes (called modules) handled specially (like 2 and 3
 in luo.lua and Eratos3.lua) can be varied. This allows easy
 experimentation with this parameter. The module count is hardcoded in
 Sieve1.lua, presently Sieve1.moduleCount = 3, so that the primes
 handled specially are 2, 3, and 5. This particular value of the
 module count gives the best performance for the primes below 1000000
 in that both using 2 and 4 modules is slower. This minimum value may
 change with the number of primes desired, however, so using 3 is
 perhaps not the end of the story.
2. The Sieve1.lua module uses a single bit (with 0 meaning potentially
 prime and 1 denoting composite) to represent an entry in the list of
 numbers on which exclusion is performed. This means that (under
 ordinary circumstances) 64 numbers (bits of a Lua integer) are
 handled in parallel. But beware that for larger primes, there are far
 between the exclusion values, reducing this advantage, perhaps even
 turning it into a disadvantage.
Enough for now. Have fun. I did.
Best
Thorkil Naur

Attachment: Sieve1_20210507_1322.zip
Description: Zip compressed data


AltStyle によって変換されたページ (->オリジナル) /