For learning purpouses I wrote a class that is supposed to find all divisors for a given integer. My class makes use of primes and prime factorization to speed things up a bit. What can I improve in my class?
Are there any microoptimiztations that could speed up the process? Any bad practices that should be avoided? I am by no means an expert at math, so feel free to be very nit-picky. :)
DivisorFinder Class
public class DivisorFinder
{
/// <summary>
/// The number that is searched for divisors
/// </summary>
private int _number;
/// <summary>
/// Search new divisors for a given integer
/// </summary>
/// <param name="number">Integer that is searched for divisors</param>
public DivisorFinder(int number)
{
_number = number;
}
/// <summary>
/// Find prime numbers up to a given maximum
/// </summary>
/// <param name="upperLimit">The upper limit</param>
/// <returns>All prime numbers between 2 and upperLimit</returns>
private IEnumerable<int> FindPrimes(int upperLimit)
{
var composite = new BitArray(upperLimit);
var sqrt = (int) Math.Sqrt(upperLimit);
for (int p = 2; p < sqrt; ++p)
{
if (composite[p])
continue;
yield return p;
for (int i = p * p; i < upperLimit; i += p)
composite[i] = true;
}
for (int p = sqrt; p < upperLimit; ++p)
if (!composite[p])
yield return p;
}
/// <summary>
/// Search prime factors in a given IEnumerable of prime numbers
/// </summary>
/// <param name="primesList">The IEnumerable of previously calculated prime numbers</param>
/// <returns>IEnumerable of prime factors</returns>
private IEnumerable<(int, int)> FindPrimeFactors(IEnumerable<int> primesList)
{
foreach (var prime in primesList)
{
if (prime * prime > _number)
break;
int count = 0;
while (_number % prime == 0)
{
_number = _number / prime;
count++;
}
if (count > 0)
yield return ((prime, count));
}
if (_number > 1)
yield return (_number, 1);
}
/// <summary>
/// Find all divisors of the target number
/// </summary>
/// <returns>IEnumerable of integers that divide the target without reminder</returns>
public IEnumerable<int> FindDivisors()
{
var primes = FindPrimes((int) (Math.Sqrt(_number) + 1));
var factors = FindPrimeFactors(primes);
var divisors = new HashSet<int> {1};
foreach (var factor in factors)
{
var set = new HashSet<int>();
for (int i = 0; i < factor.Item2 + 1; i++)
{
foreach (int x in divisors)
{
set.Add((int) Math.Pow(x * factor.Item1, i));
}
}
divisors.UnionWith(set);
}
return divisors;
}
}
Usage
var number = 2147483644;
var divisorFinder = new DivisorFinder(number);
var divisors = divisorFinder.FindDivisors();
Console.WriteLine($"Divisors: [{string.Join(", ", divisors)}]");
Performance for this example: ~6 milliseconds on a i5-6600K @ 3.50GHz
Credits:
The class was inspired by the python example from Bruno Astrolino.
Furthermore the prime sieve function was adapted from Dennis_E.
2 Answers 2
Like mentioned in Peter Taylor's answer you have some bugs in your code.
- The primes aren't calculated properly
- The divisors aren't calculated properly
Let's first dig into the primes because that's the easiest.
In the FindPrimes()
method you are calculating the upper border like
var sqrt = (int) Math.Sqrt(upperLimit);
til which you check whether composite[p] == false
.
This incorrect as it stands; you need to + 1
here like so
var sqrt = (int) Math.Sqrt(upperLimit) + 1;
Another problem comes with big numbers in this loop:
for (int i = p * p; i < upperLimit; i += p)
composite[i] = true;
because if (p * p) > int.MaxValue
it becomes negative and will run for a very loooong time (and maybe never ends). Adding a second loop condition will fix this as well.
If we take into account that all numbers divisible by 2
can't be primes, we can speed this up as well by starting the former loop at 3
and only check each second number. The same trick can be used for the last loop.
Making this a static
method now the FindPrimes()
method looks like so
private static IEnumerable<int> FindPrimes(int upperLimit)
{
var composite = new BitArray(upperLimit);
var sqrt = (int)Math.Sqrt(upperLimit) + 1;
if (sqrt >= 2) { yield return 2; }
for (int i = 4; i < upperLimit && i > 0; i += 2)
{
composite[i] = true;
}
for (int p = 3; p < sqrt; p += 2)
{
if (composite[p]) { continue; }
yield return p;
for (int i = p * p; i < upperLimit && i > 0; i += p)
{
composite[i] = true;
}
}
if (sqrt % 2 == 0)
{
sqrt++;
}
for (int p = sqrt; p < upperLimit; p += 2)
{
if (!composite[p])
{
yield return p;
}
}
}
Now the primes will be calculated properly.
Let us now calculate the divisors
The main problem lies in trying to be clever. Instead of yielding a Tuple<int, int>
let us just yield
the single factors. We can maybe speed this up (if the compiler doesn't optimize it anyway) by calculating the upper border here as well.
Currently you have
foreach (var prime in primesList)
{
if (prime * prime > _number)
break;
to exit the loop. We can avoid the multiplication here if we once calculate the square root of _number
and compare this with prime
like so
private static IEnumerable<int> FindPrimeFactors(IEnumerable<int> primesList, int number)
{
int upperLimit = (int)Math.Sqrt(number) + 1;
foreach (var prime in primesList)
{
if (prime > upperLimit) { break; }
while (number % prime == 0)
{
number = number / prime;
yield return prime;
upperLimit = (int)Math.Sqrt(number) + 1;
}
}
if (number > 1) { yield return number; }
}
I have timed this and yielding single factors isn't slower than yielding a Tuple<int,int
.
The method is now static
as well and takes an additional parameter. After we have calculated only the prime factors we now need to calculate the divisors like so
public static IEnumerable<int> FindDivisors(int number)
{
var primes = FindPrimes(number);
var factors = FindPrimeFactors(primes, number);
var divisors = new HashSet<int> { 1 };
foreach (var factor in factors)
{
var set = new HashSet<int>();
foreach (int x in divisors)
{
set.Add(x * factor);
}
divisors.UnionWith(set);
}
return divisors.OrderBy(d => d);
}
-
\$\begingroup\$ That's not a problem. The bugs are elsewhere. \$\endgroup\$Peter Taylor– Peter Taylor2018年09月11日 11:35:34 +00:00Commented Sep 11, 2018 at 11:35
-
\$\begingroup\$ @Heslacher I am not really sure, but I think the bug is not in the
FindPrimes()
method. I don't know what is causing it either tho. \$\endgroup\$766F6964– 766F69642018年09月11日 23:21:28 +00:00Commented Sep 11, 2018 at 23:21 -
\$\begingroup\$ @PeterTaylor and 766F6964 updated answer \$\endgroup\$Heslacher– Heslacher2018年09月12日 05:56:00 +00:00Commented Sep 12, 2018 at 5:56
-
1\$\begingroup\$ 1. I repeat, calculating the "upper border" twice is not a problem. Once
FindPrimes
is fixed,FindPrimeFactors
only needs to do trial division up to the square root, because it then recognises that whatever's left over is the last prime. 2. Yielding single divisors is less efficient than calculating their multiplicity. The bug is in the line withMath.Pow
, but I think it's a useful exercise for OP to find it. 3. Pre-calculating the square root of_number
can be less efficient. If you're going to do that, you should recalculate it whenever you find a factor. \$\endgroup\$Peter Taylor– Peter Taylor2018年09月12日 06:27:18 +00:00Commented Sep 12, 2018 at 6:27 -
1\$\begingroup\$ It's not the time for yielding: it's the time for processing. Consider an input number of the form \2ドル^n\$. If you yield a tuple,
FindDivisors
will generate the \$n+1\$ powers thereof in \$\Theta(n)\$ time. But yielding the prime \$n\$ times makesFindDivisors
generate the powers in \$\Theta(n^2)\$ time. \$\endgroup\$Peter Taylor– Peter Taylor2018年09月12日 13:48:47 +00:00Commented Sep 12, 2018 at 13:48
Bugs
FindPrimes
is one of the better sieve implementations I've seen posted on this site, but it has an out-by-one bug. Here's some test code, which you can tidy up in a unit testing framework if you want to do things properly:
var df = new DivisorFinder(72);
int[] knownPrimes = new int[] { 2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47 };
for (int limit = 1; limit <= 50; limit++)
{
var expected = knownPrimes.Where(p => p < limit);
var observed = df.FindPrimes(limit);
if (!expected.SequenceEqual(observed)) Console.WriteLine($"Error for limit {limit}");
}
FindDivisors
has a more alarming bug. What are the divisors of 72
? According to this code, they include 144
...
Structure
FindPrimes
should be static
: nothing in that method depends on instance fields or properties.
FindPrimeFactors
modifies state, so it's not idempotent. Calling FindDivisors
twice on the same object should give the same results, but it doesn't.
Frankly, this class is an example of OO purism at its worst. There's no good reason for any of the methods to be non-static
.
IMO the dependency relationship between FindDivisors
and FindPrimeFactors
is wrong: FindPrimeFactors
should be responsible for calculating which primes it needs to use. Consider that if you want to have multiple methods which use the prime factors in different ways, you shouldn't have to copy-paste the line
var primes = FindPrimes((int) (Math.Sqrt(_number) + 1));
into every single one.
-
\$\begingroup\$ I thought I got some strange values but didn't dig into it further. If I could I would give you +10 ;-) \$\endgroup\$Heslacher– Heslacher2018年09月11日 08:32:20 +00:00Commented Sep 11, 2018 at 8:32
-
\$\begingroup\$ @PeterTaylor Thanks for your reply. I wasn't aware of this critical bug. I am not entirly sure yet what triggers this behaviour. Apparently the wrong divisor is always twice the original number, so it seems. Regarding your structure suggestions:
FindPrimeFactors
would not have anIEnumerable<int>
with the primes as parameter but instead only the target number which is then passed to aFindPrimes
call from inside theFindPrimeFactors
method? And yes I agree all method can actually be made static. The target number can be provided as parameter to theFindDivisors
method. \$\endgroup\$766F6964– 766F69642018年09月11日 17:49:38 +00:00Commented Sep 11, 2018 at 17:49 -
\$\begingroup\$ @766F6964, that is indeed what I meant about
FindPrimeFactors
. \$\endgroup\$Peter Taylor– Peter Taylor2018年09月11日 21:26:08 +00:00Commented Sep 11, 2018 at 21:26 -
\$\begingroup\$ @PeterTaylor I was thinking if it would be even better to make the PrimeFinder a seperate, also static, class. Mainly Because finding divisors and finding primes are two separate things. \$\endgroup\$766F6964– 766F69642018年09月12日 22:29:44 +00:00Commented Sep 12, 2018 at 22:29
SieveOfErathostenes(...)
due to a minor bug. \$\endgroup\$