6
\$\begingroup\$

I have written this small program in Python to calculate decimal places of \$\pi\$. I tried to code it as readably as possible, but since this is a bit calculation heavy, it is reasonable to think about performance optimizations.

The program calculates the first root of the cosine function using the Newton–Raphson method and computes sine and cosine with the Taylor series. Since the first root of the cosine function is at \$\frac{\pi}{2}\$, the result is doubled before it is printed.

# coding: utf-8
"""
calculate decimal places of pi
"""
from decimal import Decimal, getcontext
from itertools import count
c = getcontext()
c.prec = 1000
def is_close(a, b):
 return abs(a - b).adjusted() + c.prec < 2
def newton(x):
 sums = [Decimal(0) for i in range(4)]
 value = Decimal(1)
 sums[0] += value
 for i in count(1):
 value *= x / i
 tmp = sums[i % 4] + value
 if tmp == sums[i % 4]:
 break
 sums[i % 4] = tmp
 cos_x = sums[0] - sums[2]
 sin_x = sums[1] - sums[3]
 return x + cos_x / sin_x
def main():
 x = Decimal(1)
 while True:
 tmp = newton(x)
 if is_close(tmp, x):
 x = tmp
 break
 x = tmp
 print(x * 2)
if __name__ == '__main__':
 main()
mdfst13
22.4k6 gold badges34 silver badges70 bronze badges
asked Sep 8, 2021 at 9:45
\$\endgroup\$

1 Answer 1

5
\$\begingroup\$

Thanks @Aemyl for your post. From a quick check of your code, these initial issues jump out at me:

  • Using __name__ entry point as a runner for main() function
  • Global statements (c = getcontext()) outside of __main__
  • sums = [Decimal(0) for i in range(4)] - unused variable i
  • Function is_close naming. Is close to what?
  • Using break (twice) instead of a flag variable explaining why you're ending the loop. in main and in newton

Let's go over them, and then look at some performance tuning.

Entry point as runner

I see this a lot from people who come from C, but __main__ IS your main already. You should place the content of main() into the entry point where it belongs. This is where other programmers will come to understand how your code runs and why it's doing what it's doing. i.e.:

if __name__ == '__main__':
 x = Decimal(1)
 while True:
 tmp = newton(x)
 ....

Global statements

c = getcontext()
c.prec = 1000

Understanding the import system of Python is important if you want to improve your Python skills. In this case, the modification of the Decimal can be done inside __main__, so we now have:

if __name__ == '__main__':
 c = getcontext()
 c.prec = 1000
 x = Decimal(1)
 while True:
 ....

Unused variable i

The statement sums = [Decimal(0) for i in range(4)] doesn't do anything with i. When I look at that line in my debugger, I can see that it's just populating sums with 4 entries of Decimal(0). Let's look at it from the performance checker's view:

Line # Hits Time Per Hit % Time Line Contents
==============================================================
 19 8 84.0 10.5 0.0 sums = [Decimal(0) for i in range(4)]

And now let's convert the loop into a static statement and re-run that to see if it's improved performance:

Line # Hits Time Per Hit % Time Line Contents
============================================================== 
 21 8 55.0 6.9 0.0 sums = [Decimal(0),Decimal(0),Decimal(0),Decimal(0)]

That's a little better...

Variable naming

When you're writing code, it's not about you "now" - it's about the "future you", as well as - if you do it professionally - the other coders who will look at your code. Think about how many times you've come back to your code after a few weeks or months, and you ask yourself "who wrote this rubbish?" and "what the hell was I thinking?" - I know I've had that experience quite a few times.

Variable naming is an important part of explaining what your code is trying to do.

So, as an example of throwaway variables, one of the things Python allows you to do, is use _ as a variable for these throwaway variables. For example, this code:

for i in count(1):
 value *= x / i
 tmp = sums[i % 4] + value
 if tmp == sums[i % 4]:
 break
 sums[i % 4] = tmp
 

Let's replace all the tmp variables with _:

for i in count(1):
 value *= x / i
 _ = sums[i % 4] + value
 if _ == sums[i % 4]:
 break
 sums[i % 4] = _
 

Looks better right? Actually no, it doesn't. So tmp really isn't a throwaway variable right? It means something. We need to name the variables to explain what they're doing. Perhaps calc_val or quickly looking at some notes about the Newton method, it looks like it's calculating area. So perhaps area is better? If I'm wrong, sorry, it was a very quick look.

Using Break

Looking at the main, we see break being used to escape the loop. This approach results in spaghetti code. You should cleanly exit any loop using a control variable or refactor it into a function with a return statement.

Original code is:

while True:
 tmp = newton(x)
 if is_close(tmp, x):
 x = tmp
 break
 x = tmp
print(x * 2)

Let's introduce a control variable and complete the if into an if/else:

calc_pi = True
while calc_pi:
 tmp = newton(x)
 if is_close(tmp, x):
 x = tmp
 calc_pi = False
 else:
 x = tmp
print(x * 2)
 

Now we're not using break, the loop will exit cleanly into the print statement.

Ah! But what is this? What do we see?

It's now clear that both paths of the if statement perform the same action. Is this an error? Let's make sure the output is the same, then make a change.

result = x * 2
print(result)
if str(result) != "3.14159265....0214":
 print("wrong, didn't match")

And write the changes:

while calc_pi:
 tmp = newton(x)
 if is_close(tmp, x):
 calc_pi = False
 x = tmp

And run the code - yes, the result matches. So this proves the paths were equal and we were correct to refactor that code into that final version (above).

We can also perform a similar change to the newton(x) code, but this is getting a little long, and you asked for performance enhancements.

Performance Checking

Running line_profiler (there's lots of examples how on the 'net), we have the following results:

Line # Hits Time Per Hit % Time Line Contents
==============================================================
 19 @profile
 20 def newton(x):
 21 8 55.0 6.9 0.0 sums = [Decimal(0),Decimal(0),Decimal(0),Decimal(0)]
 22 8 15.0 1.9 0.0 value = Decimal(1)
 23 8 24.0 3.0 0.0 sums[0] += value
 24 3857 3140.0 0.8 1.2 for i in count(1):
 25 3857 233157.0 60.5 86.2 value *= x / i
 26 3857 13665.0 3.5 5.0 tmp = sums[i % 4] + value
 27 3857 15645.0 4.1 5.8 if tmp == sums[i % 4]:
 28 8 9.0 1.1 0.0 break
 29 3849 4438.0 1.2 1.6 sums[i % 4] = tmp
 30 8 21.0 2.6 0.0 cos_x = sums[0] - sums[2]
 31 8 19.0 2.4 0.0 sin_x = sums[1] - sums[3]
 32 8 416.0 52.0 0.2 return x + cos_x / sin_x

Well, value and sums creations still don't look right. Line 23 is doing another static assignment which we can roll into line 21. So, let's change sums[0] into Decimal(1) and remove line 23.

Line 25 seems to be doing most of the work, but I can see that you're continually recalculating modulo. Let's create a variable storing the modulo result and use that instead.

Line # Hits Time Per Hit % Time Line Contents
==============================================================
 14 @profile
 15 def newton(x):
 16 8 33.0 4.1 0.0 sums = [Decimal(1), Decimal(0), Decimal(0), Decimal(0)]
 17 8 9.0 1.1 0.0 value = Decimal(1)
 18 3857 2314.0 0.6 1.3 for i in count(1):
 19 3857 152870.0 39.6 87.9 value *= x / i
 20 3857 3353.0 0.9 1.9 mod_val = i % 4
 21 3857 8470.0 2.2 4.9 tmp = sums[mod_val] + value
 22 3857 3670.0 1.0 2.1 if tmp == sums[mod_val]:
 23 8 4.0 0.5 0.0 break
 24 3849 2961.0 0.8 1.7 sums[mod_val] = tmp
 25 8 13.0 1.6 0.0 cos_x = sums[0] - sums[2]
 26 8 11.0 1.4 0.0 sin_x = sums[1] - sums[3]
 27 8 274.0 34.2 0.2 return x + cos_x / sin_x

So changes thus far have made it run about 60% better (270604 -> 173982). And changing the for loop into a while loop with a control variable:

Line # Hits Time Per Hit % Time Line Contents
==============================================================
 14 @profile
 15 def newton(x):
 16 8 31.0 3.9 0.0 sums = [Decimal(1), Decimal(0), Decimal(0), Decimal(0)]
 17 8 9.0 1.1 0.0 value = Decimal(1)
 18 8 6.0 0.8 0.0 do_calcs = True
 19 8 3.0 0.4 0.0 counter = 1
 20 3865 2430.0 0.6 1.2 while do_calcs:
 21 3857 173534.0 45.0 87.4 value *= x / counter
 22 3857 3466.0 0.9 1.7 mod_val = counter % 4
 23 3857 8579.0 2.2 4.3 tmp = sums[mod_val] + value
 24 3857 3994.0 1.0 2.0 if tmp == sums[mod_val]:
 25 8 6.0 0.8 0.0 do_calcs = False
 26 else:
 27 3849 3214.0 0.8 1.6 sums[mod_val] = tmp
 28 3849 2973.0 0.8 1.5 counter += 1
 29 8 16.0 2.0 0.0 cos_x = sums[0] - sums[2]
 30 8 13.0 1.6 0.0 sin_x = sums[1] - sums[3]
 31 8 283.0 35.4 0.1 return x + cos_x / sin_x

But as you can see, that slowed it down a little (198274; slower by 24292) although the code is much clearer, and we can now refactor that into a function - because the break has been removed (I use PyCharm's Refactor->Extract->Method approach).

Also, we unwrap the *= into a full calculation, and reintroduce your original for loop via count(1), primarily because we've wrapped it into a function so it can exit cleanly with a return statement:

Total time: 0.191908 s
File: 0908_calc_pi.py
Function: newton at line 14
Line # Hits Time Per Hit % Time Line Contents
==============================================================
 14 @profile
 15 def newton(x):
 16 8 35.0 4.4 0.0 sums = [Decimal(1), Decimal(0), Decimal(0), Decimal(0)]
 17 8 9.0 1.1 0.0 value = Decimal(1)
 18 8 191565.0 23945.6 99.8 sums = perform_calcs(sums, value, x)
 19 8 16.0 2.0 0.0 cos_x = sums[0] - sums[2]
 20 8 11.0 1.4 0.0 sin_x = sums[1] - sums[3]
 21 8 272.0 34.0 0.1 return x + cos_x / sin_x
Total time: 0.18069 s
File: 0908_calc_pi.py
Function: perform_calcs at line 23
Line # Hits Time Per Hit % Time Line Contents
==============================================================
 23 @profile
 24 def perform_calcs(sums, value, x):
 25 3857 2152.0 0.6 1.2 for counter in count(1):
 26 3857 162137.0 42.0 89.7 value = value * (x / counter)
 27 3857 2765.0 0.7 1.5 mod_val = counter % 4
 28 3857 7591.0 2.0 4.2 tmp = sums[mod_val] + value
 29 3857 3497.0 0.9 1.9 if tmp == sums[mod_val]:
 30 8 8.0 1.0 0.0 return sums
 31 else:
 32 3849 2540.0 0.7 1.4 sums[mod_val] = tmp

Giving us a value of 191908. That's a little slower than the best, but it's cleaner and refactored (variable naming needs some work still). That's about the best I can think of off the top of my head. I hope this helps!

answered Sep 8, 2021 at 13:23
\$\endgroup\$
13
  • 8
    \$\begingroup\$ If there shouldn't be a main function, then why does the official documentation for __main__ show doing exactly that? \$\endgroup\$ Commented Sep 8, 2021 at 17:26
  • 4
    \$\begingroup\$ @don'ttalkjustcode in the python standard library, both versions appear: aifc.py runs the main routine directly in if __name__ == '__main__':, while ast.py uses a main function. I think I should ask myself how likely it is for another script to import the main function and how likely it is to have a bug because the variables I use inside if __name__ == '__main__': are in the global name scope. I personally prefer using a main function but I wouldn't be surprised if there are good reasons against it. \$\endgroup\$ Commented Sep 8, 2021 at 17:56
  • 5
    \$\begingroup\$ @Aemyl Yeah I find it odd to say one "should" pollute globals. \$\endgroup\$ Commented Sep 8, 2021 at 18:12
  • 5
    \$\begingroup\$ Actually the __main__ documentation just got rewritten and expanded for the upcoming Python 3.10, including an "Idiomatic Usage" section which says "Most often, a function named main encapsulates the program’s primary behavior". And Guido van Rossum called that rewrite "great" and explicitly advocated using a main function like done here. \$\endgroup\$ Commented Sep 9, 2021 at 3:40
  • 3
    \$\begingroup\$ The entry point as runner section is plain wrong. Having a main routine is not redundant. \$\endgroup\$ Commented Sep 9, 2021 at 3:54

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.