I'm using python 3 and I am creating an algorithm to find the sum square difference for the first 100 (or 1 to x) natural numbers. This happens to be project euler problem 6 if anyone is wondering.
I've written it two ways and I am looking for criticism on my writing style to further guide me in how I code algorithms. I am aware there is probably a better "optimized" solution in terms of big(o) but my math skills just haven't reached there yet.
Algorithm 1
def sum_square_difference(max_range):
#Finds the sum square difference for the first x(max range) natural numbers
numbers = range(1,max_range+1)
sum_squares = sum([x**2 for x in numbers])
square_sum = sum(numbers) ** 2
return square_sum - sum_squares
I find this algorithm to be the most readable, but something tells me it may be more verbose in terms of lines of code than necessary so I wrote the following algorithm.
Algorithm 2
def sum_square_difference2(max_range):
numbers = range(1,max_range+1)
return (sum(numbers) ** 2) - (sum([x**2 for x in numbers]))
This one seems much cleaner, but I find myself struggling more to understand and read what is going on, especially when considering the perspective of an outside observer.
I appreciate any insight.
4 Answers 4
I would do something like:
def sum_of_squares(n):
return sum(i ** 2 for i in range(1, n+1))
def square_of_sum(n):
return sum(range(1, n+1)) ** 2
def sum_square_difference(n):
return sum_of_squares(n) - square_of_sum(n)
Notice the use of a generator expression in sum, where I have omitted the square brackets.
-
7\$\begingroup\$ I find that people often recommend splitting up already small functions into even smaller functions, but personally I find that this can take longer when debugging, as you can't tell at a glance if the function is used in more than one place. \$\endgroup\$Turksarama– Turksarama2019年08月15日 06:15:47 +00:00Commented Aug 15, 2019 at 6:15
-
\$\begingroup\$ @Turksarama With a decent editor (that supports highlighting), certainly you can. \$\endgroup\$Kenneth K.– Kenneth K.2019年08月15日 14:10:18 +00:00Commented Aug 15, 2019 at 14:10
-
\$\begingroup\$ Even without a decent editor (I use Atom), a folder-wide search works just fine. Ctrl+Shift+F, "square_of_sum(". Doesn't work for aliasing, but oh well, tests should catch that. \$\endgroup\$Adam Barnes– Adam Barnes2019年08月16日 12:50:13 +00:00Commented Aug 16, 2019 at 12:50
-
\$\begingroup\$ A further advantage of this method is that you can easily change the logic inside the smaller functions without any changes in the larger method. E.g. by replacing it with numpy code or an explicit mathematical formula. \$\endgroup\$Ivo Merchiers– Ivo Merchiers2019年11月07日 15:02:37 +00:00Commented Nov 7, 2019 at 15:02
I find myself struggling more to understand and read what is going on
This is the key insight. A chunk of code is likely to be read more times than it was written. So write your code to be read. Use comments, docstrings, whitespace, type hints, etc. to make it easier for someone unfamiliar with the code (including your future self) to read and understand it.
An alternative not discussed in the other answers: use numpy
. As soon as you want to do anything serious with numbers, it's going to be useful. The downside is that numpy
uses fixed-size integers, which can lead to overflow. It also allocates the array, which uses more memory than necessary.
import numpy as np
def sum_square_difference(n):
nums = np.arange(1, n+1, dtype=int)
return nums.sum()**2 - (nums**2).sum()
If you want to input very large numbers, or have access to very little memory, just use the analytical formulas for the sums:
def sum_square_analytical(n):
sum_squares = n*(n+1)*(2*n+1)//6
square_sum = (n*(n+1)//2)**2
return square_sum - sum_squares
Of course, you can add docstrings and comments, as suggested in other answers.
-
2\$\begingroup\$ If you're going to do the math, you might as well do the subtraction as well, no? Then
sum_square_analytical
is justreturn (3*n**4 + 2*n**3 - 3*n**2 - 2*n)//12
. Or factor it, which seems faster:return n*(n+1)*(n-1)*(3*n+2)//12
\$\endgroup\$Nick Matteo– Nick Matteo2019年08月16日 06:51:44 +00:00Commented Aug 16, 2019 at 6:51 -
\$\begingroup\$ @NickMatteo Of course you could simplify it even further, I just wanted to present the way I would have done it. I prefer splitting up the logic, since generally you could have more complicated formulas, and combining them adds an extra layer of debugging for a future developer. I just googled the closed expressions for these sums, and wrote them into the code. That way, it will be easy for any future developer to verify that these formulas are correct. \$\endgroup\$maxb– maxb2019年08月16日 07:36:52 +00:00Commented Aug 16, 2019 at 7:36
-
\$\begingroup\$ I -1'd this (even though I don't have the reputation to), because I would argue this is not answering the OP's question. He's not asking for the best computational way to do the sum of squares, he's asking for the most readable.
numpy
is very much unreadable to the initiated. Compare your examples to Nikos Oikou's, from the perspective of someone who knows Python, but has never usednumpy
. \$\endgroup\$Adam Barnes– Adam Barnes2019年08月16日 12:52:47 +00:00Commented Aug 16, 2019 at 12:52 -
\$\begingroup\$ @AdamBarnes It is entirely possible to learn Python without ever touching
numpy
, but I'd definitely argue that the example I provided is in no way unreadable. Even to someone who has barely touched Python, I'd argue that(nums**2).sum()
is more readable thansum(n**2 for n in nums)
. However, I might be biased from having usednumpy
extensively. I definitely find Nikos Oikou's answer clear, concise and readable, but I wanted to present alternative approaches that might be usable. And in my own opinion, anyone learning Python should also put some focus onnumpy
. \$\endgroup\$maxb– maxb2019年08月16日 13:04:19 +00:00Commented Aug 16, 2019 at 13:04
Docstrings:
def sum_square_difference(max_range):
#Finds the sum square difference for the first x(max range) natural numbers
numbers = range(1,max_range+1)
sum_squares = sum([x**2 for x in numbers])
square_sum = sum(numbers) ** 2
return square_sum - sum_squares
When you define a function and include a docstring, it should be contained within triple quotes instead of # that is used for comments.
Like this:
def sum_square_difference(max_range):
""" Explanation goes here """
Regarding the syntax or how you write the code, you might do it in one line instead of the use of many unnecessary variables.
Like this:
def square_difference(upper_bound):
"""Return sum numbers squared - sum of squares in range upper_bound inclusive."""
return sum(range(1, upper_bound + 1)) ** 2 - sum(x ** 2 for x in range(1, upper_bound + 1))
You might want to check pep8 https://www.python.org/dev/peps/pep-0008/ - the official Python style guide.
-
3\$\begingroup\$ I don't think I agree with the idea of removing the definition of the range, it is much clearer that the same range is used twice if you assign it to a variable first. \$\endgroup\$Turksarama– Turksarama2019年08月15日 06:13:00 +00:00Commented Aug 15, 2019 at 6:13
-
\$\begingroup\$ If you're referring to the style guide, you should also limit your maximum line length to 79 characters. You can still have it as one expression, split over multiple lines. \$\endgroup\$maxb– maxb2019年08月15日 07:24:51 +00:00Commented Aug 15, 2019 at 7:24
-
\$\begingroup\$
unnecessary variables
can you explain this point? I think that in general code with more usefully-named variables is easier to maintain than long lines of code like yours \$\endgroup\$user2023861– user20238612019年08月15日 16:13:13 +00:00Commented Aug 15, 2019 at 16:13 -
\$\begingroup\$ @user2023861 a variable is unnecessary if we can use its value and directly compute a one time calculation, if the value will be used more than once, then a variable is necessary and from my understanding, the author of the post wanted to minimize the lines of code therefore, I omitted whatever variables I consider to be unnecessary and replaced them with one line comprehension syntax which is faster and shorter. \$\endgroup\$user203258– user2032582019年08月15日 16:22:04 +00:00Commented Aug 15, 2019 at 16:22
-
1\$\begingroup\$ @emadboctor OP says that algorithm2 isn't as good because he struggles to read and understand the code. Algorithm2 is the one with fewer variables. Variables with useful names document themselves, even if they're only used once. That's why it's probably better to have more variables. \$\endgroup\$user2023861– user20238612019年08月15日 18:18:22 +00:00Commented Aug 15, 2019 at 18:18
Explore related questions
See similar questions with these tags.
n * (n + 1) * (2*n + 1) / 6 - n**2 * (n + 1)**2 / 4
. But of course, as suggested in the answers, comment your method... You could probably even simplify that some algebraically... \$\endgroup\$