Sum Square Difference, which way is more Pythonic?

Question 1

I'm using python 3 and I am creating an algorithm to find the sum square difference for the first 100 (or 1 to x) natural numbers. This happens to be project euler problem 6 if anyone is wondering.

I've written it two ways and I am looking for criticism on my writing style to further guide me in how I code algorithms. I am aware there is probably a better "optimized" solution in terms of big(o) but my math skills just haven't reached there yet.

Algorithm 1

def sum_square_difference(max_range):
 #Finds the sum square difference for the first x(max range) natural numbers
 numbers = range(1,max_range+1)
 sum_squares = sum([x**2 for x in numbers])
 square_sum = sum(numbers) ** 2
 return square_sum - sum_squares

I find this algorithm to be the most readable, but something tells me it may be more verbose in terms of lines of code than necessary so I wrote the following algorithm.

Algorithm 2

def sum_square_difference2(max_range):
 numbers = range(1,max_range+1)
 return (sum(numbers) ** 2) - (sum([x**2 for x in numbers]))

This one seems much cleaner, but I find myself struggling more to understand and read what is going on, especially when considering the perspective of an outside observer.

I appreciate any insight.

Question 2

Why not just utilize the identities for the sum of the first N squares and the one for the first N integers? i.e. n * (n + 1) * (2*n + 1) / 6 - n**2 * (n + 1)**2 / 4. But of course, as suggested in the answers, comment your method... You could probably even simplify that some algebraically...

Question 3

I would do something like:

def sum_of_squares(n):
 return sum(i ** 2 for i in range(1, n+1))
def square_of_sum(n):
 return sum(range(1, n+1)) ** 2
def sum_square_difference(n):
 return sum_of_squares(n) - square_of_sum(n)

Notice the use of a generator expression in sum, where I have omitted the square brackets.

Question 4

I find that people often recommend splitting up already small functions into even smaller functions, but personally I find that this can take longer when debugging, as you can't tell at a glance if the function is used in more than one place.

Question 5

@Turksarama With a decent editor (that supports highlighting), certainly you can.

Question 6

Even without a decent editor (I use Atom), a folder-wide search works just fine. Ctrl+Shift+F, "square_of_sum(". Doesn't work for aliasing, but oh well, tests should catch that.

Question 7

A further advantage of this method is that you can easily change the logic inside the smaller functions without any changes in the larger method. E.g. by replacing it with numpy code or an explicit mathematical formula.

Question 8

I find myself struggling more to understand and read what is going on

This is the key insight. A chunk of code is likely to be read more times than it was written. So write your code to be read. Use comments, docstrings, whitespace, type hints, etc. to make it easier for someone unfamiliar with the code (including your future self) to read and understand it.

Question 9

An alternative not discussed in the other answers: use numpy. As soon as you want to do anything serious with numbers, it's going to be useful. The downside is that numpy uses fixed-size integers, which can lead to overflow. It also allocates the array, which uses more memory than necessary.

import numpy as np
def sum_square_difference(n):
 nums = np.arange(1, n+1, dtype=int)
 return nums.sum()**2 - (nums**2).sum()

If you want to input very large numbers, or have access to very little memory, just use the analytical formulas for the sums:

def sum_square_analytical(n):
 sum_squares = n*(n+1)*(2*n+1)//6
 square_sum = (n*(n+1)//2)**2
 return square_sum - sum_squares

Of course, you can add docstrings and comments, as suggested in other answers.

Question 10

If you're going to do the math, you might as well do the subtraction as well, no? Then sum_square_analytical is just return (3*n**4 + 2*n**3 - 3*n**2 - 2*n)//12. Or factor it, which seems faster: return n*(n+1)*(n-1)*(3*n+2)//12

Question 11

@NickMatteo Of course you could simplify it even further, I just wanted to present the way I would have done it. I prefer splitting up the logic, since generally you could have more complicated formulas, and combining them adds an extra layer of debugging for a future developer. I just googled the closed expressions for these sums, and wrote them into the code. That way, it will be easy for any future developer to verify that these formulas are correct.

Question 12

I -1'd this (even though I don't have the reputation to), because I would argue this is not answering the OP's question. He's not asking for the best computational way to do the sum of squares, he's asking for the most readable. numpy is very much unreadable to the initiated. Compare your examples to Nikos Oikou's, from the perspective of someone who knows Python, but has never used numpy.

Question 13

@AdamBarnes It is entirely possible to learn Python without ever touching numpy, but I'd definitely argue that the example I provided is in no way unreadable. Even to someone who has barely touched Python, I'd argue that (nums**2).sum() is more readable than sum(n**2 for n in nums). However, I might be biased from having used numpy extensively. I definitely find Nikos Oikou's answer clear, concise and readable, but I wanted to present alternative approaches that might be usable. And in my own opinion, anyone learning Python should also put some focus on numpy.

Question 14

Docstrings:

def sum_square_difference(max_range):
 #Finds the sum square difference for the first x(max range) natural numbers
 numbers = range(1,max_range+1)
 sum_squares = sum([x**2 for x in numbers])
 square_sum = sum(numbers) ** 2
 return square_sum - sum_squares

When you define a function and include a docstring, it should be contained within triple quotes instead of # that is used for comments.

Like this:

def sum_square_difference(max_range):
""" Explanation goes here """

Regarding the syntax or how you write the code, you might do it in one line instead of the use of many unnecessary variables.

Like this:

def square_difference(upper_bound):
 """Return sum numbers squared - sum of squares in range upper_bound inclusive."""
 return sum(range(1, upper_bound + 1)) ** 2 - sum(x ** 2 for x in range(1, upper_bound + 1))

You might want to check pep8 https://www.python.org/dev/peps/pep-0008/ - the official Python style guide.

Question 15

I don't think I agree with the idea of removing the definition of the range, it is much clearer that the same range is used twice if you assign it to a variable first.

Question 16

If you're referring to the style guide, you should also limit your maximum line length to 79 characters. You can still have it as one expression, split over multiple lines.

Question 17

unnecessary variables can you explain this point? I think that in general code with more usefully-named variables is easier to maintain than long lines of code like yours

Question 18

@user2023861 a variable is unnecessary if we can use its value and directly compute a one time calculation, if the value will be used more than once, then a variable is necessary and from my understanding, the author of the post wanted to minimize the lines of code therefore, I omitted whatever variables I consider to be unnecessary and replaced them with one line comprehension syntax which is faster and shorter.

Question 19

@emadboctor OP says that algorithm2 isn't as good because he struggles to read and understand the code. Algorithm2 is the one with fewer variables. Variables with useful names document themselves, even if they're only used once. That's why it's probably better to have more variables.

Nikos Oikou Nikos Oikou 3611 silver badge4 bronze badges · Accepted Answer · 2019-08-14 22:29:54Z

18

\$\begingroup\$

I would do something like:

def sum_of_squares(n):
 return sum(i ** 2 for i in range(1, n+1))
def square_of_sum(n):
 return sum(range(1, n+1)) ** 2
def sum_square_difference(n):
 return sum_of_squares(n) - square_of_sum(n)

Notice the use of a generator expression in sum, where I have omitted the square brackets.

Share

answered Aug 14, 2019 at 22:29

Nikos Oikou's user avatar

Nikos Oikou Nikos Oikou

3611 silver badge4 bronze badges

\$\endgroup\$

4

7

\$\begingroup\$ I find that people often recommend splitting up already small functions into even smaller functions, but personally I find that this can take longer when debugging, as you can't tell at a glance if the function is used in more than one place. \$\endgroup\$

Turksarama
– Turksarama

2019年08月15日 06:15:47 +00:00
Commented Aug 15, 2019 at 6:15
\$\begingroup\$ @Turksarama With a decent editor (that supports highlighting), certainly you can. \$\endgroup\$

Kenneth K.
– Kenneth K.

2019年08月15日 14:10:18 +00:00
Commented Aug 15, 2019 at 14:10
\$\begingroup\$ Even without a decent editor (I use Atom), a folder-wide search works just fine. Ctrl+Shift+F, "square_of_sum(". Doesn't work for aliasing, but oh well, tests should catch that. \$\endgroup\$

Adam Barnes
– Adam Barnes

2019年08月16日 12:50:13 +00:00
Commented Aug 16, 2019 at 12:50
\$\begingroup\$ A further advantage of this method is that you can easily change the logic inside the smaller functions without any changes in the larger method. E.g. by replacing it with numpy code or an explicit mathematical formula. \$\endgroup\$

Ivo Merchiers
– Ivo Merchiers

2019年11月07日 15:02:37 +00:00
Commented Nov 7, 2019 at 15:02

Add a comment |

Stack Exchange Network

Sum Square Difference, which way is more Pythonic?

4 Answers 4

Docstrings:

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Sum Square Difference, which way is more Pythonic?

4 Answers 4

Docstrings:

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions