I gave problem #22 on Project Euler a try in Python3.
Other than suggested in the instructions, I used requests
to fetch the file to work on.
The code below does produce the correct result but takes no measures to catch exceptions if something breaks, e.g. when the download is not successful.
I managed to split a string, in which names in double quotes are separated by a comma and turn it to a list, but I'm wondering if there isn't any better or more pythonic way.
As far as the summation of int
in a list is concerned, I'm aware that reduce
from functools
might be an alternative, but for my first contribution here, I decided to keep it (more) simple.
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
'''Project Euler - Problem 22 - Name scores
URL: https://projecteuler.net/problem=22
Resource (names.txt): https://projecteuler.net/project/resources/p022_names.txt
Instructions: Using names.txt (right click and 'Save Link/Target As...'),
a 46K text file containing over five-thousand first names, begin by
sorting it into alphabetical order. Then working out the alphabetical value
for each name, multiply this value by its alphabetical position in the list
to obtain a name score.
For example, when the list is sorted into alphabetical order, COLIN, which is
worth 3 +たす 15 +たす 12 +たす 9 +たす 14 =わ 53, is the 938th name in the list.
So, COLIN would obtain a score of 938 ×ばつかける 53 =わ 49714.
Assignment: What is the total of all the name scores in the file?
'''
import sys
import requests
URL = 'https://projecteuler.net/project/resources/p022_names.txt'
def fetch_names(url):
'''Get a text file with names and return a sorted list
Parameters
----------
url : str
url of a text file at Project Euler.
Returns
-------
alphabetically sorted list of first names
'''
r = requests.get(url)
# r.text is a string with names
# the names are in capital letters, enclosed in double quotes
# and separated by commas
names = [name.strip('"') for name in r.text.split(',')]
return sorted(names)
def calculate_name_score(pos, name):
'''
Calculate the "name sore" of a name at a specific position in a list
Parameters
----------
pos : int
zero-based index of a name in a list
name : str
name in CAPS
Returns
-------
name_score : int
'''
# letter scores: 'A' = 1, 'B' = 2, etc.!
ch_scores = [ord(ch) - 64 for ch in name]
score = sum(ch_scores)
# Project Euler: index of the names starts with 1!!
name_score = score * (pos + 1)
return name_score
def calculate_total_score(names):
'''Return a "name score" for a list of names
Parameters
----------
names : list (of strings)
Returns
-------
total_score : int
'''
scores = [calculate_name_score(pos, name) for (pos, name) in enumerate(names)]
total_score = sum(scores)
return total_score
def main():
names = fetch_names(URL)
result = calculate_total_score(names)
print(result)
if __name__ == '__main__':
sys.exit(main())
I'm grateful for any improvement, regardless whether it concerns the structuring of code, naming, documentation, or anything you can think of.
-
1\$\begingroup\$ just curious came across Project Euler first time. Who prepares the problems on this site. I mean no info about creator of the site is given in its About page. Just curious. \$\endgroup\$Mahesha999– Mahesha9992015年12月31日 13:57:41 +00:00Commented Dec 31, 2015 at 13:57
3 Answers 3
Code looks pretty good. Nicely commented, well-structured, seems to do what it’s supposed to do. Just a few small suggestions from me:
Don’t use magic numbers. In this line, it’s not immediately obvious what 64 means:
ch_scores = [ord(ch) - 64 for ch in name]
If I think a little, and find that
ord('A') = 65
, I can work out what’s happening here. But it would be even clearer to write:ch_scores = [ord(char) - ord('A') + 1 for char in name]
Cache the results from ord()? You’ll be making a call to
ord()
at least once for every letter in every name; twice if you adopt my strategy. It might be better to precompute the scores in a global dictionary, e.g.:import string CHAR_SCORES = {char: ord(char) - ord('A') + 1 for char in string.ascii_uppercase}
and then just look up scores in this dictionary. It probably won’t make much of a difference for this problem, but it’s something to bear in mind.
Use generator comprehensions unless you need a list. A generator comprehension is like a list comprehension, but instead of precomputing all the elements upfront and storing them in a list, it computes-and-yields one element at a time. That can have big savings in memory.
Both of your list comprehensions get passed straight to
sum()
, which can just get one element at a time – it isn’t jumping around the list.The change is minimal – just replace the square brackets of a list comprehension with parentheses instead, i.e.
scores = (calculate_name_score(pos, name) for (pos, name) in enumerate(names))
-
1\$\begingroup\$ You may go further and don't call ord at all using enumerate(string.ascii_uppercase, start=1) \$\endgroup\$RiaD– RiaD2015年12月31日 16:02:03 +00:00Commented Dec 31, 2015 at 16:02
Your program is neat: concise, readable, following conventions and well documented. Good job. Everything I’ll say, except for sys.exit
is more or less nitpicking or subject to personal taste.
Exiting the program
When the control flow reaches the end of the file, the program exits with a status of 0
. The same occurs when calling sys.exit(0)
or sys.exit(None)
. Since main
does not have a return
statement, it returns None
. Thus the call to exit
is not needed and you can remove it.
The last line should simply be:
main()
Docstrings
The docstring for calculate_name_score
contains a typo ("sore") and should start on the same line than the '''
.
Generator expressions vs. list-comprehensions
You tend to build lists out of list-comprehensions as an intermediate step of an other computation (sorted
, sum
, ...). Since you do not re-use this list, it is more memory-friendly to use a generator expression directly in the last step. For instance:
sum(ord(ch) - 64 for ch in name)
sorted(name.strip('"') for name in r.text.split(','))
Constants
64 is kind of a magic number here. It would have been better to define
A_OFFSET = ord('A') - 1
at the top of the file and used that instead.
Separation of concerns
Given their names, I would have computed only the score for a name in calculate_name_score
and compute it's product with its position in calculate_total_score
.
Something along the lines of:
def calculate_name_score(name):
# letter scores: 'A' = 1, 'B' = 2, etc.!
return sum(ord(ch) - 64 for ch in name)
def calculate_total_score(names):
return sum(calculate_name_score(name) * (pos + 1) for (pos, name) in enumerate(names))
Even better, you can use the second (optional) argument of enumerate
to avoid computing pos + 1
each time. You also don't really need the parenthesis around pos, name
to define the tuple:
def calculate_total_score(names):
return sum(calculate_name_score(name) * pos for pos, name in enumerate(names, 1))
-
\$\begingroup\$ I was totally unaware of the optional argument in
enumerate
. Thanks a lot! \$\endgroup\$Klaus-Dieter Warzecha– Klaus-Dieter Warzecha2015年12月31日 10:56:33 +00:00Commented Dec 31, 2015 at 10:56
Your code looks readable in general.
Avoid useless variables
score = sum(ch_scores)
# Project Euler: index of the names starts with 1!!
name_score = score * (pos + 1)
return name_score
Becomes:
return sum(ch_scores) * (pos + 1)
The single expression is more immediate to understand. The same applies to other parts of the code.
Static typing
I see you like static typing, instead of writing it in the docstring you should take advantage of the new typing
module and function annotation.
-
1\$\begingroup\$ Your remark on useless variables is very interesting! Actually, my return was exactly as you have suggested and I only introduced the additional variable to have something more catchy to refer to in the docstring :D But I totally agree that the additional variable is in fact useless. Am I right that
typing
refers to PEP-0484? That looks interesting, thanks a lot! \$\endgroup\$Klaus-Dieter Warzecha– Klaus-Dieter Warzecha2015年12月31日 09:35:26 +00:00Commented Dec 31, 2015 at 9:35 -
\$\begingroup\$ @KlausWarzecha Yes I meant pep 0484 \$\endgroup\$Caridorc– Caridorc2015年12月31日 09:43:12 +00:00Commented Dec 31, 2015 at 9:43
-
\$\begingroup\$ I disagree, I think adding the extra variables makes it more clear, especially when debugging. \$\endgroup\$reggaeguitar– reggaeguitar2015年12月31日 16:41:49 +00:00Commented Dec 31, 2015 at 16:41
Explore related questions
See similar questions with these tags.