Here is the Project Euler problem referenced; it says:
The \$n^{th}\$ term of the sequence of triangle numbers is given by \$t_n = \frac{1}{2}n(n+1)\$; so the first ten triangle numbers are:
1, 3, 6, 10, 15, 21, 28, 36, 45, 55, ...
By converting each letter in a word to a number corresponding to its alphabetical position and adding these values we form a word value. For example, the word value for SKY is \$19 + 11 + 25 = 55 = t_{10}\$. If the word value is a triangle number then we shall call the word a triangle word.
Using words.txt (right click and 'Save Link/Target As...'), a 16K text file containing nearly two-thousand common English words, how many are triangle words?
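As a quick sanity check on the example, here is a minimal sketch (independent of the word list) confirming the SKY value:
# verify the SKY example: word value 55 should equal the 10th triangle number
letters = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
sky_value = sum(letters.index(char) + 1 for char in 'SKY')  # 19 + 11 + 25 = 55
tenth_triangle = 10 * (10 + 1) // 2                         # t_10 = 55
assert sky_value == tenth_triangle == 55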
My solution is as follows:
import re
import string

# generate a key to map the letters in our strings to their alphabetical positions
letters = string.ascii_uppercase
letters_map = {letter: idx for idx, letter in enumerate(string.ascii_uppercase, start=1)}

# open the file and parse on quoted values
with open('words.txt') as f:
    words = sorted(re.findall(r'"(.*?)"', f.read()))

# max word score can only be 26 * len(longest word), recall 'Z': 26 in letters_map
max_word_score = len(max(words, key=len))*26

# generate a list of triangle_numbers to search for, using max_word_score as upper bound
n = 1
triangle_numbers = []
while 1/2.0*n*(n+1) <= max_word_score:
    triangle_numbers.append(1/2.0*n*(n+1))
    n += 1

# finally use a nested for loop to count how many word values are triangle numbers
count = 0
for word in words:
    if sum(letters_map[char] for char in word) in triangle_numbers:
        count += 1
%%timeit
100 loops, best of 3: 3.67 ms per loop
I'm curious whether there are any ways I can rewrite this with similar or better performance so that it is more Pythonic. More specifically:
1) Is it possible to rewrite this better using itertools.ifilter and itertools.imap with letters_map, with the filter being the list of triangle_numbers, to greater effect than simply using a nested for loop and an if statement to filter?
2) Is my while construction extremely un-Pythonic? It felt awkward writing it the way I did, and I wasn't sure of a better way to count up to some arbitrary ceiling (in this case max_word_score).
3) Should I be getting in the habit of wrapping these items in mini-functions in order to make them more readable/replicable? Would that slow the runtime down? What sort of effect would that have in general?
4) Any other pythonizations I can use to better effect?
- N3buchadnezzar (Oct 14, 2015 at 13:06): I think your while loop can be improved like this: max_triangle = 0.5 * max_word_score * (max_word_score + 1); triangle_numbers = [0.5*k*(k + 1) for k in range(max_triangle)]
2 Answers
Unnecessary floating point
Triangular numbers are always integers. n(n+1) is always even because either n or n+1 is. Thus you don't need to do floating point math here:
n = 1
triangle_numbers = []
while 1/2.0*n*(n+1) <= max_word_score:
    triangle_numbers.append(1/2.0*n*(n+1))
    n += 1
You can simply write n*(n+1)/2. This drops your running time from 449ms (on 100 iterations) to 423ms.
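Concretely, the loop could look something like this (a sketch; // floor division is used so the result stays an integer on both Python 2 and 3):
n = 1
triangle_numbers = []
while n * (n + 1) // 2 <= max_word_score:
    triangle_numbers.append(n * (n + 1) // 2)
    n += 1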
Unnecessary sorting
You assign words = sorted(...). Nowhere in your code do you need it to be sorted, so that's an easy thing to drop. That gets us down to 405ms.
Checking for triangular numbers
You're building up a list of triangular numbers up to the maximum possible. Logically, this is valid. However, we can do better: a set has much better performance for membership checks than a list does. Just making that change (declaring triangle_numbers as a set() and using add()) gets us down to 321ms.
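For example, a sketch of that change, building on the integer version above:
n = 1
triangle_numbers = set()
while n * (n + 1) // 2 <= max_word_score:
    triangle_numbers.add(n * (n + 1) // 2)
    n += 1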
Generator Expressions
Generator expressions are actually slower than loops (fact), even if they're far easier to read (in my opinion). So when you really care about cycles, you'll want to rewrite your code as:
count = 0
for word in words:
    value = 0
    for char in word:
        value += letters_map[char]
    if value in triangle_numbers:
        count += 1
That gets us down to 222ms.
File Reading
Use csv.reader to read in the words instead of re:
import csv

words = []
with open('euler42.txt') as f:
    for line in csv.reader(f, delimiter=',', quotechar='"'):
        words.extend(line)
Just using the more specific tool for the job. 172ms.
- mburke05 (Oct 14, 2015 at 14:21): Great catches Barry, thanks! I always look forward to reading your responses.
- mburke05 (Oct 14, 2015 at 14:24): Also, and this is I guess a pretty in-depth question, but is there any literature/good reading on why loops are faster than generator expressions? And what does the difference between things like xrange() vs. range() yield (no pun intended, ha) in performance, and when would you use one over the other?
- Barry (Oct 14, 2015 at 14:53): @mburke05 In Python 2.7, range() gives you a list, e.g. range(10) returns [0, 1, 2, ..., 9]. xrange() is a generator that gives you each successive number on the fly, so it's far more efficient since it doesn't have to construct a list up front. In Python 3, they fixed range() to do the smart thing.
- Barry (Oct 14, 2015 at 14:54): @mburke05 Not sure of a good resource.
You should use csv to read the file.
import csv

with open('words.txt', 'rb') as csvfile:
    words = [
        word
        for line in csv.reader(csvfile, delimiter=',', quotechar='"')
        for word in line
    ]
It's much easier to understand what it is doing. And it's probably safer and faster.
You define letters, however you use string.ascii_uppercase instead.
You can put max_word_score in the same 'block' as the while loop below it. You can also rearrange 1/2.0*n*(n+1) <= max_word_score to get the max n.
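For reference, one way that rearrangement could go (a sketch; the code below instead uses a deliberately loose over-estimate): solving \$\frac{1}{2}n(n+1) \le M\$ for \$n\$ gives \$n \le \frac{\sqrt{8M + 1} - 1}{2}\$, so the bound could be computed as
max_n = int(((8 * max_word_score + 1) ** 0.5 - 1) / 2)
and xrange(1, max_n + 1) would then generate every needed n.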
Python also has a style guide, PEP 8, which says you should have a space on either side of operators: 1 + 1, not 1+1. The only exception to this is when you want to show precedence, e.g. 1 + 2*3.
You can change your for word in words loop to one large comprehension and use sum, for better readability.
max_word_score = len(max(words, key=len)) * 26
triangle_numbers = [
    0.5 * n * (n + 1)
    for n in xrange(int((2 * max_word_score + max_word_score)**0.5))
]

count = sum(
    1
    for word in words
    if sum(letters_map[char] for char in word) in triangle_numbers
)
If you also assume that you are not going to get the longest English word, you can change max_word_score = len(max(words, key=len)) * 26 to max_word_score = 1909 * 26, which will allow the second-longest English word. And as they are 'common English words', you could change max_word_score to 27 * 26, which will allow the 9th-longest English word. And if you do use max_word_score = 30 * 26, then you can reduce the memory usage.
You should also change letters_map, max_word_score and triangle_numbers to uppercase variants, e.g. LETTERS_MAP, since they are constants.
If I were to do all the above then I would use the following program:
import csv
import string

MAX_TRIANGLE = 30 * 26
LETTERS_MAP = {letter: idx for idx, letter in enumerate(string.ascii_uppercase, start=1)}
TRIANGLE_NUMBERS = [
    0.5 * n * (n + 1)
    for n in xrange(int((2 * MAX_TRIANGLE) ** 0.5))
]

with open('words.txt', 'rb') as csvfile:
    count = sum(
        1
        for line in csv.reader(csvfile, delimiter=',', quotechar='"')
        for word in line
        if sum(LETTERS_MAP[char] for char in word) in TRIANGLE_NUMBERS
    )
If you want it to be faster than your original, then you can change MAX_TRIANGLE = 30 * 26 to MAX_TRIANGLE = 15 * 26, but that's a bit dodgy IMO. Either way, this uses less memory (unless csv loads everything into memory) and is more readable.
To answer your questions:
1) I would only encourage itertools if you use generators, which you don't (well, you have one that is fed into sum...); see the sketch after this list.
2) It's not 'Pythonic' because the maths you use isn't. However, personally I think a greedy for is better (hence why my maths is an over-estimate).
3) If you wrapped them in a main, that would be good. Wrapping them into functions is always better than having no functions. But as this can be done with 3 constants and 1 generator, I would say one function is fine.
4) I would say following PEP 8 is your best bet for that.
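For illustration only, a rough sketch (Python 2; it assumes words, LETTERS_MAP and TRIANGLE_NUMBERS as defined earlier) of the itertools.imap/itertools.ifilter version the question asks about; it isn't meaningfully clearer or faster than the plain loop:
from itertools import ifilter, imap

# map each word to its value, then keep only the triangle-valued ones and count them
word_values = imap(lambda word: sum(LETTERS_MAP[char] for char in word), words)
count = sum(1 for _ in ifilter(TRIANGLE_NUMBERS.__contains__, word_values))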