Project Euler #19 asks:
How many Sundays fell on the first of the month during the twentieth century (1 Jan 1901 to 31 Dec 2000)?
I'm hoping I wasn't too off course from the spirit of the exercise by making use of one of my favorite libraries: pandas.
import pandas as pd
rng = pd.date_range(start='1/1/1901', end='12/31/2000', freq='D')
count = 0
for date in rng:
    if date.day == 1 and date.weekday() == 6:
        count += 1
count
It's a rather simple problem and a rather simple solution. I just made use of the date_range function built into pandas, which works well with Python's datetime objects.
I don't think my usual questions apply here, so here are some specific ones:
- Is it Pythonic to run the for loop as such, or would a list comprehension wrapped in len be more Pythonic, e.g. len([x for x in rng if x.day == 1 and x.weekday() == 6])? Or is something else entirely even more Pythonic?
- Is there a way to avoid iterating over an object as large as rng, with its 30,000+ items, to reduce memory usage? If so, what would be a preferred method (pseudo-code or however you prefer to explain)?
- As my attention has been brought to how powerful itertools is for improving performance and replacing lists with generators, I'm wondering how I would improve the code above with itertools, if there is any such way.
3 Answers
It is good to use a library rather than re-inventing everything yourself. Just be sure to avoid explicit looping in Python:
sum(date.day == 1 and date.weekday() == 6 for date in rng)
The above counts the number of times that date.day == 1 and date.weekday() == 6 is true, with no explicit loop or counter. It should also be more efficient, since sum is implemented in C.
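For readers without pandas at hand, here is a minimal sketch of the same counting idiom using only the standard library. The helper iter_days is a hypothetical name introduced for illustration; it stands in for pd.date_range and yields one date at a time, so no 36,000-item list is ever built. This is a swap of datetime for pandas, not the answerer's code.

```python
from datetime import date, timedelta


def iter_days(start, end):
    """Yield every date from start to end inclusive, one at a time.

    Hypothetical helper standing in for pd.date_range(..., freq='D').
    """
    one_day = timedelta(days=1)
    current = start
    while current <= end:
        yield current
        current += one_day


dates = iter_days(date(1901, 1, 1), date(2000, 12, 31))

# Each comparison yields True or False; sum() adds them up as 1 and 0.
sunday_firsts = sum(d.day == 1 and d.weekday() == 6 for d in dates)
print(sunday_firsts)  # 171
```

Because both the helper and the expression passed to sum are generators, only one date object is alive at any moment.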
- This is really clever, thanks! I hadn't thought to use the and clause like that in a loop. Also good to know regarding sum(). (mburke05, commented Sep 30, 2015 at 18:56)
- @mburke05 Note that this works because True can be coerced to 1, and False can be coerced to 0. (SuperBiasedMan, commented Oct 1, 2015 at 8:36)
- @SuperBiasedMan thanks! I did indeed work that out through some trial and error in IDLE after having read his solution. Are there any other cases like that worth knowing? I've seen people throw around things like while None or while a, where a is an empty list, and things like that that I wouldn't normally associate with a boolean identity. (mburke05, commented Oct 1, 2015 at 13:31)
- @mburke05 That's called "truthiness" in Python. Regardless of type, variables can be evaluated as booleans. There's a write up of it here. (SuperBiasedMan, commented Oct 1, 2015 at 13:32)
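As a small illustration of the truthiness discussed in these comments, the sketch below shows how a few common values evaluate as booleans and why summing booleans counts the True ones. The sample values are arbitrary choices made for this example.

```python
# Arbitrary sample values, chosen only to illustrate truthiness.
samples = [None, [], [0], "", "a", 0, 1]

# bool() applies the same test that if and while use.
truthy = [bool(value) for value in samples]
print(truthy)  # [False, False, True, False, True, False, True]

# True behaves as 1 and False as 0 in arithmetic, so sum() counts the Trues.
true_count = sum([True, False, True])
print(true_count)  # 2

# An empty list is falsy, so this loop body never runs.
remaining = []
iterations = 0
while remaining:
    iterations += 1
print(iterations)  # 0
```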
Don't use rng to mean range. I know you were avoiding the builtin function, but it's not clear. It also could be mistaken to mean Random Number Generator (though the usage makes it clear that's not the case). Even if you could use range, I would advise against it. range doesn't really describe what the list contains, just the type of list it is. Instead, name things more for their purpose and contents than meta information about the contents. Looking at the for loop, there's an obvious name to use: dates.
Also, ending with just count by itself, not even being printed, is odd. It may come out right in the IDE you use, but it will often just pass by unnoticed. Call print and ideally add text around it:
print("There are {} first day of the month Sundays in the 20th century.".format(count))
str.format is the accepted way to format variables into strings. By default it converts each argument much as str() would, so you don't have to coerce them to strings yourself. It's preferable to using "string" + str(var) + "string" because it's clearer and can be shorter, even if it might not seem like it from this example. But imagine this:
"My string has two variables, " + str(var) + " and the other one is " + str(var)
Versus
"My string has two variables, {} and the other one is {}".format(var, var)
str.format also has a lot of useful syntax to help format strings better. I won't go into it here as you don't need it for this script, but take a look over here.
You asked about join, but that has a fundamentally different purpose. join takes a collection parameter, like a list, and makes a string by concatenating its items, placing the separator string it is called on between them. It doesn't perform any string conversion (every item must already be a string) and doesn't let you insert different text at arbitrary positions. I definitely wouldn't use it in this case.
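To make the contrast concrete, here is a short sketch of the two methods side by side. The value 171 and the weekdays list are example inputs chosen for illustration, not part of the original script.

```python
count = 171  # example value for illustration

# str.format converts the integer to text for us.
message = "There are {} first day of the month Sundays in the 20th century.".format(count)
print(message)

# str.join only concatenates strings, placing one separator between items.
weekdays = ["Saturday", "Sunday"]
joined = " and ".join(weekdays)
print(joined)  # Saturday and Sunday

# join performs no conversion: non-string items raise TypeError.
try:
    ", ".join([1901, 2000])
    join_failed = False
except TypeError:
    join_failed = True
print(join_failed)  # True
```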
- Thanks, will do. I used rng simply because I recall it being in an example in the pandas documentation. Also, the count statement was simply due to my IDE as you noted (I work mostly in Jupyter; is this something I shouldn't be doing? I've seen debate on this and that between IDLE vs. PyCharm vs. IPython, etc.). Random question: could you explain the choice to use .format and not .join or even just "Your str" + str(var), where var is our number (count in this case)? Thanks! (mburke05, commented Oct 1, 2015 at 13:29)
- Whatever IDE you want to use is fine, but when I run this (just in IDLE) it gives a very different result: no output at all. Just placing a variable by itself isn't a trustworthy or reliable means to do anything. I'll edit my answer to explain a bit about format. (SuperBiasedMan, commented Oct 1, 2015 at 13:35)
- Don't iterate the dates. Instead use the vectorized DatetimeIndex methods.
- Don't use freq='D', which generates all dates in the century (~36,000). Instead use freq='MS', which filters down to the "Month Start" dates (~1,200). This means we search ~30x fewer dates and can also remove the check for day == 1.
- Don't use weekday codes (0...6). Instead use day_name strings (Monday...Sunday), which are easier to interpret/maintain.
>>> day_ones = pd.date_range(start='1/1/1901', end='12/31/2000', freq='MS')
>>> day_ones.where(day_ones.day_name() == 'Sunday').notnull().sum()
# 171
In other words, we keep the start-of-month dates where the day_name is Sunday and then sum the notnull matches.
This approach is significantly faster:
[Figure: timing plot showing it's fastest to combine freq='MS' and Index.where()]
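The same "month starts only" idea can also be sketched without pandas, which touches on the itertools part of the question. This is a standard-library stand-in rather than the vectorized approach above: itertools.product generates the 1,200 (year, month) pairs lazily, and calendar.SUNDAY replaces the bare weekday code 6 with a readable name. The variable names are chosen for this example.

```python
import calendar
from datetime import date
from itertools import product

# Lazily build the first day of every month from Jan 1901 to Dec 2000.
month_starts = (
    date(year, month, 1)
    for year, month in product(range(1901, 2001), range(1, 13))
)

# calendar.SUNDAY is the weekday() code for Sunday (6).
sunday_firsts = sum(d.weekday() == calendar.SUNDAY for d in month_starts)
print(sunday_firsts)  # 171
```

Only about 1,200 dates are examined instead of roughly 36,500, and the day == 1 check disappears because every generated date is already the first of its month.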
- "I'm hoping I wasn't too off course from the spirit of the exercise by making use of my favorite libraries" The spirit of Project Euler is to learn. You now have learned how to do it. The next step is to learn whether this is the most Pythonic way to do it. You did not go against any spirit.
- ... freq such that rng only contains Sundays or first of months (see stackoverflow.com/questions/13445174/date-ranges-in-pandas).