Project Euler #19: Counting Sundays in the 20th century using Pandas

Question 1

How many Sundays fell on the first of the month during the twentieth century (1 Jan 1901 to 31 Dec 2000)?

I'm hoping I wasn't too off course from the spirit of the exercise by making use of one of my favorite libraries: pandas.

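import pandas as pd

# Every calendar day from 1 Jan 1901 to 31 Dec 2000, inclusive
rng = pd.date_range(start='1/1/1901', end='12/31/2000', freq='D')
count = 0
for date in rng:
    # weekday() numbers Monday as 0, so Sunday is 6
    if date.day == 1 and date.weekday() == 6:
        count += 1
count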

It's a rather simple problem and a rather simple solution. I just made use of pandas' built-in date_range function, which works well with Python's datetime objects.

I don't think my usual questions apply here, so here are some specific ones:

Is it Pythonic to run the for loop as such, or would a list comprehension wrapped in len be more Pythonic, e.g. len([x for x in rng if x.day == 1 and x.weekday() == 6])? Or is something else entirely even more Pythonic?
Is there a way to avoid iterating over an object as large as rng, with its 30,000+ items, to reduce memory usage? If so, what would be a preferred method (just pseudo-code or however you prefer to explain)?
As my attention has been brought to how powerful itertools is for improving performance and substituting generators for lists, I'm wondering how I would improve upon the code above with itertools, if there is any such way.

Question 2

"I'm hoping I wasn't too off course from the spirit of the exercise by making use of my favorite libraries"

The spirit of Project Euler is to learn. You have now learned how to do it. The next step is to learn whether this is the most Pythonic way to do it. You did not go against any spirit.

Question 3

Just a small note: you could set a freq such that rng only contains Sundays or firsts of the month (see stackoverflow.com/questions/13445174/date-ranges-in-pandas).

Question 4

It is good to use a library rather than re-inventing everything yourself. Just be sure to avoid explicit looping in Python:

sum(date.day == 1 and date.weekday() == 6 for date in rng)

The above counts the number of times that date.day == 1 and date.weekday() == 6 holds, with no explicit loop or counter. It should also be more efficient (sum is implemented in C).
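For anyone who wants to try the same idea without pandas, here is a minimal sketch using only the standard library's datetime module. The helper name days_between is hypothetical and exists only for this illustration; it is not part of pandas or of the answer above. Because it is a generator, only one date exists at a time, which also touches on the memory question.

```python
from datetime import date, timedelta


def days_between(start, end):
    """Yield every date from start to end, inclusive, one at a time."""
    current = start
    while current <= end:
        yield current
        current += timedelta(days=1)


# Each comparison yields True or False; sum() adds them up as 1s and 0s.
# Nothing is collected into a list along the way.
first_sundays = sum(
    d.day == 1 and d.weekday() == 6  # weekday() == 6 means Sunday
    for d in days_between(date(1901, 1, 1), date(2000, 12, 31))
)
print(first_sundays)  # 171
```

The generator expression inside sum() is the same shape as the one-liner above, just fed from a plain Python generator instead of a DatetimeIndex.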

Question 5

This is really clever, thanks! I hadn't thought to use the and clause like that in a loop. Also good to know regarding sum().

Question 6

@mburke05 Note that this works because True can be coerced to 1, and False can be coerced to 0.
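A quick way to see this for yourself is a short sketch like the following; the list name flags is just an illustrative example.

```python
# bool is a subclass of int, so True behaves as 1 and False as 0 in arithmetic.
flags = [True, False, True, True]

print(isinstance(True, int))  # True
print(True + True)            # 2
print(sum(flags))             # 3
```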

Question 7

@SuperBiasedMan Thanks! I did indeed work that out through some trial and error in IDLE after having read his solution. Are there any other cases like that worth knowing? I've seen people throw around things like while None, or while a where a is an empty list, and similar constructs that I wouldn't normally associate with a boolean identity.

Question 8

@mburke05 That's called "truthiness" in Python. Regardless of type, any value can be evaluated as a boolean. There's a write-up of it here.
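As a small illustration of truthiness, here is a sketch with made-up sample values: empty containers, zero, and None count as false, and a while loop over a list stops once the list is empty.

```python
# Empty containers, zero, and None are falsy; most other values are truthy.
samples = [None, 0, "", [], {}, 1, "text", [0]]
truthy = [value for value in samples if value]
print(truthy)  # [1, 'text', [0]]

pending = ["a", "b"]
processed = []
while pending:  # the loop ends once the list is empty
    processed.append(pending.pop())
print(processed)  # ['b', 'a']
```

Note that [0] is truthy: a list is judged by whether it is empty, not by what it contains.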

Question 9

Don't use rng to mean range. I know you were avoiding the builtin function, but it's not clear. It also could be mistaken to mean Random Number Generator (though the usage makes it clear that's not the case). Even if you could use range, I would advise against it. range doesn't really describe what the list contains, just the type of list it is. Instead, name things more for their purpose and contents than meta information about the contents. Looking at the for loop, there's an obvious name to use, dates.

Also, ending with just count by itself, not even printed, is odd. It may come out right in the IDE you use, but it will often just pass by unnoticed. Call print, and ideally add text around it:

print("There are {} first day of the month Sundays in the 20th century.".format(count))

str.format is the accepted way to format variables into strings. By default it converts anything passed to it the way str() would. It's preferable to "string" + str(var) + "string" because it's clearer and can be shorter, even if it might not seem so from this example. Imagine this:

"My string has two variables, " + str(var) + " and the other one is " + str(var)

Versus

"My string has two variables, {} and the other one is {}".format(var, var)

But str.format also has a lot of useful syntax to help format strings better. I won't go into it here, as you don't need it for this script, but take a look over here.
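To give a flavour of that syntax, here is a brief sketch. The values are illustrative only: 171 is the answer to this problem, and 1200 is the number of month starts in 100 years.

```python
count = 171
months = 1200  # 100 years * 12 months
ratio = count / months

plain = "{} of {} months".format(count, months)      # positional fields
named = "{n} Sundays, {n} in total".format(n=count)  # one named field, reused
rounded = "ratio {:.2f}".format(ratio)               # two decimal places
padded = "[{:>6}]".format(count)                     # right-aligned, width 6

print(plain)    # 171 of 1200 months
print(named)    # 171 Sundays, 171 in total
print(rounded)  # ratio 0.14
print(padded)   # [   171]
```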

You asked about join, but that has a fundamentally different purpose. join takes a collection parameter, like a list, and makes a string by concatenating its elements. It doesn't perform any string conversion, and the only text it can put between values is the single separator string it is called on. I definitely wouldn't use it in this case.
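A short side-by-side sketch of the difference, with a made-up list of years:

```python
years = [1901, 1950, 2000]

# join only accepts strings, so each number has to be converted first,
# and the same separator goes between every pair of values.
joined = ", ".join(str(year) for year in years)
print(joined)  # 1901, 1950, 2000

# format converts the values itself and allows different text around each one.
sentence = "From {} to {}, with {} in between.".format(years[0], years[2], years[1])
print(sentence)  # From 1901 to 2000, with 1950 in between.
```

Passing the integers straight to join, as in ", ".join(years), raises a TypeError.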

Question 10

Thanks, will do. I used rng simply because I recall it being in an example in the pandas documentation. Also, the count statement was simply due to my IDE, as you noted (I work mostly in Jupyter; is this something I shouldn't be doing? I've seen debate on this and that between IDLE vs. PyCharm vs. IPython, etc.). Random question: could you explain the choice to use .format and not .join or even just "Your str" + str(var), where var is our number (count in this case)? Thanks!

Question 11

Whatever IDE you want to use is fine, but when I run this (just in IDLE) it gives a very different result: no output at all. Just placing a variable by itself isn't a trustworthy or reliable means of doing anything. I'll edit my answer to explain a bit about format.

Question 12

Don't iterate the dates. Instead use the vectorized DatetimeIndex methods.
Don't use freq='D', which generates all dates in the century (~36,000). Instead use freq='MS', which filters down to the "Month Start" dates (~1,200). This means we search ~30x fewer dates and can also remove the check for day == 1.
Don't use weekday codes (0...6). Instead use day_name strings (Monday...Sunday), which are easier to interpret/maintain.

>>> day_ones = pd.date_range(start='1/1/1901', end='12/31/2000', freq='MS')
>>> day_ones.where(day_ones.day_name() == 'Sunday').notnull().sum()
# 171

In other words, we keep the start-of-month dates where the day_name is Sunday and then sum the notnull matches.
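As a pandas-free cross-check of the same "month starts only" idea, here is a minimal sketch using the standard library. It is not the vectorized method above, just a plain-Python stand-in, and the names month_starts and sunday_firsts are illustrative.

```python
import calendar
from datetime import date

# Build only the first day of each month, 1901 through 2000.
month_starts = [
    date(year, month, 1)
    for year in range(1901, 2001)
    for month in range(1, 13)
]

# calendar.SUNDAY is the named constant for weekday code 6,
# which reads better than a bare number.
sunday_firsts = sum(d.weekday() == calendar.SUNDAY for d in month_starts)

print(len(month_starts))  # 1200
print(sunday_firsts)      # 171
```

Only 1,200 dates are examined here, against roughly 36,500 when every day of the century is generated.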

This approach is significantly faster:

[Figure: timing plot showing it's fastest to combine freq='MS' and Index.where()]

Accepted answer: the sum() answer above, by Caridorc, posted 2015-09-30 18:02:41Z.


Comments on the accepted answer (the text of each appears above), in order:
mburke05, Sep 30, 2015 at 18:56
SuperBiasedMan, Oct 1, 2015 at 8:36
mburke05, Oct 1, 2015 at 13:31
SuperBiasedMan, Oct 1, 2015 at 13:32
