Partitioning comments by date

Question 1

I have a list of Comment objects, comments. Each Comment has several properties, one of which is the date the comment was posted. I've sorted this list by date, and there is at least one comment for each date in the range.

I would like to loop over comments and, for every date, create a new list containing just that day's comments. However, the way I'm currently doing it feels very inefficient. Is there a more Pythonic or efficient way to do this? (I've implemented a daterange(start_date, end_date) generator function according to this answer):

from datetime import date

comment_arrays = []  # list of lists, one per day
for d in daterange(date(2015, 1, 1), date(2016, 1, 1)):  # daterange: my own generator (not shown)
    day_comments = []
    for c in comments:
        if c.date == d:
            day_comments.append(c)
        elif c.date > d:
            comment_arrays.append(day_comments)
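The linked daterange helper is not reproduced here. A minimal sketch of such a generator, assuming it yields every date from start_date up to but not including end_date (the exclusive end is an assumption, chosen to mirror range()), might look like this:

```python
from datetime import date, timedelta


def daterange(start_date, end_date):
    """Yield each date from start_date up to, but not including, end_date.

    Hypothetical reconstruction: the helper from the linked answer may treat
    end_date differently (for example, inclusively).
    """
    for offset in range((end_date - start_date).days):
        yield start_date + timedelta(days=offset)


days = list(daterange(date(2015, 1, 1), date(2015, 1, 4)))
print(days)  # three dates: 2015-01-01, 2015-01-02 and 2015-01-03
```

With this definition, the loop above visits every day of 2015 but not 2016-01-01 itself.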

Question 2

day_comments is appended to comment_arrays multiple times (once for every later comment, not once per day). Is this intended?
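If the intent is one list per day, one plausible fix keeps the asker's loop structure but appends day_comments exactly once, after the inner loop, and stops scanning once it passes the current day. The sketch assumes a hypothetical Comment namedtuple with a date field and its own end-exclusive daterange; it still walks comments once per day, so it only corrects the control flow, not the efficiency:

```python
from collections import namedtuple
from datetime import date, timedelta

Comment = namedtuple("Comment", ["date", "text"])  # hypothetical stand-in for Comment


def daterange(start_date, end_date):  # assumed end-exclusive, like range()
    for offset in range((end_date - start_date).days):
        yield start_date + timedelta(days=offset)


def comments_per_day(comments, start_date, end_date):
    comment_arrays = []  # one inner list per day
    for d in daterange(start_date, end_date):
        day_comments = []
        for c in comments:
            if c.date == d:
                day_comments.append(c)
            elif c.date > d:
                break  # comments are sorted by date, so nothing later can match d
        comment_arrays.append(day_comments)  # appended exactly once per day
    return comment_arrays


comments = [
    Comment(date(2015, 1, 1), "a"),
    Comment(date(2015, 1, 1), "b"),
    Comment(date(2015, 1, 2), "c"),
]
result = comments_per_day(comments, date(2015, 1, 1), date(2015, 1, 3))
print([[c.text for c in day] for day in result])  # [['a', 'b'], ['c']]
```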

Question 3

I think this would be a great situation to use itertools.groupby.

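import itertools

datekey = lambda x: x.date  # the same key is used for sorting and for grouping
comments.sort(key=datekey)  # groupby only merges adjacent items with equal keys
date_comments = [list(group) for _, group in itertools.groupby(comments, datekey)]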

This, of course, groups all your comments, not only the ones in your date range. I'd use filter to skip the ones outside your range if you can't make use of the extra days' comments.
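One way to combine groupby with filter is to drop out-of-range comments before grouping. This sketch assumes a hypothetical Comment namedtuple with a date field, an input list that is already sorted by date, and a half-open range [start, end):

```python
import itertools
from collections import namedtuple
from datetime import date

Comment = namedtuple("Comment", ["date", "text"])  # hypothetical Comment type

comments = [  # already sorted by date
    Comment(date(2014, 12, 31), "too early"),
    Comment(date(2015, 1, 1), "a"),
    Comment(date(2015, 1, 1), "b"),
    Comment(date(2015, 1, 2), "c"),
    Comment(date(2016, 1, 1), "too late"),
]
start, end = date(2015, 1, 1), date(2016, 1, 1)  # assumed half-open range [start, end)

datekey = lambda x: x.date
in_range = filter(lambda c: start <= c.date < end, comments)  # lazily skips out-of-range comments
# filtering preserves the sorted order, so groupby still sees equal dates side by side
date_comments = [list(group) for _, group in itertools.groupby(in_range, datekey)]
print([[c.text for c in group] for group in date_comments])  # [['a', 'b'], ['c']]
```

Because filter is lazy, no intermediate list of in-range comments is built.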

Question 4

Why is the sort step needed?

Question 5

groupby only groups consecutive items with equal keys, so it only works properly on lists sorted by the same key. If the list is already sorted (which was implied in the question), you don't need to sort again. I just included the sort since it's not optional in general (and it can use the same key function as groupby).
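A small demonstration of why the order matters, using ISO date strings as a stand-in for date objects (an illustrative simplification; ISO strings sort chronologically):

```python
import itertools

days = ["2015-01-01", "2015-01-02", "2015-01-01"]  # unsorted: equal dates are not adjacent

# groupby starts a new group every time the key changes, so the first day is split
unsorted_groups = [(k, len(list(g))) for k, g in itertools.groupby(days)]
# after sorting, equal dates are adjacent and end up in a single group
sorted_groups = [(k, len(list(g))) for k, g in itertools.groupby(sorted(days))]

print(unsorted_groups)  # [('2015-01-01', 1), ('2015-01-02', 1), ('2015-01-01', 1)]
print(sorted_groups)    # [('2015-01-01', 2), ('2015-01-02', 1)]
```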

Question 6

The sort step and its side effect (it reorders the caller's list in place) damage the solution's pythonicity, if you ask me. sorted would avoid the side effect, but it would make another copy of comments in memory and still do the extra work.
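The trade-off can be seen directly: list.sort reorders the list in place and returns None, while sorted returns a new list and leaves the original untouched, at the cost of that extra copy. A minimal illustration with integers standing in for comments:

```python
original = [3, 1, 2]

copy_sorted = sorted(original)  # new list; original keeps its order
print(original, copy_sorted)    # [3, 1, 2] [1, 2, 3]

result = original.sort()        # sorts in place and returns None
print(original, result)         # [1, 2, 3] None
```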

Question 7

Like I said, I only included the sort step in my example code because it can use the same key function. If the list is already sorted, it doesn't need to be sorted again (with either sorted or list.sort).

Question 8

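You are walking daterange once and comments many times (once per day). Since the latter is most likely much longer than the former, it's more efficient to do it the other way round: walk comments once and find each comment's day directly. Besides, daterange is evenly spaced (one entry per day), so a day's position can be computed from its offset to the start date and used as an index: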

from datetime import date

startdate = date(2015, 1, 1)
enddate = date(2016, 1, 1)
ndays = (enddate - startdate).days + 1  # without +1 if the last date shall not be included
comments_by_day = [[] for _ in range(ndays)]
for c in comments:
    i = (c.date - startdate).days  # day offset from startdate (an int, not a timedelta)
    if i < 0:
        continue  # avoid wraparound behavior for negative indices
    try:
        comments_by_day[i].append(c)
    except IndexError:
        pass  # date not in range
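The same index-bucketing idea, wrapped in a function so it can be tried on sample data. The function name bucket_by_day and the Comment namedtuple are illustrative assumptions, and the explicit 0 <= i < ndays test replacing try/except is a stylistic choice rather than part of the answer:

```python
from collections import namedtuple
from datetime import date

Comment = namedtuple("Comment", ["date", "text"])  # hypothetical Comment type


def bucket_by_day(comments, startdate, enddate):
    """Return one list per day from startdate to enddate, both inclusive.

    Each comment is placed by its day offset, so comments is walked only once
    and does not need to be sorted.
    """
    ndays = (enddate - startdate).days + 1
    comments_by_day = [[] for _ in range(ndays)]
    for c in comments:
        i = (c.date - startdate).days
        if 0 <= i < ndays:  # skips out-of-range dates; negative i would otherwise wrap around
            comments_by_day[i].append(c)
    return comments_by_day


comments = [
    Comment(date(2015, 1, 3), "c"),
    Comment(date(2015, 1, 1), "a"),
    Comment(date(2014, 12, 31), "out of range"),
]
days = bucket_by_day(comments, date(2015, 1, 1), date(2015, 1, 3))
print([[c.text for c in day] for day in days])  # [['a'], [], ['c']]
```

Days with no comments still get an empty list, so the result lines up with the date range position by position.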

The answer under Question 3 was posted by Blckknght on 2016-02-16 at 03:19:23 UTC and is the accepted answer.


The comments under Questions 4 to 7 were posted on that answer by ivan_pozdeev (2016-02-16 03:20:52 UTC), Blckknght (03:23:47 UTC), ivan_pozdeev (03:58:10 UTC) and Blckknght (04:00:44 UTC), in that order.
