I have a list of Comment objects, comments. Each Comment has several properties, one of which is the date the comment was posted. I've sorted this list by date, and there is at least one comment per date in the range.

I would like to loop over comments and, for every date, create a new list of comments just for that day. However, the way I'm currently doing it feels very inefficient. Is there a more Pythonic or efficient way to do this? (I've implemented a daterange(start_date, end_date) generator function according to this answer):
    from datetime import date

    comment_arrays = []  # array of arrays
    for d in daterange(date(2015, 1, 1), date(2016, 1, 1)):
        day_comments = []
        for c in comments:
            if c.date == d:
                day_comments.append(c)
            elif c.date > d:
                comment_arrays.append(day_comments)
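The daterange helper mentioned in the question is not shown. A minimal sketch of such a generator might look like the following, assuming dates are datetime.date objects and that the end date is excluded (mirroring range); the actual helper may differ on both points.

```python
from datetime import date, timedelta


def daterange(start_date, end_date):
    """Yield each date from start_date up to, but not including, end_date.

    Assumption: the end date is excluded, as with range(); the helper
    actually used in the question is not shown.
    """
    for offset in range((end_date - start_date).days):
        yield start_date + timedelta(days=offset)


# Three consecutive days: January 1, 2 and 3 of 2015.
days = list(daterange(date(2015, 1, 1), date(2015, 1, 4)))
```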
2 Answers
I think this would be a great situation to use itertools.groupby.
    import itertools

    datekey = lambda x: x.date
    comments.sort(key=datekey)  # groupby only merges consecutive items with equal keys
    date_comments = [list(group) for date, group in itertools.groupby(comments, datekey)]
This of course groups all your comments, not only the ones in your date range. I'd use filter to skip the ones outside your range if you can't make use of the extra days' comments.
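As a rough illustration of the filter suggestion, the sketch below drops out-of-range comments before grouping. The Comment namedtuple and the sample data are hypothetical stand-ins for the question's class, and the end date is assumed to be excluded.

```python
import itertools
from collections import namedtuple
from datetime import date

# Hypothetical stand-in for the question's Comment class.
Comment = namedtuple("Comment", ["text", "date"])

# Sample data, already sorted by date as the question states.
comments = [
    Comment("too early", date(2014, 12, 31)),
    Comment("first", date(2015, 1, 1)),
    Comment("second", date(2015, 1, 1)),
    Comment("third", date(2015, 1, 2)),
    Comment("too late", date(2016, 1, 1)),
]

start_date, end_date = date(2015, 1, 1), date(2016, 1, 1)


def datekey(comment):
    return comment.date


# Keep only comments with start_date <= date < end_date (end excluded by assumption).
in_range = filter(lambda c: start_date <= c.date < end_date, comments)

# The input is sorted by date, so groupby sees each date as a single run.
date_comments = [list(group) for _, group in itertools.groupby(in_range, datekey)]
```

Because filter is lazy, the comments are still traversed only once. Note that days with no comments produce no group at all, which is fine here since the question guarantees at least one comment per date.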
why is the sort step needed? – ivan_pozdeev, Feb 16, 2016 at 3:20

groupby only works properly on sorted lists. If the list is already sorted (which was implied in the question), you don't need to sort again. I just included the sort since it's not optional (and it uses the same key function as groupby). – Blckknght, Feb 16, 2016 at 3:23

The step and its side effect damage the solution's pythonicity if you ask me. sorted would fix the side effect but make another copy of comments in memory and still do the extra work. – ivan_pozdeev, Feb 16, 2016 at 3:58

Like I said, I only included the sort step in my example code because it can use the same key function. If the list is already sorted, it doesn't need to be sorted again (with either sorted or list.sort). – Blckknght, Feb 16, 2016 at 4:00
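To illustrate the point about sorting made in the comments: groupby starts a new group every time the key changes, so on unsorted input the same key can show up in several groups. A small sketch, with plain integers standing in for dates:

```python
import itertools

unsorted_keys = [1, 2, 1, 1, 2]

# groupby only merges *consecutive* equal keys, so 1 and 2 each start two groups here.
runs = [(key, len(list(group))) for key, group in itertools.groupby(unsorted_keys)]

# After sorting, every key forms exactly one group.
grouped = [
    (key, len(list(group)))
    for key, group in itertools.groupby(sorted(unsorted_keys))
]
```

Here runs contains four (key, count) pairs while grouped contains two, which is why the sort can only be skipped when the list is already ordered by the grouping key.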
You are walking daterange one time and comments many times. Since the latter is, most likely, much longer than the former, it's going to be more efficient to do it the other way round. Besides, the dates in daterange are evenly spaced (one per day), so each day's list can be accessed directly by index:
    from datetime import date

    startdate = date(2015, 1, 1)
    enddate = date(2016, 1, 1)
    ndays = (enddate - startdate).days + 1  # without +1 if the last date shall not be included
    comments_by_day = [[] for _ in range(ndays)]
    for c in comments:
        i = (c.date - startdate).days  # whole days since startdate
        try:
            if i >= 0:  # avoid wraparound behavior for negative indices
                comments_by_day[i].append(c)
        except IndexError:
            pass  # date not in range
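For a self-contained run of the same idea, the sketch below wraps the index computation in a function. The Comment namedtuple, the bucket_by_day name, and the sample data are hypothetical, and an explicit bounds check is swapped in for the try/except purely for illustration.

```python
from collections import namedtuple
from datetime import date

# Hypothetical stand-in for the question's Comment class.
Comment = namedtuple("Comment", ["text", "date"])


def bucket_by_day(comments, startdate, enddate):
    """Return one list per day from startdate through enddate (inclusive)."""
    ndays = (enddate - startdate).days + 1
    comments_by_day = [[] for _ in range(ndays)]
    for c in comments:
        i = (c.date - startdate).days  # whole days since startdate
        if 0 <= i < ndays:  # skip dates outside the range
            comments_by_day[i].append(c)
    return comments_by_day


comments = [
    Comment("a", date(2015, 1, 1)),
    Comment("b", date(2015, 1, 3)),
    Comment("c", date(2015, 1, 1)),
    Comment("out of range", date(2014, 12, 31)),
]
buckets = bucket_by_day(comments, date(2015, 1, 1), date(2015, 1, 3))
```

This makes a single pass over comments and does not depend on the list being sorted. Unlike the groupby approach, days without any comments still get an (empty) list at their index.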
day_comments is appended to comment_arrays multiple times, is this intended?