I've got a list of integers and I want to identify contiguous blocks of duplicates: that is, I want to produce an order-preserving list of tuples where each tuple contains (int_in_question, number of occurrences).
For example, if I have a list like:
[0, 0, 0, 3, 3, 2, 5, 2, 6, 6]
I want the result to be:
[(0, 3), (3, 2), (2, 1), (5, 1), (2, 1), (6, 2)]
I have a fairly simple way of doing this with a for-loop, a temp, and a counter:
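result_list = []
if source_list:  # guard: source_list[0] would raise IndexError on an empty list
    current = source_list[0]
    count = 0
    for value in source_list:
        if value == current:
            count += 1
        else:
            # run ended: record it and start counting the new value
            result_list.append((current, count))
            current = value
            count = 1
    # record the final run, which the loop itself never appends
    result_list.append((current, count))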
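For comparison, the same temp-and-counter loop can be turned into a generator function that yields each (value, count) pair as soon as its run ends, instead of appending to a list. This is a minimal sketch; the function name `runs` is hypothetical and not part of the question.

```python
from typing import Iterable, Iterator, Tuple


def runs(values: Iterable[int]) -> Iterator[Tuple[int, int]]:
    """Yield (value, run_length) pairs lazily, one per contiguous block."""
    it = iter(values)
    try:
        current = next(it)  # the first element starts the first run
    except StopIteration:
        return  # empty input: no runs at all
    count = 1
    for value in it:
        if value == current:
            count += 1
        else:
            yield (current, count)  # run ended; emit it
            current, count = value, 1
    yield (current, count)  # emit the final run


print(list(runs([0, 0, 0, 3, 3, 2, 5, 2, 6, 6])))
# [(0, 3), (3, 2), (2, 1), (5, 1), (2, 1), (6, 2)]
```

Because it only calls `next()` on an iterator, this version also works on inputs that are not lists, such as file lines or another generator.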
But I really like Python's functional programming idioms, and I'd like to be able to do this with a simple generator expression. However, I find it difficult to keep sub-counts when working with generators. I have a feeling a two-step process might get me there, but for now I'm stumped.
Is there a particularly elegant/pythonic way to do this, especially with generators?
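One possible reading of the "two-step process" idea, sketched here without `itertools` and assuming the input is an indexable sequence such as a list: first collect the indices where a new run starts, then pair each start with the next start (or the end of the list) to get the run lengths. The function name `run_lengths_two_step` is hypothetical.

```python
def run_lengths_two_step(source_list):
    n = len(source_list)
    # Step 1: indices where a run starts (index 0, plus every change point).
    starts = [i for i in range(n) if i == 0 or source_list[i] != source_list[i - 1]]
    # Step 2: a run's length is the distance to the next start, or to the end.
    ends = starts[1:] + [n]
    return [(source_list[s], e - s) for s, e in zip(starts, ends)]


print(run_lengths_two_step([0, 0, 0, 3, 3, 2, 5, 2, 6, 6]))
# [(0, 3), (3, 2), (2, 1), (5, 1), (2, 1), (6, 2)]
```

An empty list gives no starts, so `zip` produces nothing and the result is `[]`.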
For reference, this process is called run-length encoding: en.wikipedia.org/wiki/Run-length_encoding (Aaron Robson, commented Apr 14, 2013 at 14:57)
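Because the output is a run-length encoding, it carries enough information to rebuild the original list. As an illustration, a decoder can expand each pair with `itertools.repeat` and flatten the result with `itertools.chain.from_iterable`. This is a sketch; `rle_decode` is a hypothetical helper name.

```python
from itertools import chain, repeat


def rle_decode(pairs):
    """Expand (value, count) pairs back into the original flat list."""
    return list(chain.from_iterable(repeat(value, count) for value, count in pairs))


encoded = [(0, 3), (3, 2), (2, 1), (5, 1), (2, 1), (6, 2)]
print(rle_decode(encoded))  # [0, 0, 0, 3, 3, 2, 5, 2, 6, 6]
```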
1 Answer
>>> from itertools import groupby
>>> L = [0, 0, 0, 3, 3, 2, 5, 2, 6, 6]
>>> grouped_L = [(k, sum(1 for i in g)) for k,g in groupby(L)]
>>> # Or (k, len(list(g))), but that creates an intermediate list
>>> grouped_L
[(0, 3), (3, 2), (2, 1), (5, 1), (2, 1), (6, 2)]
Batteries included, as they say.
The suggestion to use sum with a generator expression came from JBernardo; see the comments.
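Since the question asks specifically for a generator expression, swapping the list comprehension's brackets for parentheses gives a lazy version of the same idea. This is a sketch reusing the example list `L`.

```python
from itertools import groupby

L = [0, 0, 0, 3, 3, 2, 5, 2, 6, 6]

# Parentheses instead of brackets: pairs are produced one at a time, on demand.
grouped_gen = ((k, sum(1 for _ in g)) for k, g in groupby(L))

print(next(grouped_gen))  # (0, 3)
print(list(grouped_gen))  # [(3, 2), (2, 1), (5, 1), (2, 1), (6, 2)]
```

This is safe because each group `g` is fully consumed by `sum` before `groupby` advances to the next key. Collecting the raw groups first (e.g. `list(groupby(L))`) and counting them later would not work, because each group shares the underlying iterator with `groupby` and is no longer usable once `groupby` moves on.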
Comments
…len(list(g)) for sum(1 for i in g) to avoid intermediate storage.

…g has always kind of bothered me when I use groupby for this.

def long_gen(): while True: yield 1 What is the len of this? See: stackoverflow.com/questions/390852/…

…sum in other places but hadn't thought to use it in this case. I think it would be quickly understood by most readers.
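To illustrate the `long_gen` point: generators (including the groups `groupby` yields) do not implement `__len__`, so `len()` raises `TypeError` and the only way to count them is to consume them. For an infinite generator like `long_gen`, both `len(list(...))` and `sum(1 for _ in ...)` would loop forever, so counting only makes sense for finite iterators such as `groupby`'s groups. A small sketch:

```python
from itertools import groupby


def long_gen():
    while True:
        yield 1


# len() is not defined for generators, so this raises TypeError.
try:
    len(long_gen())
    supports_len = True
except TypeError:
    supports_len = False
print(supports_len)  # False

# Finite groups can still be counted by consuming them.
counts = [(key, sum(1 for _ in group)) for key, group in groupby([7, 7, 7, 8])]
print(counts)  # [(7, 3), (8, 1)]
```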