I have a list of sublists, e.g. [[1, 2], [1, 56], [2, 787], [2, 98], [3, 90]], which I build by appending values to it inside a for loop.
I am working in Python, and I want to add up the 2nd element of each sublist whose 1st elements are the same. In my example, I want to compute 2+56 (both have 1 as the first element) and 787+98 (both have 2 as the first element), and keep 90 as it is, because only one sublist has 3 as its first element.
I'm not sure how to do this.
Here is my code:
import urllib, re
from itertools import groupby
import collections
import itertools, operator
text = urllib.urlopen("some html page").read()
# store the contents of the BODY tag
data = re.compile(r'.*?<BODY>(.*?)<HR>', re.DOTALL).match(text).group(1)
# list with the BODY data, one split line per element
values = [line.split() for line in data.splitlines()]
# values contains elements like [[65, 67], [112, 123, 12], [387, 198, 09]];
# it contains elements of length 2 and 3.
# I am only concerned with the elements of length 3.
# In the for loop I pass each of them to 2 functions.
def function1(docid, doclen, tf):
    new = []
    avgdoclen = 288
    tf = float(tf)
    doclen = float(doclen)
    answer1 = tf / (tf + 0.5 + (1.5*doclen/avgdoclen))
    q = function2(docid, doclen, tf)
    production = answer1 * q  # this is the production of
    new.append(docid)  # I want to add all the production values where the docids are the same.
    new.append(production)
    return answer1
def function2(docid, doclen, tf):
    avgdoclen = 288
    querylen = 12
    tf = float(tf)
    answer2 = tf / (tf + 0.5 + (1.5*querylen/avgdoclen))
    return answer2
for x in values:
    if len(x) == 3:
        function1(x[0], x[1], x[2])
        function2(x[0], x[1], x[2])
I want to add all the production values where the docid is the same. Now when I print new, I get the following output:
['112', 0.3559469323909391]
['150', 0.31715060007742935]
['158', 0.122025819265144]
['176', 0.3862207694241891]
['188', 0.5057900225015092]
['236', 0.12628982528263102]
['251', 0.12166336633663369]
['334', 0.5851519557155408]
This is not a list of lists: when I print new[0][0] I get 1, but I want to get 112. Is there something wrong with append?
4 Answers
This is pretty straightforward. dict.get(key, default) returns the value for key if it exists, and default otherwise.
totals = {}
for k, v in data:
    totals[k] = totals.get(k, 0) + v
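Here is the same loop wrapped in a self-contained function and applied to the list from the question. The name sum_by_key is hypothetical, not part of the original answer.

```python
def sum_by_key(pairs):
    """Sum the second item of each [key, value] pair, grouped by key."""
    totals = {}
    for k, v in pairs:
        # dict.get returns 0 the first time a key is seen
        totals[k] = totals.get(k, 0) + v
    return totals


data = [[1, 2], [1, 56], [2, 787], [2, 98], [3, 90]]
totals = sum_by_key(data)
print(totals)  # {1: 58, 2: 885, 3: 90}
```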
from collections import defaultdict; totals = defaultdict(int)
for k, v in data: totals[k] += v
Counter-based solution. The defaultdict-based solution is correct.
Counter was just a Python 2.7 specialization of collections.defaultdict(int).
Counter just counts instances of a particular key in a flat sequence. It doesn't do any summation or anything like that. In other words, for Counter to be helpful here, you'd have to pass it a list like this: [1, 1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3....].
This might be a job for itertools:
>>> import itertools, operator
>>> l = sorted([[1, 2], [1, 56], [2, 787], [2, 98], [3, 90]])
>>> keys_groups = itertools.groupby(l, key=operator.itemgetter(0))
>>> sums = [[key, sum(i[1] for i in group)] for key, group in keys_groups]
>>> sums
[[1, 58], [2, 885], [3, 90]]
Note that for groupby to work as expected, the items have to be sorted by the same key that groupby uses. In this case the key is the first item of each pair, and sorting whole lists orders them by their first item first, so a plain sorted() was enough; for a more general solution, you should pass a key parameter when sorting the list.
>>> l2 = [[787, 2], [98, 2], [90, 3], [2, 1], [56, 1]]
>>> l2.sort(key=operator.itemgetter(1))
>>> l2
[[2, 1], [56, 1], [787, 2], [98, 2], [90, 3]]
>>> keys_groups = itertools.groupby(l2, key=operator.itemgetter(1))
>>> sums = [[key, sum(i[0] for i in group)] for key, group in keys_groups]
>>> sums
[[1, 58], [2, 885], [3, 90]]
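To see why the sort matters: groupby only merges consecutive items with equal keys, so on unsorted input the same key shows up in several groups. A small sketch using the question's pairs in a shuffled order:

```python
import itertools
import operator

pairs = [[1, 2], [2, 787], [1, 56], [3, 90], [2, 98]]
by_key = operator.itemgetter(0)

# Without sorting, no two neighbouring items share a key,
# so every item ends up in its own group.
unsorted_sums = [[key, sum(item[1] for item in group)]
                 for key, group in itertools.groupby(pairs, key=by_key)]
print(unsorted_sums)  # [[1, 2], [2, 787], [1, 56], [3, 90], [2, 98]]

# Sorting by the same key first brings equal keys together.
sorted_sums = [[key, sum(item[1] for item in group)]
               for key, group in itertools.groupby(sorted(pairs, key=by_key), key=by_key)]
print(sorted_sums)  # [[1, 58], [2, 885], [3, 90]]
```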
Works fine with the data you posted. I edited it a bit to make the example more realistic.
>>> l = [['112', 0.3559469323909391], ['150', 0.31715060007742935],
...      ['158', 0.122025819265144], ['176', 0.3862207694241891],
...      ['188', 0.5057900225015092], ['377', 0.12628982528263102],
...      ['251', 0.12166336633663369], ['334', 0.5851519557155408],
...      ['334', 0.14663484486873507], ['112', 0.2345038167938931],
...      ['377', 0.10694516971279373], ['112', 0.28981132075471694]]
>>> l.sort(key=operator.itemgetter(0))
>>> keys_groups = itertools.groupby(l, key=operator.itemgetter(0))
>>> sums = [[key, sum(i[1] for i in group)] for key, group in keys_groups]
>>> sums
[['112', 0.88026206993954914], ['150', 0.31715060007742935],
['158', 0.122025819265144], ['176', 0.38622076942418909],
['188', 0.50579002250150917], ['251', 0.12166336633663369],
['334', 0.73178680058427581], ['377', 0.23323499499542477]]
Note that, as WolframH points out, sorting generally increases the time complexity (O(n log n) instead of the O(n) of a single dictionary pass). However, Python's sort algorithm takes advantage of runs of already-ordered data, so it might not; it all depends on the data. Still, if your data is far from sorted, Winston Ewert's defaultdict-based solution may be better. (But ignore that first Counter snippet; I have no idea what's going on there.)
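A sketch of combining the two ideas, assuming you still want the sorted [[key, total], ...] output of the groupby version: sum in one pass with a defaultdict, then sort only the distinct keys. That sort costs O(k log k) for k distinct docids rather than O(n log n) for all n pairs. The function name sum_by_key_sorted is hypothetical.

```python
from collections import defaultdict


def sum_by_key_sorted(pairs):
    """Sum values per key in one pass, then sort only the distinct keys."""
    totals = defaultdict(float)  # missing keys start at 0.0
    for key, value in pairs:
        totals[key] += value
    # Same [[key, total], ...] shape that the groupby version produces.
    return [[key, total] for key, total in sorted(totals.items())]


print(sum_by_key_sorted([['334', 0.5], ['112', 0.25], ['334', 0.25]]))
# [['112', 0.25], ['334', 0.75]]
```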
A couple of notes on how to create a list. There are lots of ways, but the two basic ways in Python are as follows. First, a list comprehension:
>>> def simple_function(x):
...     return [x, x ** 2]
...
>>> in_data = range(10)
>>> out_data = [simple_function(x) for x in in_data]
>>> out_data
[[0, 0], [1, 1], [2, 4], [3, 9], [4, 16], [5, 25], [6, 36], [7, 49], [8, 64], [9, 81]]
And second, a for loop:
>>> out_data = []
>>> for x in in_data:
...     out_data.append(simple_function(x))
...
>>> out_data
[[0, 0], [1, 1], [2, 4], [3, 9], [4, 16], [5, 25], [6, 36], [7, 49], [8, 64], [9, 81]]
import collections
result = collections.defaultdict(int) # works like a dictionary
# but all keys have a default value of zero
for key, value in mylist:
    result[key] += value
print result
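A runnable version with the question's data. One thing worth knowing: result is a defaultdict, so merely reading a missing key inserts it with the default 0. Converting with dict(result) gives a plain dictionary without that behaviour. The names mylist and plain are just for illustration.

```python
import collections

mylist = [[1, 2], [1, 56], [2, 787], [2, 98], [3, 90]]

result = collections.defaultdict(int)  # missing keys default to int() == 0
for key, value in mylist:
    result[key] += value

# Plain dict copy: no default value, missing keys raise KeyError.
plain = dict(result)
print(plain)  # {1: 58, 2: 885, 3: 90}

# Reading a missing key from the defaultdict inserts it.
print(result[4])    # 0
print(4 in result)  # True
print(4 in plain)   # False
```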
dict keeps only the last value for each key.
The fact that you:
want to add the 2nd element of each sublist where the 1st elements are same
makes me think that you want to be using a dict rather than a list; a dict is optimised for retrieving the 2nd value based on the 1st.
Some code along the lines of:
oldvalue = mydict.get(firstvalue, 0)
newvalue = oldvalue + secondvalue
mydict[firstvalue] = newvalue
would let you build up the dict as you go, or, if that's not feasible, it will let you calculate the sums in a single pass over the list.
Quick spin in the interpreter just to test this out:
>>> l = [[1, 2], [1, 56], [2, 787], [2, 98], [3, 90]]
>>> mydict = {}
>>> for firstvalue, secondvalue in l:
...     oldvalue = mydict.get(firstvalue, 0)
...     newvalue = oldvalue + secondvalue
...     mydict[firstvalue] = newvalue
...
>>> print mydict
{1: 58, 2: 885, 3: 90}
Looks fairly close to what you want.
In function1, you create production and new and then throw both away. new after the function exits? You have to return it and put it in a list to get a list of news.
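Following the comment above, a sketch of the question's code reworked so each call returns its [docid, production] pair; the loop collects those pairs into a list and then sums them per docid. It assumes each row of values holds strings from line.split(), as in the question. The names okapi_tf, production, and sum_productions are hypothetical; the constants 288 and 12 come from the question's function1 and function2. With this shape, pairs[0][0] is the whole docid string (for example '112'), not its first character.

```python
from collections import defaultdict

AVG_DOCLEN = 288.0  # avgdoclen from the question
QUERYLEN = 12.0     # querylen from the question


def okapi_tf(tf, length):
    # Same formula shape as function1/function2: tf / (tf + 0.5 + 1.5 * length / avgdoclen)
    return tf / (tf + 0.5 + 1.5 * length / AVG_DOCLEN)


def production(docid, doclen, tf):
    """Return a [docid, production] pair instead of discarding it."""
    tf = float(tf)
    doclen = float(doclen)
    answer1 = okapi_tf(tf, doclen)    # function1's value
    answer2 = okapi_tf(tf, QUERYLEN)  # function2's value
    return [docid, answer1 * answer2]


def sum_productions(values):
    # One pair per 3-element row; 2-element rows are skipped.
    pairs = [production(x[0], x[1], x[2]) for x in values if len(x) == 3]
    totals = defaultdict(float)
    for docid, prod in pairs:
        totals[docid] += prod
    return pairs, dict(totals)


values = [['65', '67'], ['112', '123', '12'], ['150', '200', '5'], ['112', '100', '3']]
pairs, totals = sum_productions(values)
print(pairs[0][0])     # 112
print(sorted(totals))  # ['112', '150']
```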