1

I have a file such as:

a 1
a 2
b 5
c 8
a 9

I want to add together the second field per key so that I have an aggregate number and therefore a single key:value pair.

With a large dataset I am thinking the best way to go about this would be to create a dictionary that contains a list of values per unique key. Is this the best approach?

How do I set the lists of values per key accurately (below code seems to overwrite values instead of appending)?

dict={}
file=open('foo.txt','r')
lines=file.readlines()
for line in lines:
 k, v=line.split()
 dict[k]=[v]

Now, if I want to take the aggregate numbers from the first dictionary and compare both the keys and the values against those in another dictionary to determine the differences between the two, the best I can come up with is something like the following:

for i in res.keys():
    if res2.get(i):
        print 'match',i
    else:
        print i,'does not match'

for i in res2.keys():
    if res.get(i):
        print 'match',i
    else:
        print i,'does not match'

for i in res.values():
    if res2.get(i):
        print 'match',i
    else:
        print i,'does not match'

for i in res2.values():
    if res.get(i):
        print 'match',i
    else:
        print i,'does not match'

This is cumbersome and buggy. I need help!

asked Feb 12, 2012 at 16:56

3 Answers

7

Use a defaultdict to calculate the sums:

from collections import defaultdict
res = defaultdict(int)
with open('foo.txt', 'r') as f:
    for line in f:
        k,v = line.split()
        res[k] += int(v)
# res is now {"a": 12, "b": 5, "c": 8}

If you don't want the sums, but lists of elements, modify that to:

from collections import defaultdict
res = defaultdict(list)
with open('foo.txt', 'r') as f:
    for line in f:
        k,v = line.split()
        res[k].append(v)
# res is now {"a": ["1", "2", "9"], "b": ["5"], "c": ["8"]}

Note that I changed some variable names, notably file to f and dict to res. That's because file and dict are the names of built-ins, so using them as variable names shadows the built-ins and causes confusion.

Also, readlines is not necessary; you can directly iterate over the file.

Additionally, the with statement ensures that the file gets closed afterwards.
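For the second part of the question (finding the differences between two such dictionaries), one possible approach is to compare the key sets first and then the values for the keys both dictionaries share. This is only a sketch under assumptions: the contents of res2 are invented sample data, and diff_dicts is a hypothetical helper name, not something from the question.

```python
def diff_dicts(first, second):
    """Return keys only in first, keys only in second, and keys whose values differ."""
    first_keys = set(first)
    second_keys = set(second)
    only_first = sorted(first_keys - second_keys)
    only_second = sorted(second_keys - first_keys)
    # For shared keys, record both values when they disagree.
    changed = {
        k: (first[k], second[k])
        for k in sorted(first_keys & second_keys)
        if first[k] != second[k]
    }
    return only_first, only_second, changed


res = {"a": 12, "b": 5, "c": 8}    # totals from the sample file
res2 = {"a": 12, "b": 7, "d": 1}   # made-up second dictionary (assumption)

only_first, only_second, changed = diff_dicts(res, res2)
print(only_first)   # ['c']
print(only_second)  # ['d']
print(changed)      # {'b': (5, 7)}
```

Unlike a test such as `if res2.get(i)`, this does not mistake a key whose total is 0 for a missing key, and it never looks up a value as if it were a key.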

answered Feb 12, 2012 at 17:01

2 Comments

Will res[k].append(v) work if I have multiple variables that I want to append? i.e. k,field1,field2,field3,field4=line.split() followed by res[k].append(field1,field2,field3,field4)
No, append expects just one argument. While you could call append multiple times, you can simply write res[k] += [field1,field2,field3,field4].
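To illustrate the difference, here is a minimal sketch using a made-up input line (the sample values are an assumption, not data from the question). `append` adds a single object, while `+=` or `extend` adds each element of a sequence:

```python
from collections import defaultdict

# Hypothetical input line: a key followed by four fields.
line = "a 1 2 3 4"

res = defaultdict(list)
k, field1, field2, field3, field4 = line.split()

# += (or extend) adds each field as its own list element.
res[k] += [field1, field2]
res[k].extend([field3, field4])

# append takes exactly one argument; passing a list nests it.
nested = defaultdict(list)
nested[k].append([field1, field2, field3, field4])

print(dict(res))     # {'a': ['1', '2', '3', '4']}
print(dict(nested))  # {'a': [['1', '2', '3', '4']]}
```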
4

If you just want a running total, you don't need to create a list to append elements. You can use a defaultdict and keep adding to it to get a running total.

from collections import defaultdict
key_totals = defaultdict(int)
with open('foo.txt', 'r') as f:
    for line in f:
        k, v = line.split()
        key_totals[k] += int(v)
Niklas B.
answered Feb 12, 2012 at 17:00


1

This is exactly what setdefault() is for:

d = {}
with open('foo.txt','r') as f:
    for line in f:
        k,v = line.split()
        d.setdefault(k, []).append(v)

Also, don't use dict as a variable name. And you can iterate directly over a file; no need to use .readlines() here.
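If sums are wanted rather than lists, the same plain-dict style can use `dict.get` with a default of 0. A minimal sketch, assuming the sample data from the question is held in an in-memory list of lines instead of being read from foo.txt:

```python
# Sample lines from the question, held in memory instead of read from foo.txt.
lines = ["a 1", "a 2", "b 5", "c 8", "a 9"]

totals = {}
for line in lines:
    k, v = line.split()
    # get returns 0 for a key that has not been seen yet.
    totals[k] = totals.get(k, 0) + int(v)

print(totals)  # {'a': 12, 'b': 5, 'c': 8}
```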

answered Feb 12, 2012 at 16:59
