Dictionary with List of Values

Question 1

I have a file such as:

a 1
a 2
b 5
c 8
a 9

I want to add together the second field per key so that I have an aggregate number and therefore a single key:value pair.

With a large dataset I am thinking the best way to go about this would be to create a dictionary that contains a list of values per unique key. Is this the best approach?

How do I build the list of values per key correctly? The code below seems to overwrite values instead of appending:

dict={}
file=open('foo.txt','r')
lines=file.readlines()
for line in lines:
 k, v=line.split()
 dict[k]=[v]

Now, if I want to take the aggregate numbers in the first dictionary and compare both its keys and its values against those in another dictionary to find the differences between the two, the best I can come up with is something like the following:

for i in res.keys():
    if res2.get(i):
        print 'match',i
    else:
        print i,'does not match'

for i in res2.keys():
    if res.get(i):
        print 'match',i
    else:
        print i,'does not match'

for i in res.values():
    if res2.get(i):
        print 'match',i
    else:
        print i,'does not match'

for i in res2.values():
    if res.get(i):
        print 'match',i
    else:
        print i,'does not match'

This is cumbersome and buggy. I need help!

Question 2

Use a defaultdict to calculate the sums:

from collections import defaultdict
res = defaultdict(int)
with open('foo.txt', 'r') as f:
    for line in f:
        k,v = line.split()
        res[k] += int(v)
# res is now {"a": 12, "b": 5, "c": 8}

If you don't want the sums, but lists of elements, modify that to:

from collections import defaultdict
res = defaultdict(list)
with open('foo.txt', 'r') as f:
    for line in f:
        k,v = line.split()
        res[k].append(v)
# res is now {"a": ["1", "2", "9"], "b": ["5"], "c": ["8"]}

Note that I changed some variable names, notably file to f and dict to res. file (in Python 2) and dict are names of built-ins, so they are best avoided as variable names: rebinding them hides the built-in and causes confusion.
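To illustrate the shadowing problem, here is a minimal sketch (the function name shadow_demo is made up for this example). Once dict is rebound to an instance, calling dict(...) in that scope no longer reaches the built-in type:

```python
# Sketch: rebinding the name `dict` hides the built-in type in that scope.
def shadow_demo():
    dict = {}  # shadows the built-in, but only inside this function
    try:
        dict(a=1)  # this now tries to call a dict instance, not the type
    except TypeError as exc:
        return str(exc)
    return None

message = shadow_demo()
print(message)

# Outside the function the built-in is untouched.
print(dict(a=1))
```

The same thing happens at module level if dict is assigned there, except that the built-in then stays hidden for the rest of the module.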

Also, readlines is not necessary; you can directly iterate over the file.

Additionally, the with statement ensures that the file gets closed afterwards.
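The question also asks how to compare two aggregated dictionaries. One possible approach, sketched here under the assumption that res and res2 are plain key-to-sum mappings (the helper name diff_dicts and the sample data are invented for illustration), is to use set operations on the keys and then compare values only for the keys both dictionaries share:

```python
def diff_dicts(res, res2):
    """Return keys only in res, keys only in res2, and shared keys whose values differ."""
    keys1, keys2 = set(res), set(res2)
    only_in_res = keys1 - keys2
    only_in_res2 = keys2 - keys1
    # For shared keys, record both values when they disagree.
    changed = {k: (res[k], res2[k]) for k in keys1 & keys2 if res[k] != res2[k]}
    return only_in_res, only_in_res2, changed

# Hypothetical sample data.
res = {"a": 12, "b": 5, "c": 8}
res2 = {"a": 12, "b": 7, "d": 1}

only_in_res, only_in_res2, changed = diff_dicts(res, res2)
print(only_in_res)   # {'c'}
print(only_in_res2)  # {'d'}
print(changed)       # {'b': (5, 7)}
```

Unlike the `if res2.get(i):` test in the question, this does not mistake a key whose sum is 0 for a missing key, and it never looks up a value as if it were a key.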

Question 3

Will res[k].append(v) work if I have multiple variables that I want to append? For example:

k,field1,field2,field3,field4=line.split()
res[k].append(field1,field2,field3,field4)

Question 4

No, append expects just one argument. While you could call append multiple times, you can simply write res[k] += [field1,field2,field3,field4].
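A small sketch of the difference, using a made-up input line: append adds its single argument as one element, while += (like extend) adds each element of the list on the right-hand side.

```python
from collections import defaultdict

line = "k1 x1 x2 x3 x4"  # hypothetical line: one key followed by four fields
k, field1, field2, field3, field4 = line.split()

flat = defaultdict(list)
flat[k] += [field1, field2, field3, field4]  # same effect as flat[k].extend([...])

nested = defaultdict(list)
nested[k].append([field1, field2, field3, field4])  # one element: a list of four fields

print(flat[k])    # ['x1', 'x2', 'x3', 'x4']
print(nested[k])  # [['x1', 'x2', 'x3', 'x4']]
```

Which shape is preferable depends on whether the fields of one line should stay grouped together (nested) or be merged into a single flat list per key.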

Question 5

If you just want a running total, you don't need to create a list to append elements. You can use a defaultdict and keep adding to it to get a running total.

from collections import defaultdict
key_totals = defaultdict(int)
with open('foo.txt', 'r') as f:
    for line in f:
        k, v = line.split()
        key_totals[k] += int(v)
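For experimenting without a file on disk, here is a self-contained variant with its assumptions stated: io.StringIO stands in for foo.txt, the function name sum_by_key is hypothetical, and blank lines are skipped because line.split() on an empty line would fail to unpack into k, v.

```python
import io
from collections import defaultdict

def sum_by_key(lines):
    """Sum the integer second field for each key in an iterable of 'key value' lines."""
    key_totals = defaultdict(int)
    for line in lines:
        if not line.strip():
            continue  # skip blank lines instead of raising ValueError on unpacking
        k, v = line.split()
        key_totals[k] += int(v)
    return dict(key_totals)

# The data from the question, plus a trailing blank line.
sample = io.StringIO("a 1\na 2\nb 5\nc 8\na 9\n\n")
totals = sum_by_key(sample)
print(totals)  # {'a': 12, 'b': 5, 'c': 8}
```

Because the function accepts any iterable of lines, an open file object can be passed in the same way as the StringIO object.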

Question 6

This is exactly what setdefault() is for:

d = {}
with open('foo.txt','r') as f:
    for line in f:
        k,v = line.split()
        d.setdefault(k, []).append(v)

Also, don't use dict as a variable name. And you can iterate directly over a file; no need to use .readlines() here.
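One practical difference between setdefault() and defaultdict, shown as a sketch with made-up keys: reading a missing key from a defaultdict with [] inserts that key, whereas a plain dict filled via setdefault() raises KeyError on a missing key.

```python
from collections import defaultdict

dd = defaultdict(list)
dd["a"].append("1")
_ = dd["missing"]  # the lookup alone creates an empty list under "missing"

d = {}
d.setdefault("a", []).append("1")
try:
    d["missing"]
    raised = False
except KeyError:
    raised = True

print(sorted(dd))  # ['a', 'missing']
print(sorted(d))   # ['a']
print(raised)      # True
```

If later code tests for keys with `in` or iterates over the result, the plain dict avoids accidentally created empty entries.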

Accepted answer (the defaultdict answer above) by phihag, 2012-02-12 17:01:31Z

