memory usage multi value hash

Algis Kabaila akabaila at pcug.org.au
Fri Apr 15 04:01:45 EDT 2011


On Friday 15 April 2011 02:13:51 christian wrote:
> Hello,
>
> I'm not very experienced in Python. Is there a way of doing
> the below more memory-efficiently, and maybe faster?
> I import a 2-column file and then, for every unique
> value in the first column (the key), concatenate the values
> from the second column.
>
> So the output is something like this:
> A,1,2,3
> B,3,4
> C,9,10,11,12,90,34,322,21
>
> Thanks in advance & regards,
> Christian
>
> import csv
> import random
> import sys
> from itertools import groupby
> from operator import itemgetter
>
> f=csv.reader(open(sys.argv[1]),delimiter=';')
> z=[[i[0],i[1]] for i in f]
> z.sort(key=itemgetter(0))
> mydict = dict((k,','.join(map(itemgetter(1), it)))
>               for k, it in groupby(z, itemgetter(0)))
> del(z)
>
> f = open(sys.argv[2], 'w')
> for k,v in mydict.iteritems():
>     f.write(v + "\n")
>
> f.close()
Here are two alternative solutions. The second one, with generators,
is probably the most economical as far as RAM usage is concerned.
For your example, data1.txt is taken as follows:
A, 1
B, 3
C, 9
A, 2
B, 4
C, 10
A, 3
C, 11
C, 12
C, 90
C, 34
C, 322
C, 21
The "two in one" program is:
#!/usr/bin/env python3
'''generate.py - Example of reading long two column csv list and
sorting. Thread "memory usage multi value hash"
'''
# Determine a set of unique column 1 values
unique_set = set()
with open('data1.txt') as f:
    for line in f:
        unique_set.add(line.split(',')[0])
    print(unique_set)

# First solution: scan the file once per key, collecting values in a list
with open('data1.txt') as f:
    for x in unique_set:
        ls = [line.split(',')[1].rstrip() for line in f
              if line.split(',')[0].rstrip() == x]
        print(x.rstrip(), ','.join(ls))
        f.seek(0)  # rewind so the next key scans the whole file again

print('\n Alternative solution with generators')
# Second solution: same scan, but values are pulled lazily from a generator
with open('data1.txt') as f:
    for x in unique_set:
        gs = (line.split(',')[1].rstrip() for line in f
              if line.split(',')[0].rstrip() == x)
        s = ''
        for ds in gs:
            s = s + ds
        print(x.rstrip(), s)
        f.seek(0)  # rewind so the next key scans the whole file again
The output is:
{'A', 'C', 'B'}
A 1, 2, 3
C 9, 10, 11, 12, 90, 34, 322, 21
B 3, 4
 Alternative solution with generators
A 1 2 3
C 9 10 11 12 90 34 322 21
B 3 4
Notice that the input lines do not need to be sorted or grouped by
key: each key's values are collected in the order in which they
appear in the file.
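The point about line order can be sketched with a single-pass variant. This is not the program above: it swaps in collections.defaultdict and keeps all values in memory instead of rescanning the file for each key, so it trades some RAM for a single read. The names group_values, interleaved and grouped_by_key are made up for illustration, and the records are a subset of data1.txt.

```python
from collections import defaultdict


def group_values(lines):
    """Map each first-column key to the list of its second-column values."""
    groups = defaultdict(list)
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        key, value = line.split(',', 1)
        groups[key.strip()].append(value.strip())
    return groups


# A subset of the data1.txt records, in two different line orders
# (illustrative data, not read from a file).
interleaved = ['A, 1', 'B, 3', 'C, 9', 'A, 2', 'B, 4', 'C, 10', 'A, 3']
grouped_by_key = ['C, 9', 'C, 10', 'B, 3', 'B, 4', 'A, 1', 'A, 2', 'A, 3']

# Keys are sorted only to make the printed order predictable.
for key, values in sorted(group_values(interleaved).items()):
    print(key + ',' + ','.join(values))
```

Both orderings give the same grouping, because only the relative order of the lines that share a key matters. With a real file, the open file object can be passed as lines.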
OldAl.
-- 
Algis
http://akabaila.pcug.org.au/StructuralAnalysis.pdf

