Sort Big File Help

mk mrkafk at gmail.com
Wed Mar 3 16:52:35 EST 2010


MRAB wrote:
> [snip]
> Simpler would be:
>
>     lines = f.readlines()
>     lines.sort(key=lambda line: line[:3])
>
> or even:
>
>     lines = sorted(f.readlines(), key=lambda line: line[:3])

Sure, but a complete newbie (which is my impression of the OP) doesn't
have to know about lambda.
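
For a reader who hasn't met lambda yet, the same key can be passed as an ordinary named function. This is a minimal sketch; the function name `first_three` and the sample lines are made up for illustration:

```python
def first_three(line):
    """Return the sort key for a line: its first three characters."""
    return line[:3]


lines = ["cab 1\n", "abc 2\n", "bca 3\n"]
# sort() calls first_three once per line and orders the lines by the
# result, just as key=lambda line: line[:3] would.
lines.sort(key=first_three)
print(lines)  # ['abc 2\n', 'bca 3\n', 'cab 1\n']
```

The lambda and the named function are interchangeable here. The named version just gives the key a readable name.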

I expected my solution to be slower, but it isn't (on a file with
100,000 lines of random strings):
# time ./sort1.py
real 0m0.386s
user 0m0.372s
sys 0m0.014s
# time ./sort2.py
real 0m0.303s
user 0m0.286s
sys 0m0.017s
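
Timing each script with the shell's `time` also counts interpreter startup and reading the file, so it doesn't isolate the sorting step. The sketch below uses the standard `timeit` module to time only the two sorting approaches in-process. Because testfile.txt wasn't posted, the data here is an assumption: 100,000 in-memory lines of 10 random lowercase letters. The function names are hypothetical.

```python
import random
import string
import timeit

# Assumed stand-in for testfile.txt: 100,000 lines of 10 random lowercase letters.
random.seed(42)
LINES = [
    "".join(random.choices(string.ascii_lowercase, k=10)) + "\n"
    for _ in range(100_000)
]


def sort_by_key(lines):
    # sort1.py's approach: one key-based sort over all lines.
    return sorted(lines, key=lambda x: x[:3])


def sort_via_dict(lines):
    # sort2.py's approach: index lines by prefix, then sort the prefixes.
    linedict = {}
    for line in lines:
        linedict[line[:3]] = line
    return [linedict[key] for key in sorted(linedict)]


if __name__ == "__main__":
    for func in (sort_by_key, sort_via_dict):
        # Best of 5 single runs; excludes interpreter startup and file I/O.
        best = min(timeit.repeat(lambda: func(LINES), number=1, repeat=5))
        print(f"{func.__name__}: {best:.3f} s")
```

Taking the minimum of several repeats reduces noise from whatever else the machine is doing.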

sort1.py:
#!/usr/bin/python
def sortit(fname):
    # Read all lines, then sort them by their first three characters.
    with open(fname) as f:
        lines = f.readlines()
    lines.sort(key=lambda x: x[:3])
    return lines

if __name__ == '__main__':
    sortit('testfile.txt')

sort2.py:
#!/usr/bin/python
def sortit(fname):
    linedict = {}
    with open(fname) as fo:
        for line in fo:
            # Index each line by its first three characters. A later line
            # with the same prefix overwrites the earlier one.
            key = line[:3]
            linedict[key] = line
    # Collect the stored lines in sorted prefix order.
    sortedlines = []
    for key in sorted(linedict):
        sortedlines.append(linedict[key])
    return sortedlines

if __name__ == '__main__':
    sortit('testfile.txt')

Any idea why? After all, I'm doing quite a lot "by hand": inserting each
key into a dict, sorting the dict's keys, then iterating over the keys and
looking up each value.
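
One thing worth checking before comparing the timings: in sort2.py, `linedict[key] = line` keeps only the last line seen for each 3-character prefix. The function can therefore return fewer lines than the file holds and has fewer keys to sort, which may be part of why it's faster. Below is a sketch of a dict-based variant that keeps every line by storing a list per prefix. `sortit_grouped` is a hypothetical name, and unlike the scripts above it takes any iterable of lines (an open file works) rather than a file name.

```python
from collections import defaultdict


def sortit_grouped(lines):
    """Sort lines by their first three characters, keeping duplicates.

    Lines sharing a prefix stay in their original order, which matches
    lines.sort(key=lambda x: x[:3]) because Python's sort is stable.
    """
    groups = defaultdict(list)
    for line in lines:
        # Append instead of overwriting, so no line is lost.
        groups[line[:3]].append(line)
    sortedlines = []
    for key in sorted(groups):
        sortedlines.extend(groups[key])
    return sortedlines


if __name__ == "__main__":
    sample = ["abc 2\n", "xyz 1\n", "abc 1\n"]
    print(sortit_grouped(sample))  # ['abc 2\n', 'abc 1\n', 'xyz 1\n']

    # The one-entry-per-prefix dict from sort2.py drops one "abc" line.
    linedict = {line[:3]: line for line in sample}
    print(len(linedict))  # 2
```

Timing this variant against sort1.py gives a like-for-like comparison, since both then return every line.
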
Regards,
mk


More information about the Python-list mailing list
