Code Review

Timeline for Reading set of large log files line by line and count how many hostnames appear on each line

Current License: CC BY-SA 4.0

12 events
when what by license comment
Aug 13, 2018 at 20:34 history edited AJNeufeld CC BY-SA 4.0
added 16 characters in body
Aug 13, 2018 at 20:29 comment added AJNeufeld Added attempted speedup using multiprocessing.Pool. If you have more than one CPU, and are not I/O bound, you may gain some speed. I'd be interested in hearing your result.
Aug 13, 2018 at 20:24 history edited AJNeufeld CC BY-SA 4.0
Added multiprocessing for possible speed increase
Aug 13, 2018 at 15:16 comment added AJNeufeld Python is an interpreted, dynamically typed language; you could speed up your program by rewriting it in C. If you want to leave it in Python, you may get additional speed from bigger-picture optimizations. Are all 300 log files different each time you run, or are some of them "history" files whose hostname counts you can cache? Processing each file in a separate process (not a Python thread!) may help if you are not I/O bound. (See the multiprocessing package.)
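A minimal sketch of the per-file, per-process idea from the comment above, not the code from the answer itself. It assumes a hostname is whatever sits between `://` and the next `/`; the real log format is not shown on this page, so `HOSTNAME_RE`, `count_hostnames`, and `count_all` are hypothetical names chosen for illustration.

```python
import re
from collections import Counter
from multiprocessing import Pool

# Assumption: a hostname is the text between "://" and the next "/".
HOSTNAME_RE = re.compile(r"://([^/\s]+)/")


def count_hostnames(path):
    """Return a Counter of the hostnames found in one log file."""
    counts = Counter()
    with open(path, encoding="utf-8", errors="replace") as log:
        for line in log:  # stream line by line; never load the whole file
            counts.update(HOSTNAME_RE.findall(line))
    return counts


def count_all(paths, processes=None):
    """Count hostnames across many files, one file per worker task."""
    total = Counter()
    with Pool(processes) as pool:
        # Each worker returns a small Counter; merge them in the parent.
        for counts in pool.imap_unordered(count_hostnames, paths):
            total.update(counts)
    return total


if __name__ == "__main__":
    import sys

    paths = sys.argv[1:]  # log files given on the command line
    if paths:
        for host, n in count_all(paths).most_common(10):
            print(host, n)
```

Whether this beats a single process depends on the disk: if reading the files is the bottleneck, extra processes will not help much. Caching the per-file `Counter` for unchanged "history" files would be a separate optimization on top of this.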
Aug 13, 2018 at 14:53 comment added Mamatha Yes, Counter became my lifesaver! The program came down to 45 minutes. Is that the limit on speed, or is there any way to make it faster?
Aug 10, 2018 at 1:28 history edited AJNeufeld CC BY-SA 4.0
Omitted ()'s in construction of `Counter()`
Aug 10, 2018 at 1:18 history edited AJNeufeld CC BY-SA 4.0
Added link to Counter documentation
Aug 10, 2018 at 1:17 comment added Ben.T I have read about Counter a few times but never thought of using it; this is awesome :)
Aug 10, 2018 at 1:11 history edited AJNeufeld CC BY-SA 4.0
Hostname ends in a slash, not a backslash
Aug 10, 2018 at 1:05 history edited AJNeufeld CC BY-SA 4.0
Better variable names.
Aug 10, 2018 at 0:56 history edited AJNeufeld CC BY-SA 4.0
added 80 characters in body
Aug 10, 2018 at 0:51 history answered AJNeufeld CC BY-SA 4.0
