better way to do this in python
nn
pruebauno at latinmail.com
Mon Apr 4 12:10:56 EDT 2011
On Apr 3, 8:06 am, Mag Gam <magaw... at gmail.com> wrote:
> Thanks for the responses.
>
> Basically, I have a large file with this format,
>
> Date INFO username command srcipaddress filename
>
> I would like to do statistics on:
> total number of usernames and who they are
> username and commands
> username and filenames
> unique source ip addresses
> unique filenames
>
> Then I would like to bucket findings with days (date).
>
> Overall, I would like to build a log file analyzer.
>
> On Sat, Apr 2, 2011 at 10:59 PM, Dan Stromberg <drsali... at gmail.com> wrote:
>
> > On Sat, Apr 2, 2011 at 5:24 PM, Chris Angelico <ros... at gmail.com> wrote:
> >> On Sun, Apr 3, 2011 at 9:58 AM, Mag Gam <magaw... at gmail.com> wrote:
> >> > I suppose I can do something like this.
> >> > (pseudocode)
> >> >
> >> > d={}
> >> > try:
> >> >     d[key]+=1
> >> > except KeyError:
> >> >     d[key]=1
> >> >
> >> > I was wondering if there is a pythonic way of doing this? I plan on
> >> > doing this many times for various files. Would the python collections
> >> > class be sufficient?
> >>
> >> I think you want collections.Counter. From the docs: "Counter objects
> >> have a dictionary interface except that they return a zero count for
> >> missing items instead of raising a KeyError".
> >>
> >> ChrisA
> >
> > I realize you (Mag) asked for a Python solution, but since you mention
> > awk... you can also do this with "sort < input | uniq -c" - one line of
> > "code". GNU sort doesn't use as nice an algorithm as a hashing-based
> > solution (like you'd probably use with Python), but for a sort, GNU sort's
> > quite good.
Take a look at:
http://code.activestate.com/recipes/577535-aggregates-using-groupby-defaultdict-and-counter/
for some ideas of how to group and count things.
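
As a rough sketch of how that could look for the log format described above:
this assumes the six fields are separated by whitespace, that the date is a
single token, and that filenames contain no spaces. The sample lines and the
names analyze() and SAMPLE are made up for illustration, not taken from the
recipe.

```python
from collections import Counter, defaultdict

# Hypothetical sample data. Field order follows the thread:
# Date INFO username command srcipaddress filename
SAMPLE = """\
2011-04-01 INFO alice get 10.0.0.1 report.txt
2011-04-01 INFO bob put 10.0.0.2 data.csv
2011-04-02 INFO alice get 10.0.0.1 data.csv
2011-04-02 INFO alice put 10.0.0.3 report.txt
"""


def analyze(lines):
    """Count users, (user, command) and (user, file) pairs, overall and per day."""
    users = Counter()
    user_commands = Counter()
    user_files = Counter()
    source_ips = set()
    filenames = set()
    users_by_day = defaultdict(Counter)  # date -> Counter of usernames

    for line in lines:
        fields = line.split()
        if len(fields) != 6:
            continue  # skip blank or malformed lines (assumed 6 fields)
        date, _level, user, command, ip, filename = fields
        users[user] += 1  # missing keys start at 0, so no try/except needed
        user_commands[user, command] += 1
        user_files[user, filename] += 1
        source_ips.add(ip)
        filenames.add(filename)
        users_by_day[date][user] += 1

    return {
        "users": users,
        "user_commands": user_commands,
        "user_files": user_files,
        "source_ips": source_ips,
        "filenames": filenames,
        "users_by_day": dict(users_by_day),
    }


stats = analyze(SAMPLE.splitlines())
print(len(stats["users"]), "distinct users:", sorted(stats["users"]))
for day in sorted(stats["users_by_day"]):
    print(day, stats["users_by_day"][day].most_common())
```

For a real file you could pass the open file object instead of
SAMPLE.splitlines(), since analyze() only iterates over lines. Counter needs
Python 2.7 or 3.1+; on older versions defaultdict(int) would be a close
substitute.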