Python-list Digest, Vol 88, Issue 69

Wed Jan 26 16:25:04 EST 2011

python-list-request at python.org wrote:
>Send Python-list mailing list submissions to
>	python-list at python.org
>
>To subscribe or unsubscribe via the World Wide Web, visit
>	http://mail.python.org/mailman/listinfo/python-list
>or, via email, send a message with subject or body 'help' to
>	python-list-request at python.org
>
>You can reach the person managing the list at
>	python-list-owner at python.org
>
>When replying, please edit your Subject line so it is more specific
>than "Re: Contents of Python-list digest..."
>
>Today's Topics:
>
>   1. Re: Python use growing fast (Alice Bevan–McGregor)
>   2. Re: order of importing modules (Chris Rebert)
>   3. Re: How to Buffer Serialized Objects to Disk (MRAB)
>   4. Re: How to Buffer Serialized Objects to Disk (Chris Rebert)
>   5. Re: How to Buffer Serialized Objects to Disk (Peter Otten)
>   6. Re: Best way to automatically copy out attachments from an
>      email (Chris Rebert)
>   7. Re: Parsing string for "<verb> <noun>" (Aahz)
>   8. Re: Nested structures question (Tim Harig)
>   9. Re: How to Buffer Serialized Objects to Disk (Scott McCarty)
>
>On 2011-01-10 19:49:47 -0800, Roy Smith said:
>
>> One of the surprising (to me, anyway) uses of JavaScript is as the
>> scripting language for MongoDB (http://www.mongodb.org/).
>
>I just wish they'd drop SpiderMonkey and go with V8 or another, faster
>and more modern engine. :(
>
>	- Alice.
>
>
>> Dan Stromberg wrote:
>>> On Tue, Jan 11, 2011 at 4:30 PM, Catherine Moroney
>>> <Catherine.M.Moroney at jpl.nasa.gov> wrote:
>>>>
>>>> In what order does python import modules on a Linux system?  I have a
>>>> package that is both installed in /usr/lib64/python2.5/site-packages,
>>>> and a newer version of the same module in a working directory.
>>>>
>>>> I want to import the version from the working directory, but when I
>>>> print module.__file__ in the interpreter after importing the module,
>>>> I get the version that's in site-packages.
>>>>
>>>> I've played with the PYTHONPATH environment variable by setting it
>>>> to just the path of the working directory, but when I import the module
>>>> I still pick up the version in site-packages.
>>>>
>>>> /usr/lib64 is in my PATH variable, but doesn't appear anywhere else.  I
>>>> don't want to remove /usr/lib64 from my PATH because that will break
>>>> a lot of stuff.
>>>>
>>>> Can I force python to import from my PYTHONPATH first, before looking
>>>> in the system directory?
>>>
>>> Please import sys and inspect sys.path; this defines the search path
>>> for imports.
>>>
>>> By looking at sys.path, you can see where in the search order your
>>> $PYTHONPATH is going.
>
>On Wed, Jan 12, 2011 at 11:07 AM, Catherine Moroney
><Catherine.M.Moroney at jpl.nasa.gov> wrote:
>> I've looked at my sys.path variable and I see that it has
>> a whole bunch of site-package directories, followed by the
>> contents of my $PYTHONPATH variable, followed by a list of
>> misc site-package directories (see below).
><snip>
>> But, I'm curious as to where the first bunch of 'site-package'
>> entries come from.  The
>> /usr/lib64/python2.5/site-packages/pyhdfeos-1.0_r57_58-py2.5-linux-x86_64.egg
>> is not present in any of my environment variables yet it shows up
>> as one of the first entries in sys.path.
>
>You probably have a .pth file somewhere that adds it (since it's an
>egg, probably site-packages/easy-install.pth).
>See http://docs.python.org/install/index.html#modifying-python-s-search-path
>
>Cheers,
>Chris
>--
>http://blog.rebertia.com
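
[Editor's sketch, not part of the original message.] The advice in this exchange can be tried out roughly as follows. This is a minimal illustration in Python 3 syntax, assuming a throwaway directory and a module named demo_working_copy, both of which are hypothetical stand-ins for the working directory and package in the question. Entries earlier in sys.path are searched first, so inserting at index 0 puts the working copy ahead of anything a .pth file added.

```python
import importlib
import os
import sys
import tempfile

# Hypothetical working directory holding the newer copy of a module.
workdir = tempfile.mkdtemp()
with open(os.path.join(workdir, "demo_working_copy.py"), "w") as f:
    f.write("VERSION = 'working copy'\n")

# Show the search order; entries contributed by .pth files appear here too.
for position, entry in enumerate(sys.path):
    print(position, entry)

# Earlier entries win, so put the working directory at the front.
sys.path.insert(0, workdir)
importlib.invalidate_caches()

import demo_working_copy

# __file__ tells you which copy was actually imported.
print(demo_working_copy.__file__)
```

Editing sys.path inside the program is only one option; it is shown here because it does not depend on where PYTHONPATH happens to land in the list.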
>
>
>On 12/01/2011 21:05, Scott McCarty wrote:
>> Sorry to ask this question. I have searched the list archives and googled,
>> but I don't even know what words to use to find what I am looking for. I am
>> just looking for a little kick in the right direction.
>>
>> I have a Python-based log analysis program called petit
>> (http://crunchtools.com/petit). I am trying to modify it to manage the
>> main object types to and from disk.
>>
>> Essentially, I have one object which is a list of a bunch of "Entry"
>> objects. The Entry objects have date, time, etc. fields which I use
>> for analysis techniques. At the very beginning I build up the list of
>> objects, then would like to start pickling it while building to save
>> memory. I want to be able to process more entries than I have memory for.
>> With a straight list it looks like I could build from xreadlines(), but
>> once you turn it into a more complex object, I don't quite know where to go.
>>
>> I understand how to pickle the entire data structure, but I need
>> something that will manage the memory/disk allocation. Any thoughts?
>>
>To me it sounds like you need to use a database.
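
[Editor's sketch, not part of the original message.] One way the database suggestion might look with the standard-library sqlite3 module, in Python 3 syntax. The table layout, the parse() helper, and the sample lines are assumptions made up for illustration; petit's real Entry fields may differ.

```python
import sqlite3

def parse(lines):
    # Hypothetical log format: "date time host message".
    for line in lines:
        date, time, host, message = line.rstrip("\n").split(" ", 3)
        yield date, time, host, message

sample = [
    "2011-01-12 21:05:01 web1 disk full\n",
    "2011-01-12 21:05:09 web2 login failed\n",
    "2011-01-13 08:00:00 web1 disk full\n",
]

# ":memory:" keeps this demo self-contained; pass a file path instead
# so that the rows live on disk rather than in RAM.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entry (date TEXT, time TEXT, host TEXT, message TEXT)")

# executemany consumes the generator row by row, so the whole log never
# has to sit in a Python list.
conn.executemany("INSERT INTO entry VALUES (?, ?, ?, ?)", parse(sample))
conn.commit()

# Aggregation and sorting are done by the database.
counts = conn.execute(
    "SELECT message, COUNT(*) FROM entry "
    "GROUP BY message ORDER BY COUNT(*) DESC, message"
).fetchall()
print(counts)
```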
>
>
>On Wed, Jan 12, 2011 at 1:05 PM, Scott McCarty <scott.mccarty at gmail.com> wrote:
>> Sorry to ask this question. I have searched the list archives and googled, but
>> I don't even know what words to use to find what I am looking for. I am just
>> looking for a little kick in the right direction.
>> I have a Python-based log analysis program called petit
>> (http://crunchtools.com/petit). I am trying to modify it to manage the main
>> object types to and from disk.
>> Essentially, I have one object which is a list of a bunch of "Entry"
>> objects. The Entry objects have date, time, etc. fields which I use for
>> analysis techniques. At the very beginning I build up the list of objects,
>> then would like to start pickling it while building to save memory. I want
>> to be able to process more entries than I have memory for. With a straight
>> list it looks like I could build from xreadlines(), but once you turn it into
>> a more complex object, I don't quite know where to go.
>> I understand how to pickle the entire data structure, but I need something
>> that will manage the memory/disk allocation.  Any thoughts?
>
>You could subclass `list` and use sys.getsizeof()
>[http://docs.python.org/library/sys.html#sys.getsizeof ] to keep track
>of the size of the elements, and then start pickling them to disk once
>the total size reaches some preset limit.
>But like MRAB said, using a proper database, e.g. SQLite
>(http://docs.python.org/library/sqlite3.html ), wouldn't be a bad idea
>either.
>
>Cheers,
>Chris
>--
>http://blog.rebertia.com
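
[Editor's sketch, not part of the original message.] A rough illustration of the list-subclass idea, in Python 3 syntax. The class name SpillingList, the 200-byte limit, and the temporary spill file are all assumptions chosen for the demo. Note that sys.getsizeof() is shallow, so for objects that reference other objects the running total is only an estimate.

```python
import pickle
import sys
import tempfile

class SpillingList(list):
    """List that pickles its contents to disk once they pass a size limit."""

    def __init__(self, limit_bytes, spill_file):
        super().__init__()
        self.limit_bytes = limit_bytes
        self.spill_file = spill_file
        self.in_memory_bytes = 0

    def append(self, item):
        super().append(item)
        # Shallow size only: objects referenced by the item are not counted.
        self.in_memory_bytes += sys.getsizeof(item)
        if self.in_memory_bytes >= self.limit_bytes:
            self.flush()

    def flush(self):
        # Write each buffered item as its own pickle, then empty the buffer.
        for item in self:
            pickle.dump(item, self.spill_file)
        del self[:]
        self.in_memory_bytes = 0

spill = tempfile.TemporaryFile()
entries = SpillingList(limit_bytes=200, spill_file=spill)
for n in range(20):
    entries.append("log line %d" % n)
entries.flush()  # push out whatever is still buffered

# Read everything back, one pickle at a time.
spill.seek(0)
restored = []
while True:
    try:
        restored.append(pickle.load(spill))
    except EOFError:
        break
print(len(restored), len(entries))
```

Only append() is intercepted here; extend(), insert(), and slice assignment would bypass the accounting, which is one reason the database route may end up simpler.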
>
>
>Scott McCarty wrote:
>
>> Sorry to ask this question. I have searched the list archives and googled,
>> but I don't even know what words to use to find what I am looking for. I am
>> just looking for a little kick in the right direction.
>>
>> I have a Python-based log analysis program called petit (
>> http://crunchtools.com/petit). I am trying to modify it to manage the main
>> object types to and from disk.
>>
>> Essentially, I have one object which is a list of a bunch of "Entry"
>> objects. The Entry objects have date, time, etc. fields which I use
>> for analysis techniques. At the very beginning I build up the list of
>> objects, then would like to start pickling it while building to save
>> memory. I want to be able to process more entries than I have memory for.
>> With a straight list it looks like I could build from xreadlines(), but once
>> you turn it into a more complex object, I don't quite know where to go.
>>
>> I understand how to pickle the entire data structure, but I need something
>> that will manage the memory/disk allocation. Any thoughts?
>>You can write multiple pickled objects into a single file:
>
>import cPickle as pickle
>
>def dump(filename, items):
>    with open(filename, "wb") as out:
>        dump = pickle.Pickler(out).dump
>        for item in items:
>            dump(item)
>
>def load(filename):
>    with open(filename, "rb") as instream:
>        load = pickle.Unpickler(instream).load
>        while True:
>            try:
>                item = load()
>            except EOFError:
>                break
>            yield item
>
>if __name__ == "__main__":
>    filename = "tmp.pickle"
>    from collections import namedtuple
>    T = namedtuple("T", "alpha beta")
>    dump(filename, (T(a, b) for a, b in zip("abc", [1,2,3])))
>    for item in load(filename):
>        print item
>
>To get random access you'd have to maintain a list containing the offsets of
>the entries in the file.
>However, a simple database like SQLite is probably sufficient for the kind
>of entries you have in mind, and it allows operations like aggregation,
>sorting and grouping out of the box.
>
>Peter
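
[Editor's sketch, not part of the original message.] The remark about random access could be realised along these lines, in Python 3 syntax. The helper names dump_with_offsets and load_at are made up for this illustration: the file position is recorded before each pickle is written, and a later seek() to that position lets a single entry be unpickled without reading the ones before it.

```python
import pickle
import tempfile

def dump_with_offsets(stream, items):
    """Pickle items back to back; return the start offset of each one."""
    offsets = []
    for item in items:
        offsets.append(stream.tell())
        pickle.dump(item, stream)
    return offsets

def load_at(stream, offsets, index):
    """Unpickle only the item at position `index`."""
    stream.seek(offsets[index])
    return pickle.load(stream)

with tempfile.TemporaryFile() as stream:
    offsets = dump_with_offsets(stream, [("a", 1), ("b", 2), ("c", 3)])
    third = load_at(stream, offsets, 2)
    first = load_at(stream, offsets, 0)
print(third, first)
```

The offsets list itself still lives in memory, but it costs one integer per entry rather than one full object.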
>
>
>
>On Wed, Jan 12, 2011 at 10:59 AM, Matty Sarro <msarro at gmail.com> wrote:
>> As of now here is my situation:
>> I am working on a system to aggregate IT data and logs. A number of
>> important data are gathered by a third party system. The only
>> immediate way I have to access the data is to have their system
>> automatically email me updates in CSV format every hour. If I set up a
>> mail client on the server, this shouldn't be a huge issue.
>>
>> However, is there a way to automatically open the emails, and copy the
>> attachments to a directory based on the filename? Kind of a weird
>> project, I know. Just looking for some ideas hence posting this on two
>> lists.
>
>Parsing out email attachments:
>http://docs.python.org/library/email.parser.html
>http://docs.python.org/library/email.message.html#module-email.message
>
>Parsing the extension from a filename:
>http://docs.python.org/library/os.path.html#os.path.splitext
>
>Retrieving email from a mail server:
>http://docs.python.org/library/poplib.html
>http://docs.python.org/library/imaplib.html
>
>You could poll for new messages via a cron job or the `sched` module
>(http://docs.python.org/library/sched.html ). Or if the messages are
>being delivered locally, you could use inotify bindings or similar to
>watch the appropriate directory for incoming mail. Integration with a
>mail server itself is also a possibility, but I don't know much about
>that.
>
>Cheers,
>Chris
>--
>http://blog.rebertia.com
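
[Editor's sketch, not part of the original message.] A possible shape for the attachment-copying step, in Python 3 syntax, using only the email and os.path modules linked above. The function name save_attachments and the rule "one subdirectory per file extension" are assumptions; fetching the raw message (poplib, imaplib, or a local mailbox) is left out, and a message built in memory stands in for one of the hourly emails.

```python
import email
import os
import tempfile
from email.message import EmailMessage

def save_attachments(raw_bytes, base_dir):
    """Save each attachment under base_dir/<extension>/<filename>."""
    message = email.message_from_bytes(raw_bytes)
    saved = []
    for part in message.walk():
        filename = part.get_filename()
        if not filename:
            continue  # body text or a multipart container, not an attachment
        filename = os.path.basename(filename)  # ignore any path sent by the peer
        extension = os.path.splitext(filename)[1].lstrip(".").lower() or "other"
        target_dir = os.path.join(base_dir, extension)
        os.makedirs(target_dir, exist_ok=True)
        target = os.path.join(target_dir, filename)
        with open(target, "wb") as out:
            out.write(part.get_payload(decode=True))
        saved.append(target)
    return saved

# Stand-in for one hourly report email with a CSV attached.
report = EmailMessage()
report["Subject"] = "Hourly update"
report.set_content("See attached.")
report.add_attachment(b"host,load\nweb1,0.5\n", maintype="application",
                      subtype="octet-stream", filename="update.csv")

outdir = tempfile.mkdtemp()
saved = save_attachments(report.as_bytes(), outdir)
print(saved)
```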
>
>
>In article <0d7143ca-45cf-44c3-9e8d-acb867c52037 at f30g2000yqa.googlegroups.com>,
>Daniel da Silva <ddasilva at umd.edu> wrote:
>>
>>I have come across a task where I would like to scan a short 20-80
>>character line of text for instances of "<verb> <noun>". Ideally
>><verb> could be of any tense.
>
>In Soviet Russia, <noun> <verbs> you!
>-- 
>Aahz (aahz at pythoncraft.com) <*> http://www.pythoncraft.com/
>
>"Think of it as evolution in action." --Tony Rand
>
>
>In case you still need help:
>
>- import random
>- 
>- # Set the initial values
>- the_number = random.randrange(100) + 1
>- tries = 0
>- guess = None
>- 
>- # Guessing loop
>- while guess != the_number and tries < 7:
>-     guess = int(raw_input("Take a guess: "))
>-     if guess > the_number:
>-         print "Lower..."
>-     elif guess < the_number:
>-         print "Higher..."
>-     tries += 1
>- 
>- # Did the user guess correctly, or make too many guesses?
>- if guess == the_number:
>-     print "You guessed it! The number was", the_number
>-     print "And it only took you", tries, "tries!\n"
>- else:
>-     print "Wow you suck! It should only take at most 7 tries!"
>- 
>- raw_input("Press Enter to exit the program.")
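
[Editor's sketch, not part of the original message.] The "at most 7 tries" in the program's last message follows from halving the range on every guess. A small check in Python 3 syntax, assuming the player always guesses the midpoint of the numbers still possible; the function name guesses_needed is made up for this illustration.

```python
def guesses_needed(target, low=1, high=100):
    """Count guesses made by always picking the middle of the remaining range."""
    tries = 0
    while True:
        guess = (low + high) // 2
        tries += 1
        if guess == target:
            return tries
        if guess < target:
            low = guess + 1   # "Higher..."
        else:
            high = guess - 1  # "Lower..."

# Try every possible secret number and keep the worst result.
worst_case = max(guesses_needed(n) for n in range(1, 101))
print(worst_case)
```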
>
>
>Been digging ever since I posted this. I suspected that the response might
>be "use a database". I am worried I am trying to reinvent the wheel. The
>problem is I don't want any dependencies and I also don't need persistence
>between program runs. I kind of wanted to keep the use of petit very similar
>to cat, head, awk, etc. But, that said, I have realized that if I provide the
>analysis features as an API, you very well might want persistence between
>runs.
>
>What about using an array inside a shelve?
>
>Just got done messing with this in python shell:
>
>import shelve
>
># writeback=True is required for in-place changes such as append() to be saved
>d = shelve.open(filename="/root/test.shelf", protocol=-1, writeback=True)
>
>d["log"] = []
>d["log"].append("test1")
>d["log"].append("test2")
>d["log"].append("test3")
>
>Then, always interacting with d["log"], for example:
>
>for i in d["log"]:
>    print i
>
>Thoughts?
>
>
>I know this won't manage memory, but it will keep the footprint down, right?
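
[Editor's sketch, not part of the original message.] On the footprint question: a shelf pickles each value as a whole, so a single d["log"] list is loaded in full every time it is read, and a shelf opened with writeback=True additionally caches every accessed entry in memory. A variation that may keep less in memory is to store one entry per key. The sketch below uses Python 3 syntax; the temporary path and the "count" bookkeeping key are assumptions made for the demo.

```python
import os
import shelve
import tempfile

# Hypothetical location; the original message used /root/test.shelf.
path = os.path.join(tempfile.mkdtemp(), "test.shelf")

with shelve.open(path, protocol=-1) as d:
    # One key per entry: each assignment is pickled and written on its own,
    # and nothing forces the whole log to be loaded at once.
    lines = ["test1", "test2", "test3"]
    for n, line in enumerate(lines):
        d[str(n)] = line
    d["count"] = len(lines)

with shelve.open(path) as d:
    # Entries are fetched one key at a time, in insertion order.
    log = [d[str(n)] for n in range(d["count"])]
print(log)
```

Shelf keys must be strings, which is why the integer position is converted with str().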
>
>-- 
>http://mail.python.org/mailman/listinfo/python-list