[Python-Dev] IO module improvements

Fri Feb 5 13:35:26 CET 2010

Hello
The new modular io system of python is awesome, but I'm running into 
some of its limits currently, while replacing the raw FileIO with a more 
advanced stream.
So here are a few ideas and questions regarding the mechanisms of this 
IO system. Note that I'm speaking in python terms, but these ideas 
should also apply to the C implementation (with more programming hassle 
of course).
- some streams have specific attributes (e.g. mode, name...), but since 
they'll often be wrapped inside buffering or encoding streams, these 
attributes will not be available to the end user.
So wouldn't it be great to implement some "transversal inheritance", 
simply by delegating attribute lookups that fail on the current stream 
to the underlying buffer/raw stream? A little __getattr__ 
should do it fine, shouldn't it?
By the way, I'm having trouble with the "name" attribute of raw files, 
which can be a string or an integer (confusing), is ambiguous if it 
contains a relative path, and can't handle the new case in my 
library, i.e. opening a file from an existing file handle (which is ALSO 
an integer, like C file descriptors...). I propose we deprecate it in 
favor of more precise attributes, like "path" (absolute path) and 
"origin" (which can be "path", "fileno" or "handle", and can be 
extended...).
Methods would deserve some auto-forwarding too. If you want to buffer 
a raw stream which also offers size(), times(), lock_file() and other 
methods, how can these be accessed from a top-level buffering/text 
stream? So it would be interesting to have a system through which a 
stream can expose its additional features to top-level streams, and at 
the same time tell them whether they must flush() before calling 
these new methods (e.g. asking for the inode number of a file doesn't 
require flushing, but knowing its real size DOES require it).
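
To make the forwarding idea concrete, here is a minimal sketch in pure 
Python. It is only an illustration, not how the io module works: the 
MemoryRaw and ForwardingBufferedWriter classes and the "requires_flush" 
marker are hypothetical names, and "origin" is the attribute proposed 
above, not an existing one.

```python
import io


class MemoryRaw(io.RawIOBase):
    """Hypothetical raw stream with extra features (stands in for a file)."""

    def __init__(self):
        super().__init__()
        self._data = bytearray()
        self.origin = "memory"  # proposed attribute, not part of io today

    def writable(self):
        return True

    def write(self, b):
        self._data += bytes(b)
        return len(b)

    def size(self):
        # Only accurate once the layers above have flushed their buffers.
        return len(self._data)

    # Hypothetical marker telling upper layers to flush() before calling.
    size.requires_flush = True


class ForwardingBufferedWriter(io.BufferedWriter):
    """Buffered layer that delegates unknown attributes to the raw stream."""

    def __getattr__(self, name):
        # Called only when normal lookup on the buffered stream fails.
        target = getattr(self.raw, name)
        if callable(target) and getattr(target, "requires_flush", False):
            def flushed_call(*args, **kwargs):
                self.flush()  # push pending bytes down before asking
                return target(*args, **kwargs)
            return flushed_call
        return target


raw = MemoryRaw()
stream = ForwardingBufferedWriter(raw)
stream.write(b"hello")
print(stream.origin)  # plain attribute, forwarded without flushing
print(stream.size())  # forwarded method, flushed first
```

Storing the flush requirement on the method itself keeps the decision 
with the stream that knows its own semantics; a registry or a decorator 
would work equally well.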
- I feel thread-safety locking and stream status checking are 
currently overly complicated. All methods are filled with locking calls 
and CheckClosed() calls, which is both a performance loss (most io 
streams will have 3 such levels of locking, when 1 would suffice) and 
error-prone (some time ago I saw several functions in the sources in 
which checks and locks seemed to be missing).
Since we're in the mood for nesting streams anyway, why not simply 
add a "safety stream" on top of each stream chain returned by open()? 
That layer could gracefully handle mutex locking, CheckClosed() calls, 
and maybe even the attribute/method forwarding I mentioned above. I 
know a pure metaprogramming solution might not suffice for 
performance-seekers, but static implementations should be doable as well.
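
A rough sketch of such a "safety stream", in its pure metaprogramming 
form, could look like the following. SafetyStream is a hypothetical 
name, and the sketch assumes the wrapped stream exposes a "closed" 
attribute like the standard io classes do.

```python
import io
import threading


class SafetyStream:
    """Hypothetical outermost layer: one lock and one closed check."""

    def __init__(self, stream):
        self._stream = stream
        self._lock = threading.RLock()

    def __getattr__(self, name):
        # Private names are never forwarded; this also avoids infinite
        # recursion if _stream itself is missing.
        if name.startswith("_"):
            raise AttributeError(name)
        target = getattr(self._stream, name)
        if not callable(target):
            return target  # plain attributes need no locking here

        def guarded(*args, **kwargs):
            with self._lock:
                # close() stays callable on an already closed stream.
                if self._stream.closed and name != "close":
                    raise ValueError("I/O operation on closed file")
                return target(*args, **kwargs)
        return guarded


safe = SafetyStream(io.BytesIO())
safe.write(b"abc")
print(safe.getvalue())
safe.close()
```

Special methods such as __enter__ or __iter__ are looked up on the 
type and bypass __getattr__, so a real layer would have to define them 
explicitly; that is one more reason a static implementation may be 
preferable.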
- some semantic decisions of the current system are somewhat dangerous. 
For example, flushing errors occurring on close are swallowed. It seems 
to me that it's of the utmost importance that the user be warned if the 
bytes he wrote disappeared before reaching the kernel; shouldn't we 
firmly enforce a "don't hide errors" policy everywhere in the io module?
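
As an illustration of what "don't hide errors" could mean for close(), 
here is a small sketch. StrictWriter and FailingRaw are hypothetical 
classes written for this example, and the sketch assumes the raw stream 
accepts all pending bytes in a single write() call.

```python
import io


class FailingRaw(io.RawIOBase):
    """Hypothetical raw stream whose writes always fail (e.g. disk full)."""

    def writable(self):
        return True

    def write(self, b):
        raise OSError("simulated write failure")


class StrictWriter:
    """Hypothetical buffering layer whose close() never hides a flush error."""

    def __init__(self, raw):
        self._raw = raw
        self._pending = bytearray()

    def write(self, data):
        self._pending += data
        return len(data)

    def flush(self):
        if self._pending:
            # Assumption: the raw stream writes everything in one call.
            self._raw.write(bytes(self._pending))
            del self._pending[:]

    def close(self):
        try:
            self.flush()       # any error propagates to the caller
        finally:
            self._raw.close()  # the resource is released either way


writer = StrictWriter(FailingRaw())
writer.write(b"important data")
try:
    writer.close()
except OSError as exc:
    print("close() reported:", exc)
```

The try/finally keeps both guarantees: the underlying stream is always 
closed, and the caller still learns that the bytes never arrived.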
Regards,
Pascal