It's pretty typical for Linux command-line utilities that deal with files to accept either a file as input, or stdin. It's also pretty common for them to be able to output to a file or to stdout.
These should be supported:
python myprogram.py input output
cat input | python myprogram.py > output
python myprogram.py input > output
Additionally, I don't like editing files in place. I'd rather write to a tmpfile and copy it over the original only if the operation succeeds, but I'd also rather not deal with tmpfiles every time I write something that has an output file. So both
python myprogram.py --in-place input
python myprogram.py filename filename
(i.e., input is the same as output) should use a tmpfile.
I wrote this pair of context managers to make this type of interface a little easier to write. They're meant to be used something like this:
with infile(infile_filename) as f:
    for line in f:
        pass  # do some stuff

with outfile(outfile_filename, infile_name=infile_filename, inplace=inplace) as f:
    # Get values for these arguments from argparse.
    f.write('blah')
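For reference, a minimal sketch of how those three values might come out of argparse. The positional names `input` and `output` and the helper names `build_parser` and `parse_cli` are assumptions made for illustration; only the `--in-place` flag appears in the command lines above.

```python
import argparse


def build_parser():
    # Hypothetical argument layout: two optional positionals plus a flag.
    parser = argparse.ArgumentParser(prog="myprogram.py")
    parser.add_argument("input", nargs="?", default=None,
                        help="input file (default: stdin)")
    parser.add_argument("output", nargs="?", default=None,
                        help="output file (default: stdout)")
    parser.add_argument("--in-place", action="store_true",
                        help="write the result back over the input file")
    return parser


def parse_cli(argv=None):
    # Returns (infile_filename, outfile_filename, inplace).
    args = build_parser().parse_args(argv)
    return args.input, args.output, args.in_place
```

The returned tuple can be passed to `infile(...)` and `outfile(...)` as in the usage above; a missing positional comes back as `None`, which both context managers treat as "use the standard stream".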
I would love to know what y'all think of it.
#!/usr/bin/python3
import os
import shutil
import sys
import tempfile


class infile(object):
    def __init__(self, file_name=None):
        self.file_name = file_name

    def __enter__(self):
        if self.file_name is None:
            self.f = sys.stdin
        else:
            self.f = open(self.file_name)
        return self.f

    def __exit__(self, etype, value, traceback):
        if self.f is not sys.stdin:
            self.f.close()

    def __getattr__(self, val):
        # Pass on other attributes to the underlying file-like object.
        return getattr(self.f, val)


class outfile(object):
    def __init__(self, file_name=None, *, infile_name=None, inplace=False):
        self.file_name = file_name
        self.infile_name = infile_name
        self.inplace = inplace

    def __enter__(self):
        if self.inplace or (self.file_name and self.file_name == self.infile_name):
            self.f = tempfile.NamedTemporaryFile(mode='w+t', delete=False)
            self.tmppath = self.f.name
        elif self.file_name is None:
            self.f = sys.stdout
            self.tmppath = None
        else:
            self.f = open(self.file_name, 'w')
            self.tmppath = None
        return self.f

    def __exit__(self, etype, value, traceback):
        # If we got no errors...
        if etype is None and self.tmppath:
            self.f.flush()
            shutil.copy(self.tmppath, self.infile_name)
        if self.f is not sys.stdout:
            self.f.close()
        if self.tmppath:
            os.remove(self.tmppath)

    def __getattr__(self, val):
        return getattr(self.f, val)
1 Answer
All the files you're dealing with are already context managers, including sys.stdin and sys.stdout. Delegating to them instead of reinventing the wheel, your infile class becomes a trivial function:
def infile(filename=None):
    if filename is None:
        return sys.stdin
    return open(filename)
outfile isn't quite as easy as that, but we can still simplify it a little. First, the inplace flag completely changes the behaviour of the function (causes it to choose between mostly disjoint codepaths), and would usually be specified as a literal in the source. It makes more sense to have it as a separate function instead. If you do need to decide between them based on user input, you can always write a short wrapper function.
I'll reuse the name outfile for the clobbering version, and use atomic_update for the non-clobbering version. The clobbering version is then really just as simple as infile:
def outfile(filename=None):
    if filename is None:
        return sys.stdout
    return open(filename, 'w')
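One behavioural difference is worth keeping in mind (a comment below raises it too): because sys.stdin and sys.stdout are returned directly, leaving the `with` block closes them. If that matters, one possible variation, sketched here under the assumption of Python 3.7+ for `contextlib.nullcontext`, is to wrap the standard streams so that exiting does nothing. The names `open_input` and `open_output` are made up for this sketch.

```python
import sys
from contextlib import nullcontext


def open_input(filename=None):
    if filename is None:
        # nullcontext hands back sys.stdin unchanged and does nothing on
        # exit, so the standard stream stays open after the with block.
        return nullcontext(sys.stdin)
    return open(filename)


def open_output(filename=None):
    if filename is None:
        return nullcontext(sys.stdout)
    return open(filename, 'w')
```

Both functions are still used as `with open_input(name) as f:`; real files are closed on exit as before, and only the standard streams are left alone.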
The updating version really does need to do some extra work after the files get closed, so it does need to be its own context manager. But instead of writing it as a class, it's a little easier to use the stdlib contextlib to write it as a generator function:
from contextlib import contextmanager

@contextmanager
def atomic_update(filename):
    if filename is None:
        f = sys.stdout
    else:
        f = tempfile.NamedTemporaryFile(mode='w+t', delete=False)
        tmppath = f.name
    with f:
        yield f
    if filename is not None:
        shutil.copy(tmppath, filename)
        os.remove(tmppath)
N.B. It would be nice if we could just return sys.stdout like in the other cases, but the contextmanager decorator considers it an error if the generator doesn't yield exactly once.
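The short wrapper mentioned earlier, for choosing between the two functions based on user input, might look roughly like this. The name `choose_outfile` and its keyword arguments are assumptions that mirror the question's `outfile` signature; `outfile` and `atomic_update` are repeated from above only so that the block runs on its own.

```python
import os
import shutil
import sys
import tempfile
from contextlib import contextmanager


def outfile(filename=None):
    # Clobbering version, as above.
    if filename is None:
        return sys.stdout
    return open(filename, 'w')


@contextmanager
def atomic_update(filename):
    # Non-clobbering version, as above.
    if filename is None:
        f = sys.stdout
    else:
        f = tempfile.NamedTemporaryFile(mode='w+t', delete=False)
        tmppath = f.name
    with f:
        yield f
    if filename is not None:
        shutil.copy(tmppath, filename)
        os.remove(tmppath)


def choose_outfile(out_name=None, *, in_name=None, inplace=False):
    # Hypothetical wrapper: go through a temp file when the caller asked
    # for in-place editing or named the same file for input and output.
    if inplace or (out_name is not None and out_name == in_name):
        return atomic_update(in_name)
    return outfile(out_name)
```

Either branch returns something usable in a `with` statement, so calling code does not need to know which one was picked.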
As it is, atomic_update is a little bit cumbersome to use effectively: you need to open the file for input separately (the tempfile is returned empty), and then it's probably a good idea to be careful to close them in the right order. It also loses, e.g., file permissions, which isn't ideal. You might want to copy the original file (and metadata) into the temp file, and then shutil.move it back when you're done.
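A rough sketch of one way to keep the permissions follows. It is a variation on the suggestion above rather than a literal implementation of it: instead of copying the whole original into the temp file, it copies only the permission bits with `shutil.copymode` and then moves the temp file over the original. The name `atomic_update_keep_mode` is made up, and creating the temp file in the target's directory is an added assumption so that the move stays on one filesystem.

```python
import os
import shutil
import tempfile
from contextlib import contextmanager


@contextmanager
def atomic_update_keep_mode(filename):
    # Hypothetical variant of atomic_update that preserves permission bits.
    directory = os.path.dirname(os.path.abspath(filename))
    f = tempfile.NamedTemporaryFile(mode='w+t', dir=directory, delete=False)
    tmppath = f.name
    try:
        with f:
            yield f
        if os.path.exists(filename):
            # Carry the original file's permission bits over to the temp file.
            shutil.copymode(filename, tmppath)
        shutil.move(tmppath, filename)
    except BaseException:
        # The body or the move failed: leave the original untouched and
        # remove the partial temp file before re-raising.
        if os.path.exists(tmppath):
            os.remove(tmppath)
        raise
```

Ownership, timestamps, and extended attributes are not handled here; `shutil.copystat` copies more of the metadata, at the cost of also restoring the old modification time.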
- Well, that's a heck of a lot simpler than my code, much thanks :P – NightShadeQueen, Jul 29, 2015 at 20:52
- The thing that I worried about, when writing a patch for argparse.FileType, is how to keep the context from trying to close the stdin/out. One way was to wrap them in a dummy context manager, with a 'do nothing' close. – hpaulj, Jul 30, 2015 at 19:20
- Your code does a different thing. The OP's context managers do not close sys.stdin or sys.stdout; your context managers close them. – pabouk - Ukraine stay strong, Jan 26, 2021 at 15:43
- @hpaulj Meanwhile I learned more about context managers. I think for an optional context manager it is best to use contextlib.ExitStack: for a file you will call .enter_context() to include the file in the stack of context managers; for stdin/stdout you will not call it, and it will not be handled by the context manager. – pabouk - Ukraine stay strong, Oct 21, 2021 at 12:54
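A minimal sketch of the ExitStack approach described in the last comment. The function name `copy_stream` and its line-by-line copy are placeholders chosen for illustration; the part that matters is that only real files are passed to `enter_context()`.

```python
import sys
from contextlib import ExitStack


def copy_stream(in_name=None, out_name=None):
    # Copy input to output; None means stdin or stdout respectively.
    with ExitStack() as stack:
        # Only real files are registered with the stack, so only they are
        # closed when the with block ends; the standard streams stay open.
        if in_name is None:
            src = sys.stdin
        else:
            src = stack.enter_context(open(in_name))
        if out_name is None:
            dst = sys.stdout
        else:
            dst = stack.enter_context(open(out_name, 'w'))
        for line in src:
            dst.write(line)
```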