I have some C# I/O code that can take one or two input files. I would like to wrap the Stream
objects in a using
block, but cannot find a way to succinctly express this in code.
Currently, I have two large files (>1GB), that contain concatenated TIFF files and PDF files respectively that need to be extracted into individual files. The metadata of the individual files (TIFF Tags and PDF Keywords) cross-reference one another, so the processing rules are different if I receive both files at once, i.e. there is more information and verification logic.
I have a FileProcessor
class than implements an IEnumerable<Range>
to return the byte ranges of each individual file within the archive. My current implementation just wraps the main processing loop in a try/finally
block and calls Dispose
manually.
// Struct to represent the byte range of a file in the archive
struct Range
{
public long start;
public long end;
}
// Custom class that takes a Stream object in the constructor and implements IEnumerable<Range> to return the individual files in sequence
public class FileProcessor : IDisposable, IEnumerable, IEnumerable<Range>
{
private Stream _stream;
public FileProcessor(Stream stream) { this._stream = stream; }
public virtual void Dispose()
{
if (_stream != null) { _stream.Dispose(); _stream = null; }
}
}
public class TiffProcessor : FileProcessor { ... }
public class PdfProcessor : FileProcessor { ... }
// Snippet of the dispatching/processing logic
TiffProcessor infile1 = null,
PdfProcessor infile2 = null;
try
{
if (HasFirstInputFile)
{
infile1 = new TiffProcessor(File.OpenRead(FirstInputFileName));
}
if (HasSecondInputFile)
{
infile2 = new PdfProcessor(File.OpenRead(SecondInputFileName));
}
if (infile1 != null && infile2 == null)
{
foreach (var range in infile1) { ... }
}
if (infile1 == null && infile2 != null)
{
foreach (var range in infile2) { ... }
}
if (infile1 != null && infile2 != null)
{
foreach (var ranges in infile1.Zip(infile2, (tiff, pdf) => new Range[] { tiff, pdf })) { ... }
}
}
finally
{
if (infile1 != null) { infile1.Dispose(); }
if (infile2 != null) { infile2.Dispose(); }
}
Is there any construct that can help clean up and organize this code structure. It's not too bad now, but could become exponentially more complex if additional input streams are required in the future.
Edit
Based on the comments received, perhaps creating an intermediate Strategy
object that managed resources would work?
using (var strategy = StrategyFactory.Create(CommandLineArgs))
{
strategy.Process();
}
public static class StrategyFactory
{
public static IStrategy Create(CommandLineArguments args)
{
if (args.FirstInputFile != null && args.SecondInputFile == null)
{
return new FirstStrategy(args.FirstInputFile);
}
if (args.FirstInputFile == null && args.SecondInputFile != null)
{
return new SecondStrategy(args.SecondInputFile);
}
if (args.FirstInputFile != null && args.SecondInputFile != null)
{
return new ThirdStrategy(args.FirstInputFile, args.SecondInputFile);
}
}
}
1 Answer 1
The only think that I would do different is that I wouldn't dispose streams inside your FileProcessor
classes. Disposing of resources should be done at the same level (or scope) where they were initialized. Unfortunately, even BCL classes like StreamWriter
and BinaryWriter
dispose underlying streams when disposed, so I cannot claim that this is unexpected behavior either.
IMO, there is nothing wrong with your code. Going out of your way simply to save two lines of code might make your code less readable, so your first version might easily be the simplest one.
Having said that, if you really want to wrap it in a single using
block, one idea (and I don't actually find it "better" than yours in any way) might be to wrap all your disposable resources into a single class:
public class InputFiles : IDisposable
{
private readonly Dictionary<string, Stream> _files
= new Dictionary<string, Stream>();
public Stream GetStream(string type)
{
Stream input = null;
_files.TryGetValue(type, out input);
return input;
}
public InputFiles(string[] args)
{
// this is just an idea, you probably don't
// use the extension to determine the type,
// but it still seems a bit more general than
// having strongly typed properties
foreach (var path in args)
{
var ext = Path.GetExtension(path); // pdf or tif?
_files[ext] = File.OpenRead(path);
}
}
public void Dispose()
{
foreach (var stream in _files)
stream.Value.Dispose();
}
}
And then dispose the entire object when done:
static void Main(string[] args)
{
using (var input = new InputFiles(args))
{
var pdf = input.GetStream("pdf");
var tif = input.GetStream("tif");
// use the appropriate strategy
// (parsing strategy should not be concerned with
// disposing of resources)
Process(pdf, tif);
}
}
-
\$\begingroup\$ I like this since it's a totally different approach. I had not considered moving the management of the files into a stand-alone class. I might give this a try....Also, your comment nails the core problem on the head -- the parsing strategy should not have to manage the resources. \$\endgroup\$Lucas– Lucas2012年07月12日 15:26:17 +00:00Commented Jul 12, 2012 at 15:26
-
\$\begingroup\$ I ended up using this approach and am pretty happy with it. It's nice that there's no resource management in the different strategy implementations. I did go with a strongly-typed implementation for stream lookups, i.e.
var pdf = input.GetStream(typeof(PdfArchiveFile));
\$\endgroup\$Lucas– Lucas2012年07月19日 14:26:21 +00:00Commented Jul 19, 2012 at 14:26
FileProcessor
class to parse them? And parsed items are also of the same type, since youZip
them, right? It would be helpful if you would add some background on what you are trying to do. \$\endgroup\$