I recently took a test. One of the questions was to find problems in the following code and suggest refactoring which can provide adding new formats of data:
public class FileProcessor { enum FileType { Html, Text } public void ProcessFile(string fileName) { StreamReader fileStream = new StreamReader(File.OpenRead(fileName)); string fileContent = fileStream.ReadToEnd(); if (fileContent.IndexOf("<html") != -1) ProcessHtmlFile(fileContent); else ProcessTextFile(fileContent); ProcessFile(fileContent,fileContent.IndexOf("<html")!= -1? FileType.Html:FileType.Text); fileStream.Close(); } private void ProcessFile(string content, FileType fileType) { switch(fileType) { case FileType.Html: ProcessHtmlFile(content); break; case FileType.Text: ProcessTextFile(content); break; default: throw new Exception("Unknown file format"); } } private void ProcessHtmlFile(string content){...} private void ProcessTextFile(string content){...} }
What I've suggested:
I've suggested to use open closed principle and my implementation is:
public class FileProcessor
{
public void ProcessFile(string fileName, IFile file)
{
StreamReader fileStream = new StreamReader(File.OpenRead(fileName));
string fileContent = fileStream.ReadToEnd();
file.ProcessFooFile(fileContent);
fileStream.Close();
}
}
public interface IFile
{
void ProcessFooFile(string fileContent);
}
public class HtmlFile : IFile
{
public void ProcessFooFile(string fileContent)
{
//Process HTML File
}
}
public class TextFile : IFile
{
public void ProcessFooFile(string fileContent)
{
//Process Text File
}
}
My questions are:
- Is it appropriate to use open closed principle in this case?
- Is my implementation correct? Especially, please, see that I deleted a row
ProcessFile(fileContent,fileContent.IndexOf("<html")!= -1?FileType.Html:FileType.Text);
frompublic void ProcessFile(string fileName)
as there is no need in the row. - Is there some drawbacks in my implementation. Please, provide improvements.
2 Answers 2
Ad 1. Yes, it's reasonable to refactor it the way you did. It's consistent with Single Responsibility Principle, and as such it makes it easy to eg. add some new formats without having to meddle with the base routine.
Ad 2. It's questionable in some places.
a) No exception handling. You may prefer it to be caller's responsibility, but at the very least you should ensure that the file stream always gets closed, even if an exception is thrown (with a try-finally
or using
).
b) The names IFile
and its derivatives are misleading. It's a IFileProcessor
(HtmlFileProcessor
etc.). It's obvious that it doesn't represent a file itself, it represents a way of dealing with a file whose contents are passed from outside.
Ad 3. Your new implementation lacks the part that associated appropriate file processing method with a given FileType
. This could be implemented with a FileProcessorFactory
class taking a FileType
instance and returning an IFileProcessor
instance. Calling code doesn't need to know whether it's HtmlFileProcessor
etc., that's the point: such nuts and bolts are something left for the factory to worry about, and caller is only concerned whether the factory returns something that matches the interface.
EDIT:
Here's a suggested improvement.
public class FileProcessor
{
public void ProcessFile(string fileName, IFileProcessorBehavior file)
{
// using ensures we won't leave the method without closing the fileStream
// even if an exception is thrown. this prevents resource leaks.
using (var fileStream = new StreamReader(File.OpenRead(fileName)))
{
string fileContent = fileStream.ReadToEnd();
file.ProcessFooFile(fileContent);
}
}
}
// Behavior, or Strategy.
// we need to differentiate file processor the general facade
// from format-specific file processor.
// in any case, this is not a "file", it's a way of dealing with a file.
// if I knew more about the context of this code,
// I could probably come up with a more natural naming scheme
public interface IFileProcessorBehavior
{
void ProcessFooFile(string fileContent);
}
public class FileBehaviorFactory
{
public IFileProcessorBehavior Create(FileType fileType)
{
switch(fileType)
{
case FileType.Html:
return new HtmlBehavior();
case FileType.Text:
return new TextFileBehavior();
default:
// don't use the most generic Exception - use a meaningful exception type
throw new NotSupportedException("Unknown file format: " + fileType);
// ^ and quote the value that was not recognized -
// it will help to track the cause of an error
}
}
}
public class HtmlBehavior : IFileProcessorBehavior
{
public void ProcessFooFile(string fileContent)
{
// ...
}
}
public class TextFileBehavior : IFileProcessorBehavior
{
public void ProcessFooFile(string fileContent)
{
// ...
}
}
And now you would just:
var factory = new FileBehaviorFactory();
IFileProcessorBehavior behavior = factory.Create(fileType);
// at this point we don't know and we don't care of what type this behavior thing is
var processor = new FileProcessor("some file", behavior);
Thanks to this pattern any new formats can be added as separate classes (supporting the interface), so the system is open for extension, and they can be registered within the factory. The base routine (FileProcessor
) is closed, which is what we want. The factory only carries one responsibility, which is to match any given file type to respective behavior, or file processing strategy. This means there is only one ever one reason to modify the factory: when this logic needs to be enhanced, or modified.
-
\$\begingroup\$ please, be very kind to provide an example with
FileType
andFileProcessorFactory
. It is really interesting to see for me! Cannot understand what you mean. Upvoted! \$\endgroup\$StepUp– StepUp2016年04月05日 09:58:15 +00:00Commented Apr 5, 2016 at 9:58 -
1\$\begingroup\$ @StepUp an example included \$\endgroup\$Konrad Morawski– Konrad Morawski2016年04月05日 10:15:07 +00:00Commented Apr 5, 2016 at 10:15
-
\$\begingroup\$ is it okay that I should change method
public IFileProcessorBehavior Create(FileType fileType)
every time if I add new type of File? Does not it violateopen closed principle
? \$\endgroup\$StepUp– StepUp2016年04月05日 10:51:01 +00:00Commented Apr 5, 2016 at 10:51 -
\$\begingroup\$ Mildly. We've minimized the damage in compare to the original implementation. See programmers.stackexchange.com/a/302782/29029 \$\endgroup\$Konrad Morawski– Konrad Morawski2016年04月05日 12:18:34 +00:00Commented Apr 5, 2016 at 12:18
-
1\$\begingroup\$ Nitpicking a little:
using (var fileStream = new StreamReader(File.OpenRead(fileName)))
could potentially lead to an orphaned stream if theStreamReader
constructor threw an exception. It won't happen withStreamReader
but it's not a good pattern to get used to.using
should encapsulate the initial construction of the stream,using(var fileStream = File.OpenRead(fileName)) using(var streamReader = new StreamReader(fileStream)) { ... }
. Alternatively just use theStreamReader
constructor taking a pathusing(var streamReader = new StreamReader(fileName) { ... }
. \$\endgroup\$Johnbot– Johnbot2016年04月05日 12:32:53 +00:00Commented Apr 5, 2016 at 12:32
Is it eligible to use open closed principle in this case?
Yes, this is a good idea.
Is my implementation correct? Especially, please, see that I deleted a row ProcessFile(fileContent,fileContent.IndexOf(" fileName) as there is no need in the row.
There is no need for this check in ProcessFile
, but you have to move the validation logic inside each ProcessFooFile
method, because you must somehow handle situations where FileProcessor
's contructor arguments won't match (like passing an HtmlFile
and a path to a text file).
Is there some drawbacks in my implementation. Please, provide improvements.
One thing I might change is names of the interface and concrete classes. Since we are dealing with a file processing entity, I'd give them more descriptive names, e.g. IFileProcessor
, TextFileProcessor
and HtmlFileProcessor
.
IFile
class passed toProcessFile(string fileName, IFile file)
as a parameter, so the caller has to know in advance the type of data to be processed. You didn't extend the format recognition ability—you just abandoned it (for someone else?) to handle it elsewhere. PS. Possibly the downvoter thinks alike, although I don't consider that as a reason to downvote. \$\endgroup\$ProcessFile(string content, FileType fileType)
, then I would need always MODIFY methodProcessFile(string content, FileType fileType)
. Like answer by Konrad Morawski. Is it okay? \$\endgroup\$swtich(fileType)
error trapping to throw an exception. This is rare in my experience. I like deliberate, explicitenum
handling, To codify this idea defineundefined = 0
("unknown", "none", etc) as theenum
s default; a proper value must be explicitly set. Then useNotImplementedException
instead of generalException
in theswitch() default:
\$\endgroup\$