I have a simple class that represents and accesses the data from a CSV file (using the CsvHelper library). I've tried restructuring the code to allow for better unit testing, but I'm sure the structure of the class could be better; both for testing and for using it (I'm not sure if there is any standard recommendations for these types of classes).
Here is the class code:
public class CsvCountFile
{
public string AbsolutePath { get; }
public string Delimiter { get; }
public LocationDefinition LocationData { get; private set; }
public List<CountDefinition> CountData { get; private set; }
public CsvCountFile(string absolutePath, string delimiter = ",")
{
AbsolutePath = absolutePath;
Delimiter = delimiter;
}
public void ReadCountData()
{
using (var fileReader = File.OpenText(AbsolutePath))
{
ReadCountData(fileReader);
}
}
public void ReadCountData(TextReader fileReader)
{
CountData = new List<CountDefinition>();
using (var csvReader = new CsvReader(fileReader))
{
csvReader.Configuration.HasHeaderRecord = false;
csvReader.Configuration.RegisterClassMap<LocationMap>();
csvReader.Configuration.RegisterClassMap<CountMap>();
csvReader.Read(); // get header
csvReader.Read(); // get first record
LocationData = csvReader.GetRecord<LocationDefinition>();
csvReader.Read(); // skip blank line
csvReader.Read(); // skip second header section
while (csvReader.Read())
{
var count = csvReader.GetRecord<CountDefinition>();
CountData.Add(count);
}
}
}
}
The ReadCountData
was split into two so that the reading logic could be tested by forming a TextReader
in memory, instead of needing an actual csv file. While this code works, I don't foresee any usage scenarios that would typically require reading the csv by passing in a TextReader
.
Also, due to this structure, when CsvCountFile
is constructed, the csv absolutePath
is passed in, but testing using a TextReader
doesn't require this, and it doesn't feel right instantiating it with an empty string.
3 Answers 3
This method seems out of place:
public void ReadCountData()
As do these properties:
public string AbsolutePath { get; }
public string Delimiter { get; }
What are they for? Why would a client of the class need to know what Delimiter
has been used?
I'd take one of two approaches:
Approach one:
Continue to use this as the public contract of the class:
public void ReadCountData(TextReader reader)
The caller is then responsible for supplying the reader
(including cleaning up after it if it's a file. The CsvCountFile
class has given up responsibility for opening / closing the reader. ReadCountData()
and AbsolutePath
aren't required so would be removed.
Approach two:
Create one or more test files that are used for testing and change the interface to supply the path:
public void ReadCountData(string pathToFile)
You can then open/read and close the file as appropriate (again, ReadCountData()
and AbsolutePath
property isn't required so would be removed).
Either approach can work successfully, really it comes down to how the rest of your application hangs together and your philosophy of testing (which is quite subjective).
-
\$\begingroup\$ These approaches would just get rid of the constructor as well? Initially I had the constructor calling
ReadSporeCountData
, but I'm guessing calling a separate method is better? \$\endgroup\$Ayb4btu– Ayb4btu2016年07月25日 22:34:03 +00:00Commented Jul 25, 2016 at 22:34 -
\$\begingroup\$ @Ayb4btu depends a bit on your usage, but since I assume
LocationData
andCountData
aren't valid until after you've read from the file, I would probably make the read method private and call it directly from the class constructor. \$\endgroup\$forsvarir– forsvarir2016年07月25日 22:38:42 +00:00Commented Jul 25, 2016 at 22:38 -
\$\begingroup\$ Ok, so I then can pass the
TextReader
or path into the constructor instead. I assume I should just pick one or the other instead of having a constructor for both? I'm thinking the path is the more suitable approach, but is testing using external files a common practice, or is this frowned upon? \$\endgroup\$Ayb4btu– Ayb4btu2016年07月25日 22:50:37 +00:00Commented Jul 25, 2016 at 22:50 -
\$\begingroup\$ @Ayb4btu As I said, it's style dependant. As long as the test is repeatable and either always works or always fails, there's nothing wrong with using files for testing. Some people believe you should mock everything / never interact with external resources. However if you were going down that route, you would probably be mocking out the CsvReader itself, which you're not doing. I would put the choice down to how much effort is involved in each approach against how fast the tests will run. However, if this is in shared source, the decision often comes down to what works for the team. \$\endgroup\$forsvarir– forsvarir2016年07月25日 22:55:51 +00:00Commented Jul 25, 2016 at 22:55
Just passing by. There are few comments I want to make here.
ReadCountData()
is definitely out of place because the first method just exposes you to aStreamReader
object reference. This method can not be unit-tested. I created an interface called IFile that wraps the static OpenText() methodpublic interface IFile { StreamReader OpenText(string path); }
I created a
FileClass
that uses this interfacepublic class FileClass { private readonly IFile file; public FileClass(IFile file) { this.file = file; } // I renamed your ReadCountData as OpenFile public void OpenFile(string path) { try { var fileReader = file.OpenText(path); } catch (Exception e) { // do whatever; } } }
Using this approach, you can unit-test the OpenText() using the below code
[TestClass] public class UnitTest1 { [TestMethod] public void OpenFile_PathAsAParameter_CSVFileOpened() { //Arrange string path = @"C:\myfilerocks.txt"; var fileMock = new Moq.Mock<IFile>(); var fileClass = new FileClass(fileMock.Object); //Act fileClass.OpenFile(path); //Assert fileMock.Verify(x=> x.OpenText(path)); } }
Notice there is still a dependency on
StreamReader
andTextReader
, it's best to create a wrapper for those classes. When you have some many static class in a testable method I see it more as a smell . When a wrapper can't be created for them keep them as dependency e.gpublic void ReadCountData(TextReader fileReader)
seems alright to me. If you are looking for wrappers for the file system, visit System.IO.Abstractions by Jonathan Channon. His blog also explains a great deal Abstracting the File SystemSo the above test that the OpenText() was called.
Rather than using
csvReader
, I will useTextFieldParser
fromMicrosoft.VisualBasic.FileIO.TextFieldParser
. It works for both C# and VB. Note this example was extracted from Reading CSV Files using C#using (TextFieldParser parser = new TextFieldParser(@"c:\temp\test.csv")) { parser.TextFieldType = FieldType.Delimited; parser.SetDelimiters(","); while (!parser.EndOfData) { //Processing row string[] fields = parser.ReadFields(); foreach (string field in fields) { //TODO: Process field } } }
Note:
AbsolutePath
,Delimiter
is redundant here. Refrain from names likeReadCountData
, they don't provide information about their functionality. I would've thoughtReadDataCount
sounds better.
-
\$\begingroup\$ Based on your answer I started using
System.IO.Abstractions
, allowingReadSporeCountData
to be private, but still able to test if CsvReader is operating correctly. \$\endgroup\$Ayb4btu– Ayb4btu2016年07月26日 23:51:48 +00:00Commented Jul 26, 2016 at 23:51
When it comes to "reader" classes in C#, there's a few patterns they generally abide to. A few things they have in common. I would recommend repeating those patterns.
1 - Encapsulate Another Reader
As you've seen, CsvReader
encapsulates, depends on a TextReader
. The reader classes in the System.IO
namespace do similar. StreamReader
takes a Stream. So does BinaryReader
. StringReader
takes a string.
This dependency is injected via the constructor and represents everything the class needs to do its work (no more, no less).
So ask yourself two things:
- What is your classes responsibility? Try to think of the one thing that it is responsible for.
- What is the minimum your class needs to fulfill that responsibility?
In the case of your CsvCountFile
class, what's it trying to do? It looks to me like you're trying to extend Csv reading for a specialised case. Now, is the class representing a reader, or is it representing a file? My earlier comment asked if the AbsolutePath
property was strictly necessary. If the class is a reader, it doesn't need to know the filename, it just needs a less-specific reader to read from.
So following the reader patterns, you would:
- Expose a constructor that accepts another reader class. The choice of
TextReader
orCsvReader
is up to you. If it accepts aCsvReader
you're being more specific, which is good, but it exposes the tight-coupling your class has to it, which means the dependency for that library will leak into other parts of your system. AcceptingTextReader
means you more effectively encapsulate the inner workings, which is probably what you're wanting here. Don't expose a constructor with no arguments. The class is useless without another reader, so make that explicit. - Rename the class with a suffix of "Reader", to clearly identify the responsibility of this class.
2 - Read the Next Part vs Read to End
Both CsvReader
and other readers from the System.IO
namespace have (more or less) two sorts of read methods.
- Read the next part of the source and set a
CurrentRecord
property on the reader, returning a bool (or some other indication) to the caller to indicate a successful read. - Read to the end, or end of the next significant segment, and return the record to the caller.
In the case of CsvReader
, look at its .Read()
method for an example of the first type. It returns a bool
indicating whether a record was read, and sets the CurrentRecord
property.
Look at .ReadToEnd()
on System.IO.TextReader
for an example of the second type of read operation. It returns a "string that contains all characters from the current position to the end of the text reader".
In the case of the first type of read method, the state of the reader after the method call is important. A consumer of the class will call .Read()
, then look at the CurrentRecord
property, perform some logic, repeat.
The second type of read method doesn't depend on the state of the reader once the read is complete. A consumer will create the reader, call .ReadToEnd()
, then dispose of the reader because they've got what they need.
Your .ReadCountData()
method doesn't really fit into either of these categories. It takes a bit from both techniques by reading all the way to the end, but then storing the result on the reader itself. I would recommend going one way or the other. Either have a method that:
- Advances the reader by one full record and exposes that record as a
CurrentRecord
property, or; - Reads to the end and returns the result instead of storing it as a property on the reader.
For the first method, because you're returning two different types of records (LocationDefinition
and CountDefinition
), things get a little awkward for a single CurrentRecord
property. Without knowing more about those classes it's hard to say, but would it be reasonable to introduce a common base class for them? The readers CurrentRecord
property could be of type Definition
, which will either be an instance of LocationDefinition
or CountDefinition
, depending on the last record type read.
For the second method, you could encapsulate the two types into a single "result" class.
Whether you want one, both, or the other type of read method depends on your needs. If you want to be able to perform logic part-way through reading, then you'll want the first type of read method. For example, maybe you abandon the read operation because of some data found in the LocationDefinition
? If not, then I'd lean towards the second type of read method to keep things a little simpler.
Testing
One of the biggest challenges for testable code is dependencies. If you have code that depends on zero infrastructure, it's much easier to test. By infrastructure I mean things like the file system, database, network, etc.
Because you're relying on CsvReader
, you're stuck with its dependency (TextReader
) at a minimum. If you keep your implementation to the bare minimum though (as I suggested above on point 1), then you at least won't need any more dependencies.
Fortunately, TextReader
has an inheritor that makes testing a bit simpler. StringReader
. With StringReader
you can rely on in-memory strings for your testing, which will make it easy to write self-contained tests. Something to remain wary of though is character encoding. It's easy to dodge the issue of character encoding when you're using a StringReader
, whereas a StreamReader
, which you'll presumably expect to be using "in production" can be affected by file character encoding. It's your call as to whether this is likely to be an issue.
-
\$\begingroup\$ Your added info has been helpful. I intended the class
CsvSporeCountFile
to both read and represent a csv file on disk, rather than just being a reader. So based on the given answers it now takes in the file path at the constructor, reads the data, and holds it. Therefore it reads to the end of the data instead of one record at a time. \$\endgroup\$Ayb4btu– Ayb4btu2016年07月26日 23:47:41 +00:00Commented Jul 26, 2016 at 23:47 -
\$\begingroup\$ I would still encourage separating the reader from the result of the read. If you're wanting something to "represent" a CSV file on disk, you might consider using
System.IO.FileInfo
. The class is sealed, so you wouldn't be able to inherit from it, but you could add extension methods likebool IsCsv()
andDefinitionData ReadCsvDefinitionData()
. TheReadCsvDefinitionData()
method would create and use an instance of yourCsvCountReader
class. \$\endgroup\$Snixtor– Snixtor2016年07月27日 00:24:45 +00:00Commented Jul 27, 2016 at 0:24 -
\$\begingroup\$ Having this class as a reader instead of a file representation seems to match the common trend of these answers. So going down that route is probably the better way forward. Thanks. \$\endgroup\$Ayb4btu– Ayb4btu2016年07月27日 01:15:05 +00:00Commented Jul 27, 2016 at 1:15
-
\$\begingroup\$ It's a common trend for software architecture in general. Search for articles describing the "single responsibility principle". \$\endgroup\$Snixtor– Snixtor2016年07月27日 04:11:16 +00:00Commented Jul 27, 2016 at 4:11
TextReader
, is there any reason the class needs theAbsolutePath
variable? \$\endgroup\$