Parser for tab-delimited data as a subclass of StringReader

Question 1

I wanted to parse lines of text as tab-delimited items. I had just been using a StringReader for something else, so I came up with this:

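    using System.Collections.Generic;
    using System.IO;

    class TabDelimitedFieldReader : StringReader
    {
        // Line most recently read by HasMore; null once the input is exhausted.
        private string _nextLine;

        public TabDelimitedFieldReader(string s)
            : base(s)
        {
        }

        // Yields the tab-separated fields of the line loaded by HasMore.
        public IEnumerable<string> ReadFields()
        {
            if (_nextLine != null)
            {
                var fields = _nextLine.Split('\t');
                foreach (string field in fields)
                {
                    yield return field;
                }
            }
        }

        // Note: reading this property consumes a line from the reader.
        public bool HasMore
        {
            get
            {
                _nextLine = this.ReadLine();
                return (_nextLine != null);
            }
        }
    }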

I can't think of another time I directly subclassed a .NET Framework class like this. Maybe I'm just strange and it's normal for others, or maybe there's a reason I wouldn't want to do it.

Question 2

Not a bad idea. How do you use it, though? Say I wanted to read text from the clipboard that had been copied from Excel. I would then want to make sure that the row length is consistent, which an enumerator of fields would not give me. I would also add an optional bool flag to ReadFields that trims the fields when true. Finally, I would run StyleCop on this, which would force you to add comments, rename _nextLine to nextLine, refer to it as this.nextLine, and so on.
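The trim flag and the row-length check suggested here could look roughly like the sketch below. `TabRows`, `Parse`, and the `trim` parameter are hypothetical names chosen for illustration, and the sketch reads whole rows from any TextReader rather than extending the posted class.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper, not part of the class from the question.
static class TabRows
{
    // Reads every line, splits it on tabs, optionally trims each field,
    // and throws if a row's field count differs from the first row's.
    public static List<string[]> Parse(TextReader reader, bool trim = false)
    {
        var rows = new List<string[]>();
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            string[] fields = line.Split('\t');
            if (trim)
            {
                for (int i = 0; i < fields.Length; i++)
                {
                    fields[i] = fields[i].Trim();
                }
            }

            if (rows.Count > 0 && fields.Length != rows[0].Length)
            {
                throw new InvalidDataException(string.Format(
                    "Row {0} has {1} fields, expected {2}.",
                    rows.Count + 1, fields.Length, rows[0].Length));
            }

            rows.Add(fields);
        }

        return rows;
    }
}
```

Whether a ragged row should throw, be padded, or be reported some other way depends on the caller; throwing is just the simplest choice for a sketch.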

Question 3

The context was an ad-hoc report loader service that aggregated data from a list of databases, which I copied/pasted from Excel data formatted as a table, so even blanks generate a blank field and all rows are the same length. Really, StyleCop would make me use "this" everywhere? I just changed my ambition to work at MS :)

Question 4

You can turn off the rules that you do not agree with.

Question 5

Actually, I decided to code the rest of the day using no underscores and "this", and I realized it has a sort of leveling effect that lets me move code in, out, and around namespaces, classes, and nested classes more freely, so I think I'll stick with it.

Question 6

Conceptually, I do not see anything inherently wrong with subclassing a framework object, particularly since many of them provide virtual/abstract members specifically to allow it.

In this case, it may be better to wrap a TextReader object instead of inheriting from StringReader. This would allow you to provide tab delimited reading for multiple data sources instead of limiting yourself to in-memory strings. For example, you could still pass in a StringReader for reading strings, but you could also pass in a StreamReader if you wanted to support reading from files instead.
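A minimal sketch of the wrapping approach, assuming the class only needs to hand back rows of fields. The names `TabDelimitedReader` and `ReadRows` are hypothetical and do not come from the question.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Sketch: composes a TextReader instead of inheriting from StringReader,
// so the base class's public members are not exposed to callers.
sealed class TabDelimitedReader : IDisposable
{
    private readonly TextReader reader;

    public TabDelimitedReader(TextReader reader)
    {
        if (reader == null)
        {
            throw new ArgumentNullException("reader");
        }

        this.reader = reader;
    }

    // Yields one string[] per line until the underlying reader is exhausted.
    public IEnumerable<string[]> ReadRows()
    {
        string line;
        while ((line = this.reader.ReadLine()) != null)
        {
            yield return line.Split('\t');
        }
    }

    public void Dispose()
    {
        this.reader.Dispose();
    }
}
```

The same class then works over a string with `new TabDelimitedReader(new StringReader(text))` or over a file with `new TabDelimitedReader(new StreamReader(path))`.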

Additionally, I noticed that HasMore is dangerous to call due to its side-effect. It is not immediately obvious to a caller that checking HasMore would alter the reader's current position. I would suggest using Peek instead.

You would then need to update _nextLine elsewhere. You could have it be a side-effect of calling ReadFields. Or, perhaps better, you could add another method (e.g., AdvanceLine) that is simply responsible for moving the reader to the next line.
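One way the Peek and AdvanceLine suggestions might fit together is sketched below. `PeekingTabReader` and `currentLine` are hypothetical names, and the sketch wraps a TextReader rather than subclassing StringReader.

```csharp
using System;
using System.IO;

// Hypothetical sketch: HasMore no longer consumes input; AdvanceLine does.
sealed class PeekingTabReader
{
    private readonly TextReader reader;
    private string currentLine;

    public PeekingTabReader(TextReader reader)
    {
        this.reader = reader;
    }

    // Peek returns -1 when no character is available and never consumes
    // input, so this property can be read any number of times.
    public bool HasMore
    {
        get { return this.reader.Peek() != -1; }
    }

    // The only member that moves the reader forward.
    // Returns false once the input is exhausted.
    public bool AdvanceLine()
    {
        this.currentLine = this.reader.ReadLine();
        return this.currentLine != null;
    }

    // Fields of the line most recently loaded by AdvanceLine.
    public string[] ReadFields()
    {
        return this.currentLine == null
            ? new string[0]
            : this.currentLine.Split('\t');
    }
}
```

For a StringReader, a Peek result of -1 means end of input. Other TextReader implementations may return -1 for other reasons, so a loop driven by the return value of AdvanceLine is the more conservative choice when the source is not an in-memory string.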

Question 7

Thanks for the feedback - question: are you a TDD developer? I'm not but have been lightly using/learning it for a while, and from my limited understanding, code that works is the goal and pre-empting the future is frowned upon... which goes against my instincts too, but seems to be the primary directive (as I understand it).

Question 8

By wrapping the class, you are actually reducing the test surface area, as you only need to test your externally visible (public/protected) members. By extending StringReader, you would be stuck testing for cases where callers use the public base class methods to ensure you properly handle side-effects. For example, if someone called ReadLine (from TextReader), it would skip a line, changing expected behavior for either ReadFields or HasMore (or both!).
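The skipped-line scenario can be reproduced with a condensed copy of the class from the question. `SkipDemo` and its sample data are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Condensed copy of the class from the question, behavior unchanged.
class TabDelimitedFieldReader : StringReader
{
    private string _nextLine;

    public TabDelimitedFieldReader(string s) : base(s) { }

    public IEnumerable<string> ReadFields()
    {
        if (_nextLine != null)
        {
            foreach (string field in _nextLine.Split('\t'))
            {
                yield return field;
            }
        }
    }

    public bool HasMore
    {
        get
        {
            _nextLine = this.ReadLine();
            return _nextLine != null;
        }
    }
}

// Hypothetical demo: collects the first field of every row the caller sees.
static class SkipDemo
{
    public static List<string> Run()
    {
        var seen = new List<string>();
        var reader = new TabDelimitedFieldReader("r1\tx\nr2\tx\nr3\tx");
        while (reader.HasMore)
        {
            seen.Add(reader.ReadFields().First());

            // Inherited public method: silently consumes the next row.
            reader.ReadLine();
        }

        return seen;
    }
}
```

Here `SkipDemo.Run()` collects only `r1` and `r3`. The stray ReadLine call swallows `r2`, and nothing in the subclass can prevent or detect that.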

Question 9

@AaronAnodide It's called the decorator pattern. Many streams in .NET work this way. For example, you can wrap a FileStream in a BufferedStream or a GZipStream.

Dan Lyons · Answer 1 · 2012-02-09 17:56:59Z

