In general I feel like you are trying to do too much in one class. You have two different parsing systems: parsing CSV into rows and fields of strings, and then parsing some of those fields into numbers.
You should separate those two systems into discrete places. In the more complicated use-cases you will guess wrong about what is numeric, and in the simpler cases the programmer can easily implement the number parsing anyway. By conflating the two parse operations you have lost the value of the generic List<String> on the rows, and your API is a lot more complicated than it needs to be.
This is a case-in-point, and illustrates all the bad things:
@SuppressWarnings("unchecked") public List<String> getHeaders() { return (List<String>)(List)this.headers; }
CSV is a text format - leave it that way. RFC4180 makes no mention of numeric data either.
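To illustrate the separation, here is a minimal sketch of what the caller-side number conversion could look like if the CSV layer only ever returned strings. The class name RowConversionSketch and the method toNumbers are hypothetical, and the choice of BigDecimal is an assumption on my part; a caller could just as well use Integer.parseInt or Double.parseDouble per column.

```java
import java.math.BigDecimal;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class RowConversionSketch {

    // Hypothetical caller-side helper: the CSV parser hands back List<String>,
    // and the caller decides which rows or columns are numeric and how to parse them.
    public static List<BigDecimal> toNumbers(List<String> fields) {
        return fields.stream()
                .map(String::trim)      // tolerate padding around the digits
                .map(BigDecimal::new)   // throws NumberFormatException on non-numeric text
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Stand-in for one row returned by a string-only CSV parser.
        List<String> row = Arrays.asList("1", " 2.50 ", "-3");
        System.out.println(toNumbers(row));
    }
}
```

With this split the parser's rows stay a plain List<String>, getHeaders() needs no cast, and a bad number surfaces in the caller's code where the column's meaning is known.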
You specifically ask:
How is my RFC 4180 compliance? Also, what is your opinion of how I have chosen to handle weird input? (For example, if an unpaired " is encountered, then I treat the field as if it were unquoted.)
RFC4180 addresses that case specifically, and your handling is incorrect. RFC4180 has (2.6):
Fields containing line breaks (CRLF), double quotes, and commas should be enclosed in double-quotes.
A quoted field that reaches a line break before its closing quote should extend to the next quote on the following lines. The quoted line breaks should be treated literally, as part of the field's value.
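As a rough sketch of that behaviour (not a drop-in replacement for your parser), a small state machine that tracks whether it is inside quotes is enough to carry a field across line breaks. The class name QuotedFieldSketch and the method parse are hypothetical. Throwing on a quote that is never closed is my own assumption about how to handle that input, not something the RFC prescribes, and the sketch does not try to preserve a lone empty quoted field at the very end of the input.

```java
import java.util.ArrayList;
import java.util.List;

public class QuotedFieldSketch {

    // Splits CSV text into rows of string fields. Inside a quoted field,
    // commas and line breaks are kept literally and "" means one quote.
    public static List<List<String>> parse(String text) {
        List<List<String>> rows = new ArrayList<>();
        List<String> row = new ArrayList<>();
        StringBuilder field = new StringBuilder();
        boolean inQuotes = false;

        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (inQuotes) {
                if (c != '"') {
                    field.append(c);              // literal, including CR and LF
                } else if (i + 1 < text.length() && text.charAt(i + 1) == '"') {
                    field.append('"');            // doubled quote is an escaped quote
                    i++;
                } else {
                    inQuotes = false;             // closing quote
                }
            } else if (c == '"' && field.length() == 0) {
                inQuotes = true;                  // opening quote at the start of a field
            } else if (c == ',') {
                row.add(field.toString());
                field.setLength(0);
            } else if (c == '\r' || c == '\n') {
                if (c == '\r' && i + 1 < text.length() && text.charAt(i + 1) == '\n') {
                    i++;                          // treat CRLF as one line break
                }
                row.add(field.toString());
                field.setLength(0);
                rows.add(row);
                row = new ArrayList<>();
            } else {
                field.append(c);
            }
        }

        if (inQuotes) {
            // Assumption: reject the input rather than guess what was meant.
            throw new IllegalArgumentException("unterminated quoted field");
        }
        if (field.length() > 0 || !row.isEmpty()) {
            row.add(field.toString());            // last row without a trailing line break
            rows.add(row);
        }
        return rows;
    }
}
```

Given the input a,"line1 CRLF line2",c this returns a single row of three fields, with the CRLF kept inside the second field, instead of splitting the record in two.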