This is my first real attempt at a Scala program. I come from a predominantly Java background, so I'd like to know if the program sticks to Scala conventions well.
Is it well readable or should it be formulated differently? To me, there is a lot of lines in the main function which doesn't quite bode right.
It is functioning and I can provide my JUnit tests to prove it. Not all tests pass as those dealing with whitespace are now different.
I didn't want to use any parsing library or support as that doesn't help me learn the core of the language.
package com.wesleyacheson
import scala.io.Source
import scala.annotation.tailrec
class CSVReader2(source: Source) {
val QuoteChar = '"'
val LF = '\n'
val CR = '\r'
val Separator = ','
def readAll(lines:List[List[String]] = Nil):List[List[String]] = {
if (source.hasNext) {
return readAll(readLine()::lines)
}
return lines.reverse;
}
@tailrec final def readUntilQuote(buffered: AnyRef with BufferedIterator[Char], partial:String = ""):String = {
val char = buffered.next()
val next = if(buffered.hasNext) buffered.head else ""
(char, next) match {
case (QuoteChar, QuoteChar) => readUntilQuote(buffered, partial + buffered.next()) //We've read both charachters so skip
case (QuoteChar, _) => partial
case _ => readUntilQuote(buffered, partial + char)
}
}
def readLine():List[String] = {
val buffered = source.buffered
@tailrec def readLine(tokens:List[String] , partialToken:String):List[String] = {
val finished = {() => (partialToken::tokens).reverse}
if (!buffered.hasNext) return finished()
val char=buffered.next()
val subsequentChar = if (buffered.hasNext) buffered.head
(char, subsequentChar) match {
case (QuoteChar, x) if x != QuoteChar =>
readLine(tokens, partialToken + readUntilQuote(buffered))
case (QuoteChar, QuoteChar) =>
buffered.next();
readLine(tokens, partialToken + QuoteChar)
case (Separator, _) =>
readLine(partialToken::tokens, "")
case (CR, LF) =>
buffered.next(); finished()
case (CR, _) =>
finished()
case (LF, _) =>
finished()
case _ =>
readLine(tokens, partialToken + char)
}
}
readLine(Nil, "");
}
}
If interested, the unit tests are here:
package com.wesleyacheson
import org.junit._
import Assert._
import scala.io.Source
class CsvReader2Test {
val simpleSource = Source.fromString("""abc,def,ghi
jkl, zzz""");
val quotedFields = Source.fromString("foo bar, \"foo bar\"");
@Test def VerifyReaderPassedMustNotBeNull() {
val source: Source = null;
try {
new CSVReader(source)
fail("Should have thrown exception")
} catch {
case e: IllegalArgumentException => //Expected
}
}
@Test def VerifyReadAllReturnsAStringList() {
assertTrue("Expected a List[String]", new CSVReader2(simpleSource).readAll().isInstanceOf[List[String]]);
}
@Test def VerifyNumberOfReadLinesAreCorrect() {
assertEquals(2, new CSVReader2(simpleSource).readAll().size)
}
@Test def VerifyFirstLineContainsExpectedValues() {
val firstLine = new CSVReader2(simpleSource).readAll().head
println(firstLine)
assertEquals("Checking the number of tokens in the first line", 3, firstLine.size)
assertTrue("Checking that the line contains " + "abc", firstLine.contains("abc"))
assertTrue("Checking that the line contains " + "def", firstLine.contains("def"))
assertTrue("Checking that the line contains " + "ghi", firstLine.contains("ghi"))
assertFalse("Checking that the line does not contain" + "jkl", firstLine.contains("jkl"))
}
@Test def verifyBothQuotedFieldsAreTheSame() {
val line = new CSVReader2(quotedFields).readAll().head;
assertEquals("Checking that the number of tokens in the first line", 2, line.size);
assertEquals("foo bar", line(0));
assertEquals("foo bar", line(1));
}
@Test def verifySimpleQuotedValueIsUnchanged {
println(new CSVReader2(Source.fromString("\"foo bar\",")).readLine()(0))
assertEquals("foo bar", new CSVReader2(Source.fromString("\"foo bar\",")).readLine()(0))
}
@Test def verifyDoubleQuotesAreConvertedToQuote {
assertEquals("\"foo bar\"", new CSVReader2(Source.fromString("\"\"foo bar\"\",")).readLine().head)
}
@Test def verify3QuotesAreTreatedAsDoubleQuotesWithinSection {
assertEquals("\"foo bar\"", new CSVReader2(Source.fromString("\"\"\"foo bar\"\"\",")).readLine().head)
}
@Test def verifyQuotedCommasAreReturned {
assertEquals(",", new CSVReader2(Source.fromString("\",\",")).readLine().head)
}
@Test def verifyLeadingWhiteSpaceIsRemoved {
assertEquals("abc def", new CSVReader2(Source.fromString(" abc def")).readLine().head)
}
@Test def verifyTailingWhiteSpaceIsRemoved {
assertEquals("1", new CSVReader2(Source.fromString("1 ")).readLine().head)
}
@Test def verifyQuotedLeadingWhitespaceIsPreserved {
assertEquals(" 1", new CSVReader2(Source.fromString("\" 1\"")).readLine().head)
}
@Test def verifyQuotedTailingWhitespaceIsPreserved {
assertEquals("1 ", new CSVReader2(Source.fromString("\"1 \"")).readLine().head)
}
@Test def verifyQuotedBlankLinesArePreserved{
assertEquals("first\n\rsecond", new CSVReader2(Source.fromString("\"first\n\rsecond\"")).readLine().head)
}
@Test def verifyCellsAreInTheRightOrder{
val returned = new CSVReader2(Source.fromString("first,second")).readLine()
assertEquals("first", returned(0));
assertEquals("second", returned(1));
}
@Test def verifyRowsAreInTheRightOrder{
val returned = new CSVReader2(Source.fromString("first\nsecond")).readAll()
assertEquals("first", returned(0)(0));
assertEquals("second", returned(1)(0));
}
@Test def verifyNewLineCarriageReturnIsOnlyTreatedAsOneBlankLine{
val returned = new CSVReader2(Source.fromString("first\n\rsecond")).readAll()
assertEquals("first", returned(0)(0));
assertEquals("second", returned(1)(0));
}
@Test def verifyReadsUntilFirstQuote {
assertEquals("abc\"defg", new CSVReader2(Source.fromString("")).readUntilQuote(Source.fromString("abc\"\"defg\"hi\"jklmnop").buffered ))
}
@Test def verifyThrowsExceptionIfNoQuoteFound {
try {
new CSVReader2(Source.fromString("")).readUntilQuote(Source.fromString("a").buffered)
fail()
} catch {
case e => //expected
}
}
}
What I see when I look at this coming from a Java viewpoint. String concatenation isn't usually done with +
. However I don't see how to use a string buffer and keep it semi functional.
The program isn't a functional program. I don't think this could have been avoided using a source. Pure functional programming would have meant that I'd have to make concessions like converting the entire input to a string outside and passing that in which isn't practical.
I don't like the argument-less anonymous function but I don't know what else to do for it.
val finished = {() => (partialToken::tokens).reverse}
I've been told that a lazy val
may be more appropriate for this, and I tend to agree.
It may be better for extendability if I returned a list of token objects (or is this my Java head interfering?)
I wonder if any traits could be mixed in to make it more rich. If there was a trait dealing with 2-dimension tabular data for instance.
The buffered iterator was a bit of a cheat. My original was far longer until I added that. Feels like maybe I've skipped a bit of learning for the sake of convenience.
-
\$\begingroup\$ @Jamal I can't agree with the tag edits. This may be a case of reinventing the wheel but the purpose of it is learning. The focus of the question isn't about how to prevent that. Similarly for csv tag, and the unit-testing. The question isn't about that. The purpose isn't to read a CSV. The tests were only in the question to prove the criteria of code review. The body changes are fine, but I'd like to revert the title and tags. \$\endgroup\$Athas– Athas2014年08月15日 10:44:29 +00:00Commented Aug 15, 2014 at 10:44
-
\$\begingroup\$ Then you may make these changes. \$\endgroup\$Jamal– Jamal2014年08月15日 13:40:11 +00:00Commented Aug 15, 2014 at 13:40
2 Answers 2
Here are a few things, going from smallest to the more significant.
- Try to avoid using return. It isn't idiomatic, and for good reason. So instead of writing
if (cond) return a; return b
you should preferif (cond) a else b
- Unless I'm missing anything, in
readUntilQuote
's signaturebuffered
should be of typeBufferedIterator[Char]
instead ofAnyRef with BufferedIterator[Char]
. TheAnyRef
there doesn't do anything useful. - You can use
Source
and still have functional code. Specifically, try looking atSource.getLines()
. It returns anIterator[String]
that is lazily-evaluated but can still be used functionally almost like other collections (you canmap
,fold
,filter
, etc.)
Below you can see that I've rewritten a bit of your code. For readabilities sake my first suggestion is to attempt to make your function definitions and their parameter lists consistent with the Scala Style Guide. Next, I would also move the @tailrec
annotation one line above the function for which it is intended. Lastly, I would rename your inner function such that people reading your code can quickly see that it is an inner 'Helper' function.
def readLine(): List[String] = {
// ...
@tailrec
def readLineHelper(tokens: List[String], partialToken: String): List[String] = {
// function body
}
readLineHelper(Nil, "")
}
You also mentioned that you didn't like your finished
function. It appears to me that you can improve its implementation in one of two ways. At first glance it seems that readLineHelpers
parameters partialToken
and tokens
are never altered throughout the course of the function, in other words they are immutable values. If this is the case than finished
can be declared as a reference to data:
val finished = (partialToken :: tokens) reverse
If, however, there is some state change to partialToken
or tokens
then you will need to declare finished
as you originally did, as a function value but with the following syntax:
val finished = () => (partialToken :: tokens) reverse
Explore related questions
See similar questions with these tags.