I've implemented a small program in JavaScript to count the number of lines of a string of java source code.
This was done for one of the Code Kata exercise: http://codekata.com/kata/kata13-counting-code-lines/
Overall, the goal of this exercise is summarised as follow:
The mixture of line-based things (single line comments, blank lines, and so on) with the stream-based block comments can make solutions slightly ugly. While coding your solution, consider the structure of your code, and see how well it fits the structure of the problem. As with most of these kata, consider coding multiple alternative implementations. Does what you learned on the first tries affect your approach to subsequent ones?
To be honest, I've submitted this for an interview and was told that this "lacked fundamentals by allowing errors in the code, there were issues in code construct and there were issues in choice of techniques".
Here's my simple app's source (also available in CodePen):
/**
* Return the number of lines of code of a Java source code.
* {@link http://codekata.com/kata/kata13-counting-code-lines/}
* @param {string} Java source code
* @return {number}
*/
function getJavaLineCount(source) {
var count = 0;
var patterns = {
spaces: /^\s+/,
singleLineComment: /^\/\/.*/,
multiLineComment: /^\/\*((?!\*\/)[\s\S])*\*\//,
code: /^.*\s*/
};
var readToken = function () {
var m, type;
if (m = source.match(patterns.spaces)) {
type = 'spaces';
} else if (m = source.match(patterns.singleLineComment)) {
type = 'singleLineComment';
} else if (m = source.match(patterns.multiLineComment)) {
type = 'multiLineComment';
} else if (m = source.match(patterns.code)) {
type = 'code';
} else {
throw new Error("Unknown code pattern!");
}
var token = m[0];
source = source.substr(token.length);
//DEBUG
console.log('Found token (%s): %s', type, JSON.stringify(token));
return type;
};
while (source) {
if (readToken() === 'code') {
count++;
}
console.log('new source = |%s|', source);
}
//DEBUG
console.log('LOC = ', count);
return count;
}
$(function () {
var calculateLOC = function() {
$('#linesOfCode').text(getJavaLineCount(
$('#source').val()
));
};
$('#source').change(calculateLOC);
calculateLOC();
});
textarea {
width: 50%;
}
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
<textarea id="source" rows="15">
/*****
* This is a test program with 5 lines of code
* \/* no nesting allowed!
//*****//***/// Slightly pathological comment ending...
public class Hello {
@cool
private String test = "some /* commented */ 'and' \" asd other\" " +
"string";
public static final void main(String [] args) { // gotta love Java
// Say hello
System./*wait*/out./*for*/println/*it*/("Hello/*");
}
}
</textarea>
<p>
<span id="linesOfCode">n/a</span> lines of java code.
</p>
To improve this, I could imagine writing a simple character parser to avoid using regexp, that's assuming that we consider regexps to be too slow in JS.
I know that if you provide it broken java code, it'll not work properly - but I assumed that this case would not be considered a normal scenario in this exercise.
Apart from this, I'm not sure what other kind of errors it might have.
Any suggestions?
1 Answer 1
Sorry, don't really have the time to do a comprehensive review (maybe later I'll edit!) but for now:
Your code actually fails on some cases.
The way that you're finding whether a line counts as a LOC or not is too greedy. (Alternatively, your comment regexes are too strict.) You're reading in basically the whole line as a line of code. This fails for relatively simple cases:
a/*
b*/
should be one line of code, but you treat it as two, since your code
regex absorbs the /*
. In fact, I'd argue that the way that you choose to count lines of code is odd; it took me a few reads to see what you were doing and find that error. The more natural way perhaps would be to strip spacing and comments, then count non-empty lines.
That also has the advantage of being way faster: you're calling String#substr()
a lot, which is pretty expensive.
Aside from that, I only have a few comments:
- Keep spacing consistent. Choose either
function()
orfunction ()
; don't use both. - Be consistent in your choice of indentation character. This is easier if you enable showing whitespace in your editor.
- Be consistent in string delimiters; don't switch between
'
and"
without a good reason. - Since you're using a bare object, you can iterate over it with a
for..in
. If you're really concerned that somebody will put properties into the prototype of Object (which, by the way, should NEVER EVER happen), then add thehasOwnProperty
check:
for (var tokenType in patterns) {
// if (patterns.hasOwnProperty(tokenType)) ...
if (m = source.match(patterns[tokenType])) {
var token = m[0];
source = source.substr(token.length);
//DEBUG
console.log('Found token (%s): %s', type, JSON.stringify(token));
return type;
}
}
throw new Error('Unknown code pattern!');
-
\$\begingroup\$ Thanks a lot for the pointers. I've made a new version that I'll document here later: codepen.io/kayhadrin/pen/MagYOJ \$\endgroup\$Kayhadrin– Kayhadrin2015年08月26日 18:46:29 +00:00Commented Aug 26, 2015 at 18:46
-
\$\begingroup\$ @Kayhadrin Alright! Make sure to read this thread on how to post follow-up questions. :-) \$\endgroup\$Schism– Schism2015年08月27日 02:32:08 +00:00Commented Aug 27, 2015 at 2:32
-
\$\begingroup\$ Thanks for the info. I'm not going to open a new question for it though since I think it's ok for now. \$\endgroup\$Kayhadrin– Kayhadrin2015年08月27日 13:46:10 +00:00Commented Aug 27, 2015 at 13:46
-
\$\begingroup\$ For the new version, I've addressed the issue for a java code that could be like
a/*\nb*/
. I've also used afor(...in)
as suggested. \$\endgroup\$Kayhadrin– Kayhadrin2015年08月27日 13:54:02 +00:00Commented Aug 27, 2015 at 13:54 -
1\$\begingroup\$ The rest of the coding style issues like tabs and spaces, etc... are purely due to me coding in CodePen - not a real text editor by any account. ;-) \$\endgroup\$Kayhadrin– Kayhadrin2015年08月27日 13:57:43 +00:00Commented Aug 27, 2015 at 13:57
Explore related questions
See similar questions with these tags.