I recently published a JavaScript module for NPM which is also available on Github by the name of rawiki-parse-csv. The code takes raw CSV data and returns an array. The method has one option to include the first line as a header in which case an object is returned within the array. If the option is not set each line will be returned as an array.
The project includes a Chai test which inputs some sample CSV data and checks the returned output with both options specified. The tests pass without issue and use in an actual application appears to be faultless.
Could someone comment on any issues with this code, potentials for failure, incorrect structure or areas for improvement?
/*
The MIT License (MIT)
Copyright (c) 2016 Rawikitua Isherwood
Parse module creates an object or set of arrays from a csv file by splitting the text at each new
line and splitting each line at comma marks.*/
var parse = function (input, option){
try
{
// output object
var data = {},
// output no columns array
container = [],
// output array
records = [],
// splits csv data at each new line
lines =input.split(/\r\n|\r|\n/),
// creates columns by splitting first line of csv
columns = lines[0].split(',');
// creates objects from csv data if column option included
if (option === true){
// loop through each line of csv file
for (var l = 1; l <= lines.length-1; l++)
{
// splits each line of csv by comma
var words = lines[l].split(',');
// builds object based on column headers
for (var cell in columns)
{
data[columns[cell]] = words[cell];
}
// pushes object to output array
records.push(data);
// resets object in order to add another
data = {};
}
}
else {
// creates nested arrays from csv data if column option omitted,false or not true
for (var l = 0; l <= lines.length-1; l++)
{
// splits each line of csv by comma
var words = lines[l].split(',');
// creates objects from csv data
for (var cell in words)
{
container.push(words[cell]);
}
// push array to output array
records.push(container);
// reset array in order to add another
container = [];
}
}
// returns output array
return records
}
catch(err){
return err
}
}
module.exports = parse
3 Answers 3
Concept
CSV is a somewhat loosely defined concept. The closest thing there is to a specification is RFC 4180.
This parser is rather naïve: it doesn't handle quoting, and as a result it cannot return data that contain strings with literal commas. Maybe that is good enough for your own use, but in my opinion it's not good enough for a publicly published library, at least not without a giant disclaimer.
For an example of a good CSV library that handles multiple dialects, take a look at Python's built-in library.
Implementation
The purpose of the option
is not self-evident. A better name might be headerRow
. Also, the way you have nearly duplicated the two cases, you might as well write it as two functions instead.
The code crashes on empty input, instead of returning an empty data structure as I would expect.
Why do you catch an exception and convert it to a return value? That defeats the purpose of the exception mechanism, and forces the caller to handle the possibility of very weird "data" resulting from the parsing.
I'm not a fan of var
declarations that span multiple lines, since it is easy to unintentionally write something different if you screw up the punctuation. ("use strict"
would be a good practice for helping to detect such errors.) Your use is particularly bad because the intervening comment lines and the lack of additional indentation on the subsequent lines make it hard to read correctly.
l <= lines.length - 1
is not idiomatic: the standard way to write a counting for-loop in JavaScript is:
for (var i = 0; i < array.length; i++)
Why do you "reset" your temporary variables at the end of the loop for reuse in the next iteration? Why not just create it at the top of the loop?
Commenting every line of code is annoying. Most of your comments can just be eliminated.
Use semicolons consistently. You missed one at return records
.
-
\$\begingroup\$ Actually, there's more than one missing semicolon. (Or too many semicolons; depending on style.) \$\endgroup\$gcampbell– gcampbell2016年06月05日 20:38:39 +00:00Commented Jun 5, 2016 at 20:38
Besides the answer of @200_success, I have one further suggestion for improvement: I recommend making the record separator (currently comma) and the line separator (currently new-line) optional parameters, instead of hard-coding them. (If not provided, the default values could still be comma and new-line.) Of course, the separator parameters could be arrays of strings, so that you can provide e.g. more types of new-line.
-
\$\begingroup\$ Yes the majority of csv parsing modules allow custom delimiters which I wasn't aware of until now. This would require a full rewrite to include not only that but the ability to parse special characters but yes I will include those when I rewrite this. \$\endgroup\$Rawiki– Rawiki2016年06月08日 08:41:45 +00:00Commented Jun 8, 2016 at 8:41
As well as what others have said:
- Strict mode. Make debugging
(削除) great again (削除ここまで)not so stressful: strict mode throws more errors instead of silently failing or leaking global variables. (It also prohibitswith
.) - Please don't use a for...in loop to iterate over arrays. Either use the ES6 for...of loop, or use
Array.prototype.forEach()
. for...in loops are for object properties not array items, so can break if there are unexpected properties. - Code style: either use semicolons or don't. If you do, be consistent; if you don't, make sure you know the rules. And opening braces always go on the end of the line, not on a new line; "cuddle"
else
andcatch
so they're on the same line as the previous closing brace. - ES6
let
andconst
may help you with scoping:var
s go to the top of the function, not the block.
-
\$\begingroup\$ I wasn't aware that for..in could give issues iterating over arrays so that is interesting. Yes I probably should use strict mode but I've had issues using
this
with it that I found it easier to ignore it. Is there a reason that braces should should start on the end of a line? I'm using the Allman indent style which I used writing C# and is similar to a begin end block in other languages and I feel makes the code a lot more readable. \$\endgroup\$Rawiki– Rawiki2016年06月08日 08:51:39 +00:00Commented Jun 8, 2016 at 8:51 -
\$\begingroup\$ @Rawiki Do you mean you're doing things like
(function () { console.log(this) }())
, which returns the window object? \$\endgroup\$gcampbell– gcampbell2016年06月08日 09:40:05 +00:00Commented Jun 8, 2016 at 9:40
l <= lines.length - 1
was used because the length property counts from 1 and arrays start at zero and I would otherwise need to minus one froml
when it's used as a key in my arrays. As for writing this using two separate functions I had thought this would create more duplicate code so I tried to duplicate only that which was necessary to achieve the desired functionality. Would splitting this into multiple functions still be best practice? Otherwise I'm going to do a complete rewrite when I can to implement the rest of the points you've given. \$\endgroup\$