Parse a CSV file and return an object or array

Question 1

I recently published a JavaScript module for NPM which is also available on Github by the name of rawiki-parse-csv. The code takes raw CSV data and returns an array. The method has one option to include the first line as a header in which case an object is returned within the array. If the option is not set each line will be returned as an array.

The project includes a Chai test which inputs some sample CSV data and checks the returned output with both options specified. The tests pass without issue and use in an actual application appears to be faultless.

Could someone comment on any issues with this code, potentials for failure, incorrect structure or areas for improvement?

/* 
The MIT License (MIT)
Copyright (c) 2016 Rawikitua Isherwood
Parse module creates an object or set of arrays from a csv file by splitting the text at each new 
line and splitting each line at comma marks.*/
var parse = function (input, option){
 try
 { 
 // output object
 var data = {},
 // output no columns array
 container = [],
 // output array
 records = [],
 // splits csv data at each new line
 lines =input.split(/\r\n|\r|\n/),
 // creates columns by splitting first line of csv
 columns = lines[0].split(',');
 // creates objects from csv data if column option included
 if (option === true){
 // loop through each line of csv file
 for (var l = 1; l <= lines.length-1; l++)
 {
 // splits each line of csv by comma
 var words = lines[l].split(',');
 // builds object based on column headers
 for (var cell in columns)
 {
 data[columns[cell]] = words[cell]; 
 }
 // pushes object to output array
 records.push(data);
 // resets object in order to add another
 data = {};
 }
 }
 else {
 // creates nested arrays from csv data if column option omitted,false or not true
 for (var l = 0; l <= lines.length-1; l++)
 {
 // splits each line of csv by comma
 var words = lines[l].split(',');
 // creates objects from csv data
 for (var cell in words)
 {
 container.push(words[cell]); 
 }
 // push array to output array
 records.push(container);
 // reset array in order to add another
 container = [];
 }
 }
 // returns output array
 return records
 }
 catch(err){
 return err
 }
}
module.exports = parse

Question 2

Thank you 200_success for the detailed review. I've updated the readme.md with a disclaimer as you've suggested. I was in such a hurry to publish something for NPM that I hadn't actually studied the CSV specs so omitting functionality that will parse a string with special characters would definitely make it unsuitable for production. I've since realised that including that functionality in the code is no simple task and requires a full rewrite which I'm still trying to cleanly. I've seen a few regex solutions but I'll hopefully come up with something of my own when I have time.

Question 3

The line l <= lines.length - 1 was used because the length property counts from 1 and arrays start at zero and I would otherwise need to minus one from l when it's used as a key in my arrays. As for writing this using two separate functions I had thought this would create more duplicate code so I tried to duplicate only that which was necessary to achieve the desired functionality. Would splitting this into multiple functions still be best practice? Otherwise I'm going to do a complete rewrite when I can to implement the rest of the points you've given.

Question 4

Concept

CSV is a somewhat loosely defined concept. The closest thing there is to a specification is RFC 4180.

This parser is rather naïve: it doesn't handle quoting, and as a result it cannot return data that contain strings with literal commas. Maybe that is good enough for your own use, but in my opinion it's not good enough for a publicly published library, at least not without a giant disclaimer.

For an example of a good CSV library that handles multiple dialects, take a look at Python's built-in library.

Implementation

The purpose of the option is not self-evident. A better name might be headerRow. Also, the way you have nearly duplicated the two cases, you might as well write it as two functions instead.

The code crashes on empty input, instead of returning an empty data structure as I would expect.

Why do you catch an exception and convert it to a return value? That defeats the purpose of the exception mechanism, and forces the caller to handle the possibility of very weird "data" resulting from the parsing.

I'm not a fan of var declarations that span multiple lines, since it is easy to unintentionally write something different if you screw up the punctuation. ("use strict" would be a good practice for helping to detect such errors.) Your use is particularly bad because the intervening comment lines and the lack of additional indentation on the subsequent lines make it hard to read correctly.

l <= lines.length - 1 is not idiomatic: the standard way to write a counting for-loop in JavaScript is:

for (var i = 0; i < array.length; i++)

Why do you "reset" your temporary variables at the end of the loop for reuse in the next iteration? Why not just create it at the top of the loop?

Commenting every line of code is annoying. Most of your comments can just be eliminated.

Use semicolons consistently. You missed one at return records.

Question 5

Actually, there's more than one missing semicolon. (Or too many semicolons; depending on style.)

Question 6

Besides the answer of @200_success, I have one further suggestion for improvement: I recommend making the record separator (currently comma) and the line separator (currently new-line) optional parameters, instead of hard-coding them. (If not provided, the default values could still be comma and new-line.) Of course, the separator parameters could be arrays of strings, so that you can provide e.g. more types of new-line.

Question 7

Yes the majority of csv parsing modules allow custom delimiters which I wasn't aware of until now. This would require a full rewrite to include not only that but the ability to parse special characters but yes I will include those when I rewrite this.

Question 8

As well as what others have said:

Strict mode. Make debugging ~~(削除) great again (削除ここまで)~~ not so stressful: strict mode throws more errors instead of silently failing or leaking global variables. (It also prohibits with.)
Please don't use a for...in loop to iterate over arrays. Either use the ES6 for...of loop, or use Array.prototype.forEach(). for...in loops are for object properties not array items, so can break if there are unexpected properties.
Code style: either use semicolons or don't. If you do, be consistent; if you don't, make sure you know the rules. And opening braces always go on the end of the line, not on a new line; "cuddle" else and catch so they're on the same line as the previous closing brace.
ES6 let and const may help you with scoping: vars go to the top of the function, not the block.

Question 9

I wasn't aware that for..in could give issues iterating over arrays so that is interesting. Yes I probably should use strict mode but I've had issues using this with it that I found it easier to ignore it. Is there a reason that braces should should start on the end of a line? I'm using the Allman indent style which I used writing C# and is similar to a begin end block in other languages and I feel makes the code a lot more readable.

Question 10

@Rawiki Do you mean you're doing things like (function () { console.log(this) }()), which returns the window object?

score 5 · Accepted Answer · 2016-06-05 16:19:35Z

Concept

CSV is a somewhat loosely defined concept. The closest thing there is to a specification is RFC 4180.

This parser is rather naïve: it doesn't handle quoting, and as a result it cannot return data that contain strings with literal commas. Maybe that is good enough for your own use, but in my opinion it's not good enough for a publicly published library, at least not without a giant disclaimer.

For an example of a good CSV library that handles multiple dialects, take a look at Python's built-in library.

Implementation

The purpose of the option is not self-evident. A better name might be headerRow. Also, the way you have nearly duplicated the two cases, you might as well write it as two functions instead.

The code crashes on empty input, instead of returning an empty data structure as I would expect.

Why do you catch an exception and convert it to a return value? That defeats the purpose of the exception mechanism, and forces the caller to handle the possibility of very weird "data" resulting from the parsing.

I'm not a fan of var declarations that span multiple lines, since it is easy to unintentionally write something different if you screw up the punctuation. ("use strict" would be a good practice for helping to detect such errors.) Your use is particularly bad because the intervening comment lines and the lack of additional indentation on the subsequent lines make it hard to read correctly.

l <= lines.length - 1 is not idiomatic: the standard way to write a counting for-loop in JavaScript is:

for (var i = 0; i < array.length; i++)

Why do you "reset" your temporary variables at the end of the loop for reuse in the next iteration? Why not just create it at the top of the loop?

Commenting every line of code is annoying. Most of your comments can just be eliminated.

Use semicolons consistently. You missed one at return records.

Actually, there's more than one missing semicolon. (Or too many semicolons; depending on style.)

Stack Exchange Network

Parse a CSV file and return an object or array

3 Answers 3

Concept

Implementation

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Parse a CSV file and return an object or array

3 Answers 3

Concept

Implementation

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions