6
\$\begingroup\$

After programming a lot in high level languages such as Python and R, I started working with C++ recently. To get my C++ skills up, I of course read a lot of books. In addition, I try and replicate some functionality from high level languages in C++. My first attempt at this (after Hello World :)) was to create a C++ class which could read comma separated files. In essence the class parses the file and reads it into a boost::MultiArray. I would appreciate any feedback on the code.

The links to the code and example data are listed below, they link to my bitbucket account.

The actual code comprises of a cpp file (csv-reader.cpp):

#include <iostream>
#include <sstream>
#include <fstream>
#include <string>
#include <vector>
#include <boost/algorithm/string.hpp>
#include <boost/multi_array.hpp>
#include "csv-test.h"
#include <cassert>
template <class T> class csv_reader {
 boost::multi_array<T, 2> content2d ;
 std::vector<std::string> split_line ; 
 std::string line;
 std::string sep ;
 int ncol ; 
 int nrow ; 
public :
 csv_reader(std::string, std::string) ; // constructor
 ~csv_reader(); // desctructor
 void cout_content() ; // print the contents
 T operator() (unsigned row, unsigned column) ;
} ;
// Constructor
template <class T> csv_reader<T>::csv_reader(std::string path, std::string sep = ",")
{
 // Initializing variables
 ncol = 0 ; // Initialize the number of colums to 0
 nrow = 1 ; // Initialize the number of rows to 1
 content2d = boost::multi_array<T, 2> (boost::extents[0][0]) ;
 std::ifstream data(path.c_str()) ;
 // read the csv data
 while(getline(data, line))
 { 
 boost::split(split_line, line, boost::is_any_of(sep) ) ;
 if(ncol == 0) 
 {
 ncol = split_line.size() ;
 }
 else assert(ncol == split_line.size()) ;
 content2d.resize(boost::extents[nrow][ncol]) ;
 for(int i = 0; i < split_line.size(); i++) 
 {
 content2d[nrow - 1][i] = convertToDouble(split_line[i]) ;
 }
 nrow++ ;
 }
}
// Destructor
template <class T> csv_reader<T>::~csv_reader() { }
template <class T> void csv_reader<T>::cout_content()
{
 for(int row = 0; row < (nrow - 1); row++) 
 {
 for(int col = 0; col < ncol ; col++) 
 {
 std::cout << content2d[row][col] << " ";
 }
 std::cout << "\n" ;
 }
}
// Allow access to the contents
template <class T> T csv_reader<T>::operator() (unsigned row, unsigned column)
{ 
 if (row >= nrow || column >= ncol)
 throw BadIndex("boost::MultiArray subscript out of bounds");
 return(content2d[row][column]) ;
}
int main()
{
 // An integer csv reader
 csv_reader<int> csv_obj_int("test.csv") ;
 csv_obj_int.cout_content() ;
 // A double csv reader
 csv_reader<double> csv_obj_double("test.csv") ;
 csv_obj_double.cout_content() ;
 // It also supports direct access to the content using operator()
 std::cout << csv_obj_double(1,1) << "\n" ;
 std::cout << csv_obj_double(1,1) * 5 << "\n" ;
 // This statement fails with a subscript out of bounds error
 // std::cout << csv_obj_double(10,10) * 5 << "\n" ;
 // Testing a different seperator
 csv_reader<double> csv_obj_double_sep2("test_semicol.csv", ";") ;
 csv_obj_double_sep2.cout_content() ;
}

and a header file (csv-test.h):

 // File: convert.h
 #include <iostream>
 #include <sstream>
 #include <string>
 #include <stdexcept>
 class BadConversion : public std::runtime_error {
 public:
 BadConversion(std::string const& s)
 : std::runtime_error(s)
 { }
 };
 class BadIndex : public std::runtime_error {
 public:
 BadIndex(std::string const& s)
 : std::runtime_error(s)
 { }
 };
 inline double convertToDouble(std::string const& s)
 {
 std::istringstream i(s);
 double x;
 if (!(i >> x))
 throw BadConversion("convertToDouble(\"" + s + "\")");
 return x;
 }

The main() test the class on a number of different csv file examples: test.csv and test_semicol.csv.

asked Mar 18, 2012 at 19:20
\$\endgroup\$

3 Answers 3

5
\$\begingroup\$

First of all, boost::split() is just a strtok() so it won't work with REAL csv files. CSV parsing is huge topic and it is better avoid this discussion here.

Second, convertToDouble() is not from C++, seems to be a visitor from C#. Even if I am wrong, it is design problem: template is defined for arbitrary class, but "silently" uses double for operation. It is necessary to create proper type casting, or use template specialization.

Strange usage of assert(). It does something in DEBUG builds only, in release build it is empty. If you need debug aids, just put assert() before content2d.resize(boost::extents[nrow][ncol]) ; and add any if that you need to handle the size problem for release builds. Consider throw an exception.

Destructor. You may drop both declaration and implementation of dtor and let the compiler create one for you, but I have a concern.

I do not know, where boost::multi_array allocates its memory, but if it uses stack, you may have troubles with big arrays. Because of this, it may be better to allocate content2d in heap (content2d = new boost::multi_array<T, 2>; and put delete content2d into destructor.

Constructor. It is better to use initializators list, so it may looks like (I agree with Loki about streams):

csv_reader(std::istream& str, std::string const& sep) : content2d( new boost::multi_array<T, 2> (boost::extents[0][0]))

I assume boost::multi_array<T, 2>* content2d declaration.

You do not need ncol and nrow variables, besides local variables in ctor: array contains information about its dimensions, and if you add separate variables for that, you just add another point of failure in your code.

Jamal
35.2k13 gold badges134 silver badges238 bronze badges
answered Mar 19, 2012 at 13:58
\$\endgroup\$
4
\$\begingroup\$

Header Files

Should always be protected by include guards.
Otherwise you are sustainable to infinite recursion and multiple definitions.

// Header.h
#ifndef XXX
#define XXX
/// The header file goes here
#endif

XXX Should be a unique identifier. A lot of build systems will generate a GUID for you. If you are doing your own then make it unique by including the namespace and file name.

While we are talking about it it is worth putting all your stuff into a namespace to avoid clashes with other people code.

Source File

There is some stuff here the should be moved to the header file.

template <class T> class csv_reader {

If you want to re-use this should also be in a header file (note because it is a template you also need to put the definitions of the methods in the header file).

Looking at the class you seem to be keeping too man member variables:

template <class T> class csv_reader {
 boost::multi_array<T, 2> content2d ;
 std::vector<std::string> split_line ; 
 std::string line;
 std::string sep ;
 int ncol ; 
 int nrow ; 
public :
 csv_reader(std::string, std::string) ; // constructor
 ~csv_reader(); // desctructor
 void cout_content() ; // print the contents
 T operator() (unsigned row, unsigned column) ;
} ;

The only member variable you need is content2d. The other variables are really function local variables you do not need to store them as part of the class.

The constructor:

csv_reader(std::string, std::string) ; // constructor

You are passing the name of a file.
This is very limiting. It would be more traditional to pass the file as a stream. Then you can read from a file/string/std::cin etc.

You are passing things by value. This can introduce unneeded copying of the values so pass by const reference.

csv_reader(std::istream& str, std::string const& sep) ; // constructor

If you're destructor does not do anything then do not define one.

// Destructor
template <class T> csv_reader<T>::~csv_reader() { }

Best to just use the compiler generated version (which does the above).

Printing the content: You are using the method void cout_content().

Explicitly encoding the stream into the name is a bad idea it makes it less flexable. Pass the stream as a parameter you can default the stream to std::cout if you want.

void printContent(std::ostream& stream = std::cout); // print the contents

I would then also add an output operator operator<< that calls this method.

template <class T>
std::ostream& operator<<(std::ostream& stream, csv_reader<T> const& data)
{
 data.printContent(stream);
 return stream;
}

Indexing into data: you use operator() which is fine. But with a tiny amount of work you can use [] see https://stackoverflow.com/a/1971207/14065

answered Mar 19, 2012 at 3:09
\$\endgroup\$
2
\$\begingroup\$

I would use boost::lexical_cast for type conversions. Also I consider that you shouldn't limit class to parse csv values of the single type, so I guess template parameters should specify column types as well, so maybe your multi array should keep something like boost::variant

answered Jun 15, 2014 at 16:18
\$\endgroup\$

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.