Objective:
I want to import an Excel file, and read the rows of certain columns. For this, I use ExcelDataReader
. I've implemented a low-level class called ExcelData
which uses the ExcelDataReader
and does things like figuring out if it is an ".xls" of ".xslx" file (or maybe something completely unrelated!) etc. On top of that class I made a ReadInData
class, which will get the specific columns I want from the Excel sheet.
Major concerns:
- The list of catches in my main program
- Throwing of exceptions
- The overall construction of the code and code quality
- Should I encapsulate my
ExcelData
class withinReadInData
, or keep it like it is? - Passing of the
isFirstRowAsColumnNames
parameter
Because this is code for a company, I changed the names of a couple of classes, so I know they aren't the best of names.
The entry point of my code:
class Program
{
static void Main(string[] args)
{
try
{
ReadInData readInData = new ReadInData(@"C:\SC.xlsx", "sc_2014");
IEnumerable<Recipient> recipients = readInData.GetData();
}
catch (FileNotFoundException ex)
{
Console.WriteLine(ex.Message);
}
catch (ArgumentException ex)
{
Console.WriteLine(ex.Message);
}
catch (WorksheetDoesNotExistException ex)
{
Console.WriteLine(ex.Message);
}
catch (FileToBeProcessedIsNotInTheCorrectFormatException ex)
{
Console.WriteLine(ex.Message);
}
Console.ReadLine();
}
}
In this code I create a new ReadInData
class, to which I pass the path, the name of the file and the name of the Excel sheet that I want to read.
Concern here: is it okay to pass those things within that file?
The ReadInData
class:
public class ReadInData
{
private string path;
private string worksheetName;
public ReadInData(string path, string worksheetName)
{
this.path = path;
this.worksheetName = worksheetName;
}
public IEnumerable<Recipient> GetData(bool isFirstRowAsColumnNames = true)
{
var excelData = new ExcelData(path);
var dataRows = excelData.GetData(worksheetName, isFirstRowAsColumnNames);
return dataRows.Select(dataRow => new Recipient()
{
Municipality = dataRow["Municipality"].ToString(),
Sexe = dataRow["Sexe"].ToString(),
LivingArea = dataRow["LivingArea"].ToString()
}).ToList();
}
}
Basically, I figured that I needed a class on top of the ExcelData
class, because it seemed somewhat higher level.
The ExcelData
class:
public class ExcelData
{
private string path;
public ExcelData(string path)
{
this.path = path;
}
private IExcelDataReader GetExcelDataReader(bool isFirstRowAsColumnNames)
{
using (FileStream fileStream = File.Open(path, FileMode.Open, FileAccess.Read))
{
IExcelDataReader dataReader;
if (path.EndsWith(".xls"))
{
dataReader = ExcelReaderFactory.CreateBinaryReader(fileStream);
}
else if (path.EndsWith(".xlsx"))
{
dataReader = ExcelReaderFactory.CreateOpenXmlReader(fileStream);
}
else
{
//Throw exception for things you cannot correct
throw new FileToBeProcessedIsNotInTheCorrectFormatException("The file to be processed is not an Excel file");
}
dataReader.IsFirstRowAsColumnNames = isFirstRowAsColumnNames;
return dataReader;
}
}
private DataSet GetExcelDataAsDataSet(bool isFirstRowAsColumnNames)
{
return GetExcelDataReader(isFirstRowAsColumnNames).AsDataSet();
}
private DataTable GetExcelWorkSheet(string workSheetName, bool isFirstRowAsColumnNames)
{
DataSet dataSet = GetExcelDataAsDataSet(isFirstRowAsColumnNames);
DataTable workSheet = dataSet.Tables[workSheetName];
if (workSheet == null)
{
throw new WorksheetDoesNotExistException(string.Format("The worksheet {0} does not exist, has an incorrect name, or does not have any data in the worksheet", workSheetName));
}
return workSheet;
}
public IEnumerable<DataRow> GetData(string workSheetName, bool isFirstRowAsColumnNames = true)
{
DataTable workSheet = GetExcelWorkSheet(workSheetName, isFirstRowAsColumnNames);
IEnumerable<DataRow> rows = from DataRow row in workSheet.Rows
select row;
return rows;
}
}
And finally, the Recipient
class (which does nothing special):
namespace ConsoleApplicationForTestingPurposes
{
public class Recipient
{
public string Municipality { get; set; }
public string Sexe { get; set; }
public string LivingArea { get; set; }
}
}
The exception classes inherit from Exception
, and just pass a message to Exception.
1 Answer 1
Simplify your catch-chain
static void Main(string[] args)
{
try
{
ReadInData readInData = new ReadInData(@"C:\SC.xlsx", "sc_2014");
IEnumerable<Recipient> recipients = readInData.GetData();
}
catch (Exception ex)
{
if(!(ex is FileNotFoundException || ex is ArgumentException || ex is FileToBeProcessedIsNotInTheCorrectFormatException))
throw;
Console.WriteLine(ex.Message);
}
Console.Write(Press any key to continue...);
Console.ReadKey(true);
}
I see no reason to have ReadInData
be a non-static class. You are not taking advantage of the fact that you are remembering the path and worksheet name, and you aren't keeping some sort of open connection to the workbook. And always feel free to make your code look more complex by removing variables that are only being used once (optional). There is no reason to ToList()
this, because you are returning an IEnumerable<T>
anyway.
public static class ReadInData
{
public static IEnumerable<Recipient> GetData(string path, string worksheetName, bool isFirstRowAsColumnNames = true)
{
return new ExcelData(path).GetData(worksheetName, isFirstRowAsColumnNames)
.Select(dataRow => new Recipient()
{
Municipality = dataRow["Municipality"].ToString(),
Sexe = dataRow["Sexe"].ToString(),
LivingArea = dataRow["LivingArea"].ToString()
});
}
}
You can do this same thing for your ExcelData
class, that is.. making these functions instead of methods on a class. You don't have to, but again, you aren't taking much advantage of the OO, so there is need for you to make it OO.
If you don't want/need someone using your ExcelData
class, than encapsulate it. However I feel that you will at some point want to be able to re-use this excel reader, in which case I would move these methods into a Static ExcelHelperClass
. And your first class ReadInData
I would just make a method in your original program.
private static IExcelDataReader GetExcelDataReader(string path, bool isFirstRowAsColumnNames)
{
using (FileStream fileStream = File.Open(path, FileMode.Open, FileAccess.Read))
{
IExcelDataReader dataReader;
if (path.EndsWith(".xls"))
dataReader = ExcelReaderFactory.CreateBinaryReader(fileStream);
else if (path.EndsWith(".xlsx"))
dataReader = ExcelReaderFactory.CreateOpenXmlReader(fileStream);
else
throw new FileToBeProcessedIsNotInTheCorrectFormatException("The file to be processed is not an Excel file");
dataReader.IsFirstRowAsColumnNames = isFirstRowAsColumnNames;
return dataReader;
}
}
private static DataSet GetExcelDataAsDataSet(string path, bool isFirstRowAsColumnNames)
{
return GetExcelDataReader(path, isFirstRowAsColumnNames).AsDataSet();
}
private static DataTable GetExcelWorkSheet(string path, string workSheetName, bool isFirstRowAsColumnNames)
{
DataTable workSheet = GetExcelDataAsDataSet(path, isFirstRowAsColumnNames).Tables[workSheetName];
if (workSheet == null)
throw new WorksheetDoesNotExistException(string.Format("The worksheet {0} does not exist, has an incorrect name, or does not have any data in the worksheet", workSheetName));
return workSheet;
}
private static IEnumerable<DataRow> GetData(string path, string workSheetName, bool isFirstRowAsColumnNames = true)
{
return from DataRow row in GetExcelWorkSheet(path, workSheetName, isFirstRowAsColumnNames).Rows select row;
}
As mentioned in the comments, you are not accounting for all Excel filetypes.
If it wasn't for the bool isFirstRowAsColumnNames
I would suggest you actually go forward with the whole OOP thing, and have the workbook load when you construct it, and take advantage of OO design by actually having internal data besides just the path. I do think its is fine that you give them the option to specify isFirstRowAsColumnNames
, that could be very handy.
-
\$\begingroup\$ First off, thank you very much for your time in putting together what could be improved about my code. I have a couple of questions though; the first one being that I'm not sure what you meant with "And your first class ReadInData I would just make a method in your original program." The second thing is, that I'm also not really sure what you meant with the first sentence of your last paragraph. Are you saying that if I wouldn't have the isFirstRowAsColumnNames parameter, I could keep my code more or less the ways it was? If so, why would it depend on that? \$\endgroup\$user3488442– user34884422014年04月16日 06:59:20 +00:00Commented Apr 16, 2014 at 6:59
".xlsm"
and".xlsb"
, as well as the template types... \$\endgroup\$