I'm using C# .NET Core 2.1 and Excel lib DotnetCore.NPOI
I'm converting xls
file to xlsx
by loading xls
and copying sheet by sheet, cell by cell to xlsx
- How can I improve performance / ram usage here?
private string ConvertToXSLX(string path)
{
using (var fs = File.OpenRead(path))
{
var result = XLS_to_XLSX_Converter.Convert(fs);
// (file.xls) + x = file.xlsx
path = $"{path}x";
using (var fs2 = new FileStream(path, FileMode.OpenOrCreate, FileAccess.Write))
{
fs2.Write(result, 0, result.Length);
}
}
return path;
}
public static byte[] Convert(Stream sourceStream)
{
var source = new HSSFWorkbook(sourceStream);
var destination = new XSSFWorkbook();
for (int i = 0; i < source.NumberOfSheets; i++)
{
var hssfSheet = (HSSFSheet)source.GetSheetAt(i);
var xssfSheet = (XSSFSheet)destination.CreateSheet(source.GetSheetAt(i).SheetName);
CopyRows(hssfSheet, xssfSheet);
}
using (var ms = new MemoryStream())
{
destination.Write(ms);
return ms.ToArray();
}
}
private static void CopyRows(HSSFSheet hssfSheet, XSSFSheet destination)
{
for (int i = 0; i < hssfSheet.PhysicalNumberOfRows; i++)
{
destination.CreateRow(i);
var cells = hssfSheet.GetRow(i)?.Cells ?? new List<ICell>();
for (int j = 0; j < cells.Count; j++)
{
var row = destination.GetRow(i);
row.CreateCell(j);
CopyCell((HSSFCell)cells[j], (XSSFCell)row.Cells[j]);
}
}
}
private static void CopyCell(HSSFCell oldCell, XSSFCell newCell)
{
CopyCellValue(oldCell, newCell);
}
private static void CopyCellValue(HSSFCell oldCell, XSSFCell newCell)
{
switch (oldCell.CellType)
{
case CellType.String:
newCell.SetCellValue(oldCell.StringCellValue);
break;
case CellType.Numeric:
newCell.SetCellValue(oldCell.NumericCellValue);
break;
case CellType.Blank:
newCell.SetCellType(CellType.Blank);
break;
case CellType.Boolean:
newCell.SetCellValue(oldCell.BooleanCellValue);
break;
case CellType.Error:
newCell.SetCellErrorValue(oldCell.ErrorCellValue);
break;
default:
break;
}
}
1 Answer 1
You're writing to a MemoryStream
only to get a byte[]
to write to another stream, you can definitely avoid this intermediate conversion:
private string ConvertToXSLX(string inputPath)
{
string outputPath = Path.ChangeExtension(inputPath, ".xlsx");
using (var input = File.OpenRead(inputPath))
using (var output = new FileStream(outputPath, FileMode.OpenOrCreate, FileAccess.Write))
{
XLS_to_XLSX_Converter.Convert(input, output);
}
return outputPath;
}
The change in Convert()
should be trivial.
Also note that I'm using Path.ChangeExtension()
instead of manually adding a character, code is clear without any comment and it handles special cases for you (for example a trailing space).
In CopyRows()
you create an empty List<ICell>
, you do not need to and you can avoid the initial allocation simply using Enumerable.Empty<ICell>()
(and changing your code to work with an enumeration) or - at least - reusing the same empty object (move it outside the loop and create it with an initial capacity of 0).
In CopyCellValue()
you have an empty default
case for your switch
. It's a good thing to always have default
but it has a purpose: detect unknown cases. If you put a break
then you're silently ignoring them. Is it on purpose? Write a comment to explain you ignore unknown cells. It's an error? Throw an exception.
I never used DotnetCore.NPOI then I can't comment about the way you use it, be sure to dispose all the intermediate IDisposable
objects.