I am extracting data from .txt files and writing needed data to Excel. My problem is that my code is very very slow and at one point even slower (+35k txt files). How should I rethink this problem?
Microsoft.Office.Interop.Excel.Application excelapp = new Microsoft.Office.Interop.Excel.Application();
excelapp.Visible = true;
_Workbook workbook = (_Workbook)(excelapp.Workbooks.Open(textBox2.Text));
_Worksheet worksheet = (_Worksheet)workbook.ActiveSheet;
DateTime dt = DateTime.Now;
foreach (string fileName in Directory.GetFiles(textBox1.Text, "*.txt"))
{
int row = 1, EmptyRow = 0;
while (Convert.ToString(worksheet.Range["A" + row].Value) != null)
{
row++;
EmptyRow = row;
}
string lines1 = GetLine(fileName, 10);
File.WriteAllText(@"D:/text.txt", lines1);
string[] ParsedLines = File.ReadAllLines(@"D:/text.txt");
string comp = ParsedLines[0];
string compare = comp.Substring(30, comp.Length - 30);
if (compare == "Failed")
{
string[] lines = File.ReadAllLines(fileName);
string serial = lines[3];
string SN = serial.Substring(30, serial.Length - 30);
worksheet.Range["A" + EmptyRow].Value2 = SN;
string data = lines[4];
string DT = data.Substring(30, data.Length - 30);
worksheet.Range["B" + EmptyRow].Value2 = DT;
string time = lines[5];
string TM = time.Substring(30, time.Length - 30);
worksheet.Range["C" + EmptyRow].Value2 = TM;
string operat = lines[6];
string OP = operat.Substring(30, operat.Length - 30);
worksheet.Range["D" + EmptyRow].Value2 = OP;
string result = lines[9];
string PF = result.Substring(30, result.Length - 30);
worksheet.Range["E" + EmptyRow].Value2 = PF;
foreach (string line in lines)
{
if (line.Contains("FixtureCoverResistance:"))
{
string FCResistance = line;
string FCR = FCResistance.Substring(31, FCResistance.Length - 31);
worksheet.Range["F" + EmptyRow].Value2 = FCR;
break;
}
}
foreach (string line in lines)
{
if (line.Contains("FwProgrammingCheck:"))
{
string FwProgrammingCheck = line;
string FwP = FwProgrammingCheck.Substring(31, FwProgrammingCheck.Length - 31);
worksheet.Range["G" + EmptyRow].Value2 = FwP;
break;
}
}
foreach (string line in lines)
{
if (line.Contains("Checksum ="))
{
string Checksum = line;
string da = Checksum.Substring(11, Checksum.Length - 11);
worksheet.Range["H" + EmptyRow].Value2 = da;
}
}
foreach (string line in lines)
{
if (line.Contains("FwEepromCheck:"))
{
string FwEepromCheck = line;
string FwE = FwEepromCheck.Substring(31, FwEepromCheck.Length - 31);
worksheet.Range["I" + EmptyRow].Value2 = FwE;
break;
}
}
foreach (string line in lines)
{
if (line.Contains("FixtureCoverResistanceAfterProg:"))
{
string FixtureCoverResistanceAfterProg = line;
string FCAP = FixtureCoverResistanceAfterProg.Substring(33, FixtureCoverResistanceAfterProg.Length - 33);
worksheet.Range["K" + EmptyRow].Value2 = FCAP;
break;
}
}
}
else
{
}
}
1 Answer 1
Input Validation
Currently the code just takes user-input and assumes it's well-formed and valid. That is not a reasonable assumption for all accounts and purposes.
The first place this happens is when the workbook is retrieved. The msdn-Documentation states, that the "file name" is required to open the Workbook. What if the user gives you something obviously incorrect like "my$\important\$Wb.exe"?
It would be a much cleaner UX, if you used a FilePicker here to get the workbook's filename.
The next place is where the code assumes textBox1.Text
contains a valid Path. Here the "correct" solution is to show a FolderPicker
Stuff ...
The casing on the variable names is inconsistent. You should be consistent at almost any cost. It's important to make sure future maintainers (including you) can read the code easily and know what to expect.
In that vein variable names like SN
and DT
don't even remotely help, because nobody beside you right now can tell what they mean. You'll have a very hard time reading the code in 2 or 3 months, because you will have forgotten most of what the names mean. serialNumber
instead of SN
and timestamp
instead of TM
as well as better naemes for all these variables will help you when you need to touch this code again.
On that note there's quite a bunch of unnecessary assignments in the code. You're iterating foreach (string line in lines)
multiple times. For one you could merge all of these loops, because they're completely independent.
Additionally whenever there's a match for what you need, the first thing you do is copying the line into an intermediary variable (which is busywork for the CPU, if it's not optimized out). Let me show what I mean in an example:
foreach (string line in lines)
{
if (line.Contains("FixtureCoverResistance:"))
{
(worksheet.Cells[emptyRow, 6] as Excel.Range).Value2 = line.Substring(31, line.length - 31);
}
if (line.Contains("FwProgrammingCheck:"))
{
(worksheet.Cells[emptyRow, 7] as Excel.Range).Value2 = line.Substring(31, line.length - 31);
}
// ...
}
If you want to stop iterating lines when all of these properties have been used (which seems like a micro-optimization to me), you'll need to change your foreach loop to a for-loop over an index and stop iterating depending on flags.
You can see here that I use the Cells
property to access the relevant cells in the worksheet instead of Range
. This should be minimally faster, because accessing a Range over a string is always bound to make Excel parse the specifier string into something more suitable for addressing the range.
Additionally it integrates nicely with the use of EmptyRow
On a last note: Since the else
-block is empty, you may want to consider inverting the if-condition to save one level of indentation and switch away early:
if (compare != "Failed")
{
continue;
}
string[] lines = File.ReadAllLines(fileName);
// ...
-
\$\begingroup\$ Thank you a lot for the healthy tips and way of thinking problems.I will try to change my way of thinking and listen to what you said.I now modified the program and it seems faster.If you have anymore tips on how to make it faster I would gladly listen :) \$\endgroup\$John Pietrar– John Pietrar2016年11月28日 09:38:35 +00:00Commented Nov 28, 2016 at 9:38
foreach
iterating over the same content. Have you considered iterating only once? \$\endgroup\$