I have made a funtion to compare text files line by line, where timestamps and jobnumbers are replaced for a identical string. By lack of experience I'm not sure if this is the best way to do this.
if (row < comparedFile.Count)
{
var regexJobNumber = @"([0-9]{4})";
var regexTimeStamp = @"([0-9]{8}_[0-9]{6})";
var input = line;
var removeTimestamp = Regex.Matches(input, regexTimeStamp).Cast<object>().Aggregate(input, (current, Match) => Regex.Replace(current, regexTimeStamp, "<TimeStamp>"));
var removeJobnumber = Regex.Matches(removeTimestamp, regexJobNumber).Cast<object>().Aggregate(removeTimestamp, (current, Match) => Regex.Replace(current, regexJobNumber, "<JobNummer>"));
var input2 = comparedFile[row];
var removeTimestamp2 = Regex.Matches(input2, regexTimeStamp).Cast<object>().Aggregate(input2, (current, Match) => Regex.Replace(current, regexTimeStamp, "<TimeStamp>"));
var removeJobnumber2 = Regex.Matches(removeTimestamp2, regexJobNumber).Cast<object>().Aggregate(removeTimestamp2, (current, Match) => Regex.Replace(current, regexJobNumber, "<JobNummer>"));
if (!removeJobnumber.Equals(removeJobnumber2))
{
var rowcount = row + 1;
differenceList.Add("Difference found on rownumber: " + rowcount);
}
row++;
}
1 Answer 1
This isn't complete code. We have no idea what's happening with row
; it seems to be a loop index, but we have no idea what else is happening in the loop.
Anyway, here's my take on things:
- Create compiled regexes and move their creation out of the loop, creating
Regex
instance variables instead of using the static methods. This gets you a speed boost and simplifies the code. - Do the
.Replace()
s unconditionally; there's no harm in doing so. - Extract the common code into a new function/method. Don't repeat yourself. If you decided you wanted to use the English spelling "Number" instead of Dutch "Nummer" you don't want to have to fix it in two places.
So here it is:
string Standardize(string input)
{
static var regexJobNumber = new Regex(@"(\d{4})", RegexOptions.Compiled);
static var regexTimeStamp = new Regex(@"(\d{8}_\d{6})", RegexOptions.Compiled);
string intermediate = regexTimeStamp.Replace(input, "<TimeStamp>");
return regexJobNumber.Replace(intermediate,"<JobNumber>");
}
//...
if (row < comparedFile.Count)
{
if (Standardize(removeJobnumber) != Standardize(removeJobnumber2))
{
var rowcount = row + 1;
differenceList.Add("Difference found on rownumber: " + rowcount);
}
row++;
}
My suggestion is to go through the documentation for Regex
and see how each method is used.
input
andinput2
guaranteed to have exactly one timestamp and exactly one jobnumber? Are they guaranteed to be in any particular order? \$\endgroup\$