I have a working C# (version 5) function that I use to match an input string to one of many unique regular expression patterns and return the replacement string associated with the matched pattern (via Regex.Replace). I've tested it well enough to know that the code acts as intended and is reliable.
One benefit of this approach is that it is readable (to me) and makes it easy to edit any of the constant strings. However, it is what I would consider to be "the long way."
Am I missing out on a more elegant technique that doesn't require the newing up of the seven Regex variables and a long if-else if-else block? If I need to add more patterns, then I'd be newing up more variables and adding to the if-else if-else block. I have not yet seen any better techniques in Pluralsight or in Stack Overflow (those were more concerned with fixing a specific expression and bug hunting).
The code block you see below was created in Visual Studio 2013 Ultimate as a SQL Database Project and published to a SQL Server Database (version 2014), so that the User Defined Function can be used in T-SQL SELECT queries. All of this is because T-SQL does not have the true regular expression functionality that exists in C# and .NET.
Most of the space of this function is taken up by variable definition.
- Two strings (pattern and replacement), at first empty, to be assigned at the end of the function.
- Seven regular expression patterns (the search input must match one and only one pattern).
- Seven replacement strings, where each replacement string is paired with a pattern.
- Seven Regex variables.
The behavior takes place in a long if-else if-else block. If the search matches any of the patterns, then the pattern and replacement variables are assigned.
Finally, the replacement string is returned (via the Regex.Replace(search, pattern, replacement) function).
Are there any better approaches?
using System.Data.SqlTypes;
using System.Text.RegularExpressions;

namespace CustomClrFunctions
{
    /// <summary>
    /// This set of CLR functions is published to the CustomClrFunctions Database to apply
    /// true Regex match and replacement functionality, as T-SQL does not (yet) provide
    /// that feature.
    /// </summary>
    public partial class UserDefinedFunctions
    {
        /// <summary>
        /// This function replaces the OW format into ADE format for ELA standards.
        /// </summary>
        /// <param name="search">An ELA standard in OW format. Example: "LA.11-12.11-12.L.1.a"</param>
        /// <returns>The same ELA standard translated to ADE format.</returns>
        [Microsoft.SqlServer.Server.SqlFunction]
        public static SqlString RegexReplaceElaHs(SqlChars search)
        {
            /* Known patterns and replacements (to replace the search term) */
            const string pattern1 = @"LA\.11-12\.11-12\.(\w{1,4}).(\d{1,2})(\.\w)?";
            const string replacement1 = "LA.11-12.$1.$2$3";
            const string pattern2 = @"LA\.9-10\.9-10\.(\w{1,4}).(\d{1,2})(\.\w)?";
            const string replacement2 = "LA.9-10.$1.$2$3";
            const string pattern3 = @"LA\.K-12\.CCSS\.ELA-Literacy\.CCRA\.(\w{1,2})\.(\d{1,2})";
            const string replacement3 = "CCRA.$1.$2";
            const string pattern4 = @"LA\.11-12\.(\d{1,2})\.(\w{1,2})\.(\d{1,2})";
            const string replacement4 = "LA.$1.$2.$3";
            const string pattern5 = @"LA\.9-10\.(\d{1,2})\.(\w{1,2})\.(\d{1,2})";
            const string replacement5 = "LA.$1.$2.$3";
            const string pattern6 = @"LA\.6-8\.6-8\.(\w{1,4}).(\d{1,2})(\.\w)?";
            const string replacement6 = "LA.6-8.$1.$2$3";
            const string pattern7 = @"LA\.8\.8\.(\w{1,2}).(\d)(\.\w)?";
            const string replacement7 = "LA.8.$1.$2$3";

            var regex1 = new Regex(pattern1);
            var regex2 = new Regex(pattern2);
            var regex3 = new Regex(pattern3);
            var regex4 = new Regex(pattern4);
            var regex5 = new Regex(pattern5);
            var regex6 = new Regex(pattern6);
            var regex7 = new Regex(pattern7);

            string pattern;
            string replacement;

            /* The following if-else block assigns
             * values to "pattern" and "replacement"
             * depending on which pattern matches "search"
             */
            if (regex1.IsMatch(new string(search.Value)))
            {
                pattern = pattern1;
                replacement = replacement1;
            }
            else if (regex2.IsMatch(new string(search.Value)))
            {
                pattern = pattern2;
                replacement = replacement2;
            }
            else if (regex3.IsMatch(new string(search.Value)))
            {
                pattern = pattern3;
                replacement = replacement3;
            }
            else if (regex4.IsMatch(new string(search.Value)))
            {
                pattern = pattern4;
                replacement = replacement4;
            }
            else if (regex5.IsMatch(new string(search.Value)))
            {
                pattern = pattern5;
                replacement = replacement5;
            }
            else if (regex6.IsMatch(new string(search.Value)))
            {
                pattern = pattern6;
                replacement = replacement6;
            }
            else if (regex7.IsMatch(new string(search.Value)))
            {
                pattern = pattern7;
                replacement = replacement7;
            }
            else
            {
                pattern = string.Empty;
                replacement = string.Empty;
            }

            // This returns the transformation of the "search" value in ADE format.
            // replacement is a string replacement.
            return Regex.Replace(new string(search.Value), pattern, replacement);
        }
    }
}
2 Answers
The biggest and most obvious improvement you're missing out on is that regular expressions in .NET can be compiled (which can give a huge performance boost), and you can use readonly fields to make sure you don't recompile the Regex every time you call the method.
private static readonly Regex _pattern1 = new Regex(@"LA\.11-12\.11-12\.(\w{1,4}).(\d{1,2})(\.\w)?", RegexOptions.Compiled);
This could literally save tens of milliseconds per pattern, which means in your case it could save a lot of time. You should then create a local string for search.Value, to save more time and keep things readable, as well as some other miscellaneous tweaks. In the end you end up with:
private static readonly Regex _pattern1 = new Regex(@"LA\.11-12\.11-12\.(\w{1,4}).(\d{1,2})(\.\w)?", RegexOptions.Compiled);
private static readonly string _replacement1 = "LA.11-12.$1.$2$3";
// Remaining patterns

[Microsoft.SqlServer.Server.SqlFunction]
public static SqlString RegexReplaceElaHs(SqlChars search)
{
    var value = new string(search.Value);

    if (_pattern1.IsMatch(value))
    {
        return _pattern1.Replace(value, _replacement1);
    }
    if (_pattern2.IsMatch(value))
    {
        return _pattern2.Replace(value, _replacement2);
    }
    // Remaining matches

    // Final return is just the default value, which is what yours does anyway
    return value;
}
Then, you could make a private static readonly Dictionary<Regex, string> _replacements = ... and loop over it:
foreach (var entry in _replacements)
{
    if (entry.Key.IsMatch(value))
    {
        // entry.Key is the Regex, entry.Value is its replacement string
        return entry.Key.Replace(value, entry.Value);
    }
}
And ta-da, you eliminated many LoC and got a lot of processing time back.
I'll post up a much more complete solution later, but you should be able to make this work as expected. :)
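Until then, here is a minimal sketch of how the pieces above might fit together. It is not a definitive implementation: the class name ElaTranslator and the method name Translate are made up for this example, only two of the seven pattern/replacement pairs from the question are filled in, and the SqlChars/SqlString wrapper is left out so the block stands alone.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical holder class; the name is an assumption for this sketch.
public static class ElaTranslator
{
    // Built once per AppDomain instead of once per call.
    // Only two of the question's seven pairs are shown here.
    private static readonly Dictionary<Regex, string> Replacements = new Dictionary<Regex, string>
    {
        { new Regex(@"LA\.11-12\.11-12\.(\w{1,4}).(\d{1,2})(\.\w)?", RegexOptions.Compiled), "LA.11-12.$1.$2$3" },
        { new Regex(@"LA\.K-12\.CCSS\.ELA-Literacy\.CCRA\.(\w{1,2})\.(\d{1,2})", RegexOptions.Compiled), "CCRA.$1.$2" }
    };

    public static string Translate(string value)
    {
        foreach (var entry in Replacements)
        {
            if (entry.Key.IsMatch(value))
            {
                // entry.Key is the Regex, entry.Value is the replacement string.
                return entry.Key.Replace(value, entry.Value);
            }
        }

        // No pattern matched: return the input unchanged.
        return value;
    }
}
```

With these two entries, ElaTranslator.Translate("LA.11-12.11-12.L.1.a") returns "LA.11-12.L.1.a", and an input that matches neither pattern comes back unchanged. The SQL CLR function would then only need to convert search.Value to a string and call Translate.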
- a Dictionary with a Regex as a Key ;-] if this is not dictionary abuse that I don't know what is :-P (t3chb0t, Jun 23, 2017 at 20:07)
- @t3chb0t That's fair, though with C#7.0 I would use a Tuple<Regex, string>, in OP's case I prefer the syntax of a dictionary. :) (Der Kommissar, Jun 23, 2017 at 21:01)
- It's the creation of the Dictionary and the for-each statement in the latter half of your answer that worked well for me. It certainly looks more compact. I look forward to seeing the rest of your solution. (RandomHandle, Jun 23, 2017 at 21:01)
- @RandomHandle A Dictionary<Regex, string> may not be the greatest option, but for C#5/6 it's the one I'd choose. The jabbing by t3chb0t isn't just for fun - he does bring up a valid point. ;) (Der Kommissar, Jun 23, 2017 at 21:05)
- @EBrown -- Is there a purpose of defining the Regex variables on a class level rather than on a function level? (RandomHandle, Jun 23, 2017 at 21:05)
The solution with a Dictionary is definitely an improvement because it removes a lot of repetition and gathers all regexes and replacements together, but I'd like to point out one flaw it might have. I don't know if it applies to your use case here, but you should keep it in mind in case you want to use this technique for something else.
A Dictionary<> does not maintain the order of its elements (see: Why is a Dictionary "not ordered"?). This means that if you had a long list and wanted to have the most probable cases first, there is no guarantee they will stay in the same order as you added them.
For the reason detailed above I find it's better to create a simple object like this (I used read/write properties, but an immutable object would be more appropriate):
class Translation
{
    public Regex Matcher { get; set; }
    public string Replacement { get; set; }

    public bool CanTranslate(string value)
    {
        return Matcher.IsMatch(value);
    }

    public string Translate(string value)
    {
        return Matcher.Replace(value, Replacement);
    }
}
and put them in a List<Translation> so that they are always processed in the original order.
// FirstOrDefault requires "using System.Linq;"
var translation = translations.FirstOrDefault(t => t.CanTranslate(value));
if (translation != null)
{
    return translation.Translate(value);
}
Additionally, you can already implement the translation in this object and thus simplify the code a little bit more by completely encapsulating the fact that you actually work with Regex. If you wanted to use another technique for translating, the Translation class would be the only thing you'd have to change.
You could for example also create an interface for it if you wanted to have translations that work with either Regex or other methods.
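As a rough illustration of that idea, an interface might look like the sketch below. The names ITranslation, RegexTranslation and ExactTranslation are invented for this example and do not come from the question; ExactTranslation is a hypothetical non-regex implementation that only handles one literal input.

```csharp
using System.Text.RegularExpressions;

// Hypothetical abstraction over "something that can translate a value".
public interface ITranslation
{
    bool CanTranslate(string value);
    string Translate(string value);
}

// Regex-backed implementation, equivalent to the Translation class above.
public sealed class RegexTranslation : ITranslation
{
    private readonly Regex _matcher;
    private readonly string _replacement;

    public RegexTranslation(Regex matcher, string replacement)
    {
        _matcher = matcher;
        _replacement = replacement;
    }

    public bool CanTranslate(string value)
    {
        return _matcher.IsMatch(value);
    }

    public string Translate(string value)
    {
        return _matcher.Replace(value, _replacement);
    }
}

// Assumed alternative that maps one exact input to one fixed output.
public sealed class ExactTranslation : ITranslation
{
    private readonly string _from;
    private readonly string _to;

    public ExactTranslation(string from, string to)
    {
        _from = from;
        _to = to;
    }

    public bool CanTranslate(string value)
    {
        return value == _from;
    }

    public string Translate(string value)
    {
        return _to;
    }
}
```

A List<ITranslation> could then hold both kinds of entries, and the calling code would not need to know which one it is dealing with.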
However the most comfortable solution would be to implement one more method like this one (either by the class itself or an extension):
bool TryTranslate(string value, out string translated)
{
    if (CanTranslate(value))
    {
        translated = Translate(value);
        return true;
    }
    translated = null;
    return false;
}
and use it with a loop like this:
foreach (var translation in translations)
{
    string translated;
    if (translation.TryTranslate(value, out translated))
    {
        return translated;
    }
}
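Putting the list and TryTranslate together, and making the object immutable as suggested earlier, might look roughly like this. It is a sketch under a few assumptions: the wrapper class ElaTranslations is a made-up name, the constructor taking a pattern string is my choice rather than anything from the question, and only two of the seven pairs are included.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Immutable variant of the Translation class: fields are set once in the constructor.
public sealed class Translation
{
    private readonly Regex _matcher;
    private readonly string _replacement;

    public Translation(string pattern, string replacement)
    {
        _matcher = new Regex(pattern, RegexOptions.Compiled);
        _replacement = replacement;
    }

    public bool TryTranslate(string value, out string translated)
    {
        if (_matcher.IsMatch(value))
        {
            translated = _matcher.Replace(value, _replacement);
            return true;
        }
        translated = null;
        return false;
    }
}

// Hypothetical holder class; the name is an assumption for this sketch.
public static class ElaTranslations
{
    // A List keeps the entries in the order they were added.
    private static readonly List<Translation> Translations = new List<Translation>
    {
        new Translation(@"LA\.9-10\.9-10\.(\w{1,4}).(\d{1,2})(\.\w)?", "LA.9-10.$1.$2$3"),
        new Translation(@"LA\.9-10\.(\d{1,2})\.(\w{1,2})\.(\d{1,2})", "LA.$1.$2.$3")
    };

    public static string Translate(string value)
    {
        foreach (var translation in Translations)
        {
            string translated;
            if (translation.TryTranslate(value, out translated))
            {
                return translated;
            }
        }

        // Nothing matched: fall back to the original value.
        return value;
    }
}
```

Here ElaTranslations.Translate("LA.9-10.9-10.RL.3") returns "LA.9-10.RL.3" via the first entry, and "LA.9-10.9.RL.3" falls through to the second entry and returns "LA.9.RL.3".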