I have the following code, which attempts to match all strings like "*SOMESTRING" (which can include numeric values), but not "*SOMESTRING*". For this I am using a negative lookahead, as shown below. *SEX and *AN01ZORA should match; *PCCL* should not match.
string s = " if 'L,....' MDC = '13' Then " +
    " if 'B,960.' SEX NOT = '2' AND *SEX NOT = '3' Then " +
    " DRG = 960Z (UNGROUPABLE) " +
    " GoTo MDC FldErr " +
    "Else if 'B,N01.' SRG IN TABLE(*AN01ZORA) Then " +
    " if '.,N01.' *PCCL* > 2 Then ";
Regex rr = new Regex(@"(?i)(?!\*\w+\*)\*\w+");
MatchCollection mc = rr.Matches(s);
// Dump() is the LINQPad output helper; it prints each match.
foreach (Match m in mc)
    m.ToString().Dump();
Output: *SEX *AN01ZORA
This seems to produce the correct output, but feels nasty and not correct. Is this right, and what could I do to make the regex better?
1 Answer
Your regex is overly complicated, I must say. Because the negative lookahead comes first, the engine has to rule out the negative case at each candidate position before it even looks for a (nearly) positive match.
I think the trick you are missing is the word-boundary anchor. Consider the following regex:
\*\w+\b
This looks for an asterisk, followed by one or more word characters, and then a (zero-length) word boundary. Now, both *SOME* and *SOME match that, since the \b happens before the trailing asterisk. The negative lookahead would be useful after the word boundary. Consider the following:
\*\w+\b(?!\*)
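This looks for *SOME where SOME is a complete word not followed by an asterisk. The \b matters here: without it, the engine could backtrack to a shorter match such as *SOM, which is not directly followed by an asterisk.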
Here's a little demonstration ....
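As a rough sketch of how the patterns above behave, the snippet below runs three variants against a shortened version of the question's input. It is written as C# top-level statements (C# 9 or later). The `FindAll` helper and the `sample` string are illustrative assumptions, not part of the original post.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper (not from the original post): returns every match of
// `pattern` in `input` as a list of strings.
static List<string> FindAll(string pattern, string input) =>
    Regex.Matches(input, pattern).Cast<Match>().Select(m => m.Value).ToList();

// Shortened sample text based on the question's input (an assumption for brevity).
string sample = "SEX NOT = '2' AND *SEX NOT = '3' TABLE(*AN01ZORA) *PCCL* > 2";

// Word boundary only: the trailing asterisk of *PCCL* is never examined,
// so *PCCL is still reported.
List<string> boundaryOnly = FindAll(@"\*\w+\b", sample);

// Lookahead without the boundary: the engine backtracks to *PCC,
// which is not directly followed by an asterisk.
List<string> lookaheadOnly = FindAll(@"\*\w+(?!\*)", sample);

// Boundary plus lookahead: only whole tokens with no trailing asterisk.
List<string> combined = FindAll(@"\*\w+\b(?!\*)", sample);

Console.WriteLine(string.Join(" ", boundaryOnly));   // *SEX *AN01ZORA *PCCL
Console.WriteLine(string.Join(" ", lookaheadOnly));  // *SEX *AN01ZORA *PCC
Console.WriteLine(string.Join(" ", combined));       // *SEX *AN01ZORA
```

Only the combined pattern produces the two tokens the question asks for, which is why both the word boundary and the lookahead are needed.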
Edit: Note that there is no reason to add the case-insensitive switch ((?i)) because your regular expression contains no literal letters whose case could matter; \w already matches both upper- and lower-case letters.
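To illustrate the point about the case-insensitive switch, here is a small sketch (again as C# top-level statements) that runs the same pattern with and without `RegexOptions.IgnoreCase`. The mixed-case `mixedCase` string is made up for the example.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Made-up mixed-case input (an assumption, not from the original post).
string mixedCase = "*Sex *an01zora *Pccl*";
const string pattern = @"\*\w+\b(?!\*)";

// Same pattern, once with default options and once case-insensitive.
string[] withoutFlag = Regex.Matches(mixedCase, pattern)
    .Cast<Match>().Select(m => m.Value).ToArray();
string[] withFlag = Regex.Matches(mixedCase, pattern, RegexOptions.IgnoreCase)
    .Cast<Match>().Select(m => m.Value).ToArray();

// The pattern has no literal letters, so both runs find the same tokens.
Console.WriteLine(string.Join(" ", withoutFlag));        // *Sex *an01zora
Console.WriteLine(withoutFlag.SequenceEqual(withFlag));  // True
```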