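I am trying to implement a substring extractor that takes a "start keyword" and an "end keyword" and returns the text between them, excluding both keywords (the result is also trimmed of surrounding whitespace). An empty start keyword means "from the beginning of the string", and an empty end keyword means "to the end of the string". For example: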
| Input String | Start Keyword | End Keyword | Output |
|---|---|---|---|
| "C# was developed around 2000 by Microsoft as part of its .NET initiative" | "" (empty string) | "" (empty string) | "C# was developed around 2000 by Microsoft as part of its .NET initiative" |
| "C# was developed around 2000 by Microsoft as part of its .NET initiative" | "" (empty string) | ".NET" | "C# was developed around 2000 by Microsoft as part of its" |
| "C# was developed around 2000 by Microsoft as part of its .NET initiative" | "C#" | "" (empty string) | "was developed around 2000 by Microsoft as part of its .NET initiative" |
| "C# was developed around 2000 by Microsoft as part of its .NET initiative" | "C#" | ".NET" | "was developed around 2000 by Microsoft as part of its" |
| "C# was developed around 2000 by Microsoft as part of its .NET initiative" | ".NET" | "" (empty string) | "initiative" |
| "C# was developed around 2000 by Microsoft as part of its .NET initiative" | "" (empty string) | "C#" | "" (empty string) |
| "C# was developed around 2000 by Microsoft as part of its .NET initiative" | ".NET" | "C#" | "" (empty string) |
The experimental implementation
My implementation is as follows.
```csharp
private static string GetTargetString(string stringInput, string startKeywordInput, string endKeywordInput)
{
    int startIndex;
    if (String.IsNullOrEmpty(startKeywordInput))
    {
        startIndex = 0;
    }
    else
    {
        if (stringInput.IndexOf(startKeywordInput) >= 0)
        {
            startIndex = stringInput.IndexOf(startKeywordInput) + startKeywordInput.Length;
        }
        else
        {
            return "";
        }
    }
    int endIndex;
    if (String.IsNullOrEmpty(endKeywordInput))
    {
        endIndex = stringInput.Length;
    }
    else
    {
        if (stringInput.IndexOf(endKeywordInput) > startIndex)
        {
            endIndex = stringInput.IndexOf(endKeywordInput);
        }
        else
        {
            return "";
        }
    }
    // Check startIndex and endIndex
    if (startIndex < 0 || endIndex < 0 || startIndex >= endIndex)
    {
        return "";
    }
    if (endIndex.Equals(0).Equals(true))
    {
        endIndex = stringInput.Length;
    }
    int TargetStringLength = endIndex - startIndex;
    return stringInput.Substring(startIndex, TargetStringLength).Trim();
}
```
Test cases
```csharp
string test_string1 = "C# was developed around 2000 by Microsoft as part of its .NET initiative";
Console.WriteLine(GetTargetString(test_string1, "", ""));
Console.WriteLine(GetTargetString(test_string1, "", ".NET"));
Console.WriteLine(GetTargetString(test_string1, "C#", ""));
Console.WriteLine(GetTargetString(test_string1, "C#", ".NET"));
Console.WriteLine(GetTargetString(test_string1, ".NET", ""));
Console.WriteLine(GetTargetString(test_string1, "", "C#"));
Console.WriteLine(GetTargetString(test_string1, ".NET", "C#"));
```
The output of the above test cases (the last two calls return an empty string, so they print blank lines):
```
C# was developed around 2000 by Microsoft as part of its .NET initiative
C# was developed around 2000 by Microsoft as part of its
was developed around 2000 by Microsoft as part of its .NET initiative
was developed around 2000 by Microsoft as part of its
initiative
```
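Printing the results makes the two empty outputs easy to miss. One option is a table-driven check that compares each call against the Output column of the table instead of relying on eyeballing the console. The sketch below is only an illustration: the class name `GetTargetStringCases` and the method `FindFailures` are hypothetical, and the cases are copied from the question's table.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical test helper; the expected values are copied from the question's table.
public static class GetTargetStringCases
{
    private const string Input =
        "C# was developed around 2000 by Microsoft as part of its .NET initiative";

    // (start keyword, end keyword, expected output)
    public static readonly (string Start, string End, string Expected)[] Cases =
    {
        ("",     "",     "C# was developed around 2000 by Microsoft as part of its .NET initiative"),
        ("",     ".NET", "C# was developed around 2000 by Microsoft as part of its"),
        ("C#",   "",     "was developed around 2000 by Microsoft as part of its .NET initiative"),
        ("C#",   ".NET", "was developed around 2000 by Microsoft as part of its"),
        (".NET", "",     "initiative"),
        ("",     "C#",   ""),
        (".NET", "C#",   ""),
    };

    // Runs every case through the given extractor and describes each mismatch.
    public static List<string> FindFailures(Func<string, string, string, string> extract)
    {
        var failures = new List<string>();
        foreach (var (start, end, expected) in Cases)
        {
            string actual = extract(Input, start, end);
            if (actual != expected)
            {
                failures.Add($"start=\"{start}\", end=\"{end}\": expected \"{expected}\", got \"{actual}\"");
            }
        }
        return failures;
    }
}
```

With the question's method in scope, `GetTargetStringCases.FindFailures(GetTargetString)` should return an empty list; any non-empty result names the exact inputs that disagree with the table.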
If there is any possible improvement, please let me know.
2 Answers
Maybe the last `if` statement could be simplified by removing the `Equals(true)`, since `Equals(0)` already returns a `bool`, doesn't it?
Edit:
Actually, I think you could skip the whole `if` block, because if `endIndex` were 0 it could not get past the previous `if` statement, could it?
If `endIndex` is 0 and `startIndex` is 0 or greater, the empty string is returned because `startIndex >= endIndex`.
If `startIndex` is less than 0, the empty string is returned as well.
So how could `endIndex` be 0 at the last `if` statement?
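Applying that suggestion, a minimal sketch of the question's method with the unreachable block removed might look like the following. It additionally assumes that flattening the nested `if`/`else` and storing each `IndexOf` result once (instead of calling `IndexOf` twice per keyword) is acceptable; the wrapper class `Extractor` is hypothetical. Matching still uses the culture-sensitive `IndexOf(string)` overload, exactly as in the question.

```csharp
using System;

public static class Extractor
{
    // The question's method without the unreachable endIndex.Equals(0) block.
    // Each keyword is searched once and the result reused.
    public static string GetTargetString(string stringInput, string startKeywordInput, string endKeywordInput)
    {
        int startIndex = 0;
        if (!string.IsNullOrEmpty(startKeywordInput))
        {
            int found = stringInput.IndexOf(startKeywordInput);
            if (found < 0) return "";                 // start keyword missing
            startIndex = found + startKeywordInput.Length;
        }

        int endIndex = stringInput.Length;
        if (!string.IsNullOrEmpty(endKeywordInput))
        {
            int found = stringInput.IndexOf(endKeywordInput);
            if (found <= startIndex) return "";       // end keyword missing or not after the start
            endIndex = found;
        }

        // endIndex >= startIndex here, so the length is never negative.
        return stringInput.Substring(startIndex, endIndex - startIndex).Trim();
    }
}
```

Because `endIndex` is always either `stringInput.Length` or a position strictly after `startIndex`, the `startIndex >= endIndex` guard and the `endIndex == 0` fix-up both become unnecessary.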
I tend to put the "error handling" code at the beginning of the method, which usually makes the rest of the method simpler.
```csharp
private static string GetTargetString(string input, string startKeyword, string endKeyword, StringComparison comparer)
{
    if (!string.IsNullOrEmpty(startKeyword) && input.IndexOf(startKeyword, comparer) < 0) return "";
    if (!string.IsNullOrEmpty(endKeyword) && input.IndexOf(endKeyword, comparer) < 0) return "";

    var startIndex = string.IsNullOrEmpty(startKeyword)
        ? 0
        : input.IndexOf(startKeyword, comparer) + startKeyword.Length;
    var endIndex = string.IsNullOrEmpty(endKeyword)
        ? input.Length
        : input.IndexOf(endKeyword, comparer);

    if (startIndex < 0 || endIndex < 0 || startIndex >= endIndex) return "";
    return input.Substring(startIndex, endIndex - startIndex).Trim();
}
```
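Note that `comparer` is passed to every `string.IndexOf` call, so the caller decides whether matching is ordinal, culture-sensitive, or case-insensitive.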
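Both the question's method and the version above look up the end keyword from the beginning of the string, so an occurrence of the end keyword that appears before the start keyword makes the method return an empty string even if the end keyword also appears later. If the intended behaviour is to stop at the first end keyword after the start keyword (an assumption, since the question's table does not cover this case), the search can begin at `startIndex` using the `IndexOf(string, int, StringComparison)` overload. The class name `KeywordExtractor` and the `StringComparison.Ordinal` default are choices made for this sketch, not part of either answer.

```csharp
using System;

public static class KeywordExtractor
{
    // Sketch: the end keyword is searched for only after the start keyword.
    // The Ordinal default is an assumption; pass another StringComparison if needed.
    public static string GetTargetString(string input, string startKeyword, string endKeyword,
                                         StringComparison comparison = StringComparison.Ordinal)
    {
        int startIndex = 0;
        if (!string.IsNullOrEmpty(startKeyword))
        {
            int found = input.IndexOf(startKeyword, comparison);
            if (found < 0) return "";                  // start keyword missing
            startIndex = found + startKeyword.Length;
        }

        int endIndex = input.Length;
        if (!string.IsNullOrEmpty(endKeyword))
        {
            // Begin the search at startIndex so earlier occurrences are ignored.
            endIndex = input.IndexOf(endKeyword, startIndex, comparison);
            if (endIndex < 0) return "";               // no end keyword after the start
        }

        return input.Substring(startIndex, endIndex - startIndex).Trim();
    }
}
```

For example, `KeywordExtractor.GetTargetString("END a START b END c", "START", "END")` returns `"b"`, whereas both earlier versions return `""` for that input. All seven rows of the question's table produce the same results as before.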