I have a string that I need to split by another string, which is a substring of the original one. Let's say I have the following text:
string s = "<DOC>something here <TEXT> and some stuff here </TEXT></DOC>";
And I want to retrieve:
"and some stuff here"
I need to get the string between "<TEXT>" and its closing tag "</TEXT>".
I can't get this to work with the usual Split method of string, even though one of its overloads takes a parameter of type string[]. What I am trying is:
Console.Write(s.Split("<TEXT>")); // Which doesn't compile
Thanks in advance for your kind help.
5 Answers
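const string openTag = "<TEXT>";
var start = s.IndexOf(openTag);
// Look for the closing tag only after the end of the opening tag.
var end = start >= 0 ? s.IndexOf("</TEXT>", start + openTag.Length) : -1;
string res;
if (start >= 0 && end >= 0) {
    // Skip past the opening tag so it is not part of the result.
    start += openTag.Length;
    res = s.Substring(start, end - start).Trim();
} else {
    res = "NOT FOUND";
}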
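If the input may contain more than one <TEXT> block, the same IndexOf idea can be put in a loop. The following is only a sketch, written as a C# 9+ top-level program; the helper name AllBetween and the sample input with two blocks are made up for illustration and are not part of the question.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: collects the trimmed text between every
// open/close delimiter pair, scanning left to right with IndexOf.
static List<string> AllBetween(string input, string open, string close)
{
    var results = new List<string>();
    int pos = 0;
    while (true)
    {
        int start = input.IndexOf(open, pos, StringComparison.Ordinal);
        if (start < 0) break;            // no further opening delimiter
        start += open.Length;            // skip the opening delimiter itself
        int end = input.IndexOf(close, start, StringComparison.Ordinal);
        if (end < 0) break;              // unmatched opening delimiter
        results.Add(input.Substring(start, end - start).Trim());
        pos = end + close.Length;        // continue after the closing delimiter
    }
    return results;
}

// Made-up sample input containing two <TEXT> blocks.
string s = "<DOC><TEXT> and some stuff here </TEXT> filler <TEXT> more </TEXT></DOC>";
foreach (string item in AllBetween(s, "<TEXT>", "</TEXT>"))
    Console.WriteLine(item);
```

An unmatched opening tag simply ends the scan, so whatever was found up to that point is still returned.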
Splitting on "<TEXT>" isn't going to help you in this case anyway, since the closing tag is "</TEXT>".
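For completeness, the string[] overload of Split does compile once a StringSplitOptions argument is supplied, and passing both the opening and the closing tag as separators isolates the inner text. This is a rough sketch that assumes exactly one <TEXT>...</TEXT> pair in the input; the variable names parts and inner are illustrative only.

```csharp
using System;

string s = "<DOC>something here <TEXT> and some stuff here </TEXT></DOC>";

// The string[] overload requires an explicit StringSplitOptions argument.
string[] parts = s.Split(new[] { "<TEXT>", "</TEXT>" }, StringSplitOptions.None);

// With one tag pair the pieces are: text before <TEXT>, text between
// the tags, and text after </TEXT>. The middle piece is the one wanted.
string inner = parts.Length >= 3 ? parts[1].Trim() : null;

Console.WriteLine(inner ?? "NOT FOUND");
```

Note that Console.Write on the array itself would only print the type name, so the element has to be picked out explicitly.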
The most robust solution would be to parse it properly as XML. C# provides functionality for doing that. The second example at http://msdn.microsoft.com/en-us/library/cc189056%28v=vs.95%29.aspx should put you on the right track.
However, if you're just looking for a quick-and-dirty one-time solution, your best bet is to hand-code something, such as dasblinkenlight's solution above.
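As one possible shape for the XML route, here is a minimal sketch using XmlReader from System.Xml. It is not necessarily what the linked MSDN example does, and it assumes the input is well-formed XML and that <TEXT> contains only text (no child elements).

```csharp
using System;
using System.IO;
using System.Xml;

string s = "<DOC>something here <TEXT> and some stuff here </TEXT></DOC>";
string text = null;

using (XmlReader reader = XmlReader.Create(new StringReader(s)))
{
    // Advance to the first element named TEXT, if there is one.
    if (reader.ReadToFollowing("TEXT"))
    {
        // Read the element's text content and strip surrounding whitespace.
        text = reader.ReadElementContentAsString().Trim();
    }
}

Console.WriteLine(text ?? "NOT FOUND");
```

Malformed input makes the reader throw an XmlException, which is the price of letting a real parser handle escaping and entities.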
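// Requires: using System.Collections.Generic; using System.Text.RegularExpressions;
var output = new List<string>();
// Group 1 captures the text between each <TEXT>...</TEXT> pair (non-greedy).
foreach (Match match in Regex.Matches(source, "<TEXT>(.*?)</TEXT>")) {
    output.Add(match.Groups[1].Value);
}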
string s = "<DOC>something here <TEXT> and some stuff here </TEXT></DOC>";
string result = Regex.Match(s, "(?<=<TEXT>).*?(?=</TEXT>)").Value;
EDIT: I am using the regex pattern (?<=prefix)find(?=suffix), which matches the text between a prefix and a suffix without including either of them in the match.
EDIT 2: Find several results:
MatchCollection matches = Regex.Matches(s, "(?<=<TEXT>).*?(?=</TEXT>)");
foreach (Match match in matches) {
Console.WriteLine(match.Value);
}
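The prefix/suffix pattern can be built for arbitrary delimiters. The sketch below is an illustration, not part of the original answer: BuildPattern is a hypothetical helper, Regex.Escape guards against delimiters that contain regex metacharacters, and RegexOptions.Singleline is added on the assumption that the text between the tags may span several lines. The sample input is made up.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: builds (?<=prefix).*?(?=suffix) with both
// delimiters escaped so they are matched literally.
static string BuildPattern(string prefix, string suffix) =>
    "(?<=" + Regex.Escape(prefix) + ").*?(?=" + Regex.Escape(suffix) + ")";

// Made-up input: two <TEXT> blocks, the first one spanning two lines.
string s = "<DOC><TEXT> first\nline </TEXT> <TEXT> second </TEXT></DOC>";

// Singleline lets '.' match newlines as well.
foreach (Match m in Regex.Matches(s, BuildPattern("<TEXT>", "</TEXT>"), RegexOptions.Singleline))
    Console.WriteLine(m.Value.Trim());
```

Without Singleline the first block would not match at all, because '.' stops at the line break.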
If the last tag is </DOC>, you could use XElement.Load to load the XML (or XElement.Parse for a string that is already in memory) and then walk through it to find the element you want (you could also use LINQ to XML).
If the string is not necessarily well-formed XML, you could always use regular expressions to find the desired part of the text. In this case the expression should not be too hard to write yourself.
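The two options can be combined: try the XML parser first and fall back to a regular expression only when the input is not well-formed. This is a sketch under stated assumptions; ExtractText is a hypothetical helper name, the regex is the simple capture-group form, and the second sample input (with a missing </DOC>) is made up to trigger the fallback.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;
using System.Xml;
using System.Xml.Linq;

// Hypothetical helper: returns the trimmed content of the first TEXT
// element, or null when none is found.
static string ExtractText(string input)
{
    try
    {
        // Well-formed input: let LINQ to XML locate the element.
        return XElement.Parse(input)
            .DescendantsAndSelf("TEXT")
            .Select(e => e.Value.Trim())
            .FirstOrDefault();
    }
    catch (XmlException)
    {
        // Not well-formed: fall back to a non-greedy regex capture.
        Match m = Regex.Match(input, "<TEXT>(.*?)</TEXT>", RegexOptions.Singleline);
        return m.Success ? m.Groups[1].Value.Trim() : null;
    }
}

Console.WriteLine(ExtractText("<DOC>something here <TEXT> and some stuff here </TEXT></DOC>"));
Console.WriteLine(ExtractText("<DOC>unclosed <TEXT> still found </TEXT>"));
```

The fallback is deliberately naive and would not cope with nested <TEXT> elements or escaped entities.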
</TEXT> or </DOC>?