
I need to classify each line of a chat log as an announce, a whisper or a normal chat message. Once that is sorted out, I need to extract certain values from the line for further processing.

Right now my regex is as follows:

var regex = new Regex(@"^\[(\d{2}:\d{2}:\d{2})\]\s*(?:(\[System Message\])?\s*<([^>]*)>|((.+) Whisper You :))\s*(.*)$");
  • Group 0 is the entire matched line.
  • Group 1 is the timestamp (hh:mm:ss) of when the message was sent.
  • Group 2 matches only the "[System Message]" marker, so it tells an announce apart from a chat.
  • Group 3 is the name between "<" and ">" in an announce or chat line.
  • Group 4 matches only if the line is a whisper.
  • Group 5 is who sent the whisper.
  • Group 6 is the message text sent by the user or the system.

Classify each line:

if group 4 matched
    it is a whisper
else if group 2 matched
    it is an announce
else
    it is a normal chat message
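A minimal C# sketch of that rule, assuming the regex above is used exactly as written and each log line is matched on its own. `Classify` and the "whisper"/"announce"/"chat" labels are hypothetical names chosen for illustration. A group that did not take part in the match reports `Success == false`, which is what the checks rely on.

```csharp
using System;
using System.Text.RegularExpressions;

// The question's regex, unchanged: groups are read back by number.
var regex = new Regex(@"^\[(\d{2}:\d{2}:\d{2})\]\s*(?:(\[System Message\])?\s*<([^>]*)>|((.+) Whisper You :))\s*(.*)$");

// Hypothetical helper: applies "group 4 -> whisper, else group 2 -> announce,
// else chat" and extracts the timestamp, the name and the message text.
// Returns null when the line is not in any of the three formats.
(string Kind, string Time, string Who, string Text)? Classify(string logLine)
{
    Match m = regex.Match(logLine);
    if (!m.Success)
        return null;

    string time = m.Groups[1].Value;
    string text = m.Groups[6].Value;

    if (m.Groups[4].Success)   // "John Whisper You :" form; sender is in group 5
        return ("whisper", time, m.Groups[5].Value, text);
    if (m.Groups[2].Success)   // "[System Message]" prefix; name is in group 3
        return ("announce", time, m.Groups[3].Value, text);
    return ("chat", time, m.Groups[3].Value, text);   // plain "<John> ..." line
}

foreach (var line in new[] { "[02:33:03] John Whisper You : Heya", "[02:33:39] <John> heya" })
{
    var result = Classify(line);
    Console.WriteLine(result is null
        ? "no match"
        : $"{result.Value.Kind} from {result.Value.Who} at {result.Value.Time}: {result.Value.Text}");
}
```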

Should I change anything in my regex to make the matches more precise or accurate?

Sample data:

[02:33:03] John Whisper You : Heya
[02:33:03] John Whisper You : How is it going
[02:33:12] <John> [02:33:16] [System Message] bla bla
[02:33:39] <John> heya
[02:33:40] <John> hello :S
[02:33:57] <John> hi
[02:33:57] [System Message] <John> has left the room 
[02:33:57] [System Message] <John> has entered the room 
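One way the pattern could be made stricter, as a sketch only: it assumes (the question does not say so) that a name inside "< >" is never empty and that a whisper sender's name contains no whitespace. Under those assumptions `[^>]*` becomes `[^>]+` and the greedy `(.+)` before " Whisper You :" becomes `(\S+)`. The lines in `tricky` are made-up inputs, not taken from the sample data.

```csharp
using System;
using System.Text.RegularExpressions;

// The question's pattern, unchanged.
var original = new Regex(@"^\[(\d{2}:\d{2}:\d{2})\]\s*(?:(\[System Message\])?\s*<([^>]*)>|((.+) Whisper You :))\s*(.*)$");

// Hypothetical tightened variant. Assumptions (not stated in the question):
//   - a name between < and > is never empty, so [^>]* becomes [^>]+
//   - a whisper sender's name contains no whitespace, so (.+) becomes (\S+)
var tightened = new Regex(@"^\[(\d{2}:\d{2}:\d{2})\]\s*(?:(\[System Message\])?\s*<([^>]+)>|((\S+) Whisper You :))\s*(.*)$");

string[] tricky =
{
    "[02:33:03] John Whisper You : Mike Whisper You : hi", // greedy (.+) runs to the last marker
    "[02:33:57] [System Message] John Whisper You : hi",   // "[System Message]" lands in the sender
    "[02:33:39] <> hi",                                     // empty name is accepted
};

foreach (var line in tricky)
{
    foreach (var (label, regex) in new[] { ("original", original), ("tightened", tightened) })
    {
        Match m = regex.Match(line);
        // Same group numbers in both patterns: 4/5 = whisper, 3 = name in < >.
        string who = !m.Success ? "(no match)"
            : m.Groups[4].Success ? m.Groups[5].Value
            : m.Groups[3].Value;
        Console.WriteLine($"{label,-9} sender='{who}'  <- {line}");
    }
}
```

Because the greedy `(.+)` backtracks from the end of the line, it stops at the last " Whisper You :". If names can contain spaces, a lazy `(.+?)` is an alternative that stops at the first one instead.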
rolfl
asked May 31, 2011 at 22:27

1 Answer

You can always break it down into multiple lines to make it more readable. You can also use named groups, which take the "magic" out of the group numbers (4 == whisper, 3 == normal sender, etc.).

using System;
using System.Text.RegularExpressions;

// Same pattern as in the question, split over several lines and using
// named groups instead of numbered ones.
var regex = new Regex(@"^\[(?<TimeStamp>\d{2}:\d{2}:\d{2})\]\s*" +
    @"(?:" +
    @"(?<SysMessage>\[System Message\])?\s*" +
    @"<(?<NormalWho>[^>]*)>|" +
    @"(?<Whisper>(?<WhisperWho>.+) Whisper You :))\s*" +
    @"(?<Message>.*)$");

// Verbatim string: the log lines must start at column 0, otherwise ^\[ fails.
string data = @"[02:33:03] John Whisper You : Heya
[02:33:03] John Whisper You : How is it going
[02:33:12] <John> [02:33:16] [System Message] bla bla
[02:33:39] <John> heya
[02:33:40] <John> hello :S
[02:33:57] <John> hi
[02:33:57] [System Message] <John> has left the room
[02:33:57] [System Message] <John> has entered the room";

// Split on both \r and \n so the code works with either line-ending style.
foreach (var msg in data.Split(new[] { '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries))
{
    Match match = regex.Match(msg);
    if (match.Success)
    {
        if (match.Groups["Whisper"].Success)
        {
            Console.WriteLine("[whis from {0}]: {1}", match.Groups["WhisperWho"].Value, msg);
        }
        else if (match.Groups["SysMessage"].Success)
        {
            Console.WriteLine("[sys msg]: {0}", msg);
        }
        else
        {
            Console.WriteLine("[normal from {0}]: {1}", match.Groups["NormalWho"].Value, msg);
        }
    }
}
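If the "+" concatenation feels noisy, .NET's RegexOptions.IgnorePatternWhitespace is another way to spread the pattern over several lines: unescaped whitespace in the pattern is ignored and "#" starts a comment, so literal spaces have to be written as "\ ". A sketch of the same named-group pattern in that style, with a short hard-coded array of lines standing in for the log:

```csharp
using System;
using System.Text.RegularExpressions;

// Same pattern as above, written with RegexOptions.IgnorePatternWhitespace:
// unescaped whitespace is ignored and "#" starts a comment to end of line,
// so the literal spaces are escaped as "\ ".
var regex = new Regex(@"
    ^\[(?<TimeStamp>\d{2}:\d{2}:\d{2})\]\s*            # [hh:mm:ss]
    (?:
        (?<SysMessage>\[System\ Message\])?\s*         # optional announce marker
        <(?<NormalWho>[^>]*)>                          # <name> of an announce or chat
      |
        (?<Whisper>(?<WhisperWho>.+)\ Whisper\ You\ :) # whisper sender
    )
    \s*(?<Message>.*)$                                 # the message text
    ", RegexOptions.IgnorePatternWhitespace);

string[] lines =
{
    "[02:33:03] John Whisper You : Heya",
    "[02:33:39] <John> heya",
    "[02:33:57] [System Message] <John> has left the room",
};

foreach (var msg in lines)
{
    Match match = regex.Match(msg);
    if (!match.Success)
        continue;   // not in any of the three formats

    string kind = match.Groups["Whisper"].Success ? "whisper"
        : match.Groups["SysMessage"].Success ? "sys msg"
        : "normal";
    Console.WriteLine($"[{kind}] {match.Groups["Message"].Value}");
}
```

The behaviour is meant to be identical to the concatenated version; the only difference is that the explanation of each part lives next to it in the pattern.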
answered May 31, 2011 at 23:47
  • that is pretty cool dude, thanks! I was looking for a way to split up each pattern like that but was not aware of how to do it. Commented May 31, 2011 at 23:49