I am trying to parse IP addresses out of Outlook email headers. I've started writing this in C# (because the example I was working from is in C#) and have come up with something close.
I can split the headers with string[] lines = Regex.Split(headers, @"\r\n"); without trouble, but when I iterate through the lines array, my IP address regex fails to match and nothing gets stored in a second array:
Code:
private void button1_Click(object sender, EventArgs e)
{
    // use a string constant to define the mapi property
    string PidTagTransportMessageHeaders = @"http://schemas.microsoft.com/mapi/proptag/0x007D001E";
    string mypattern = @"(#{1,3}\.)(#{1,3}\.)([0-9]{1,3}\.)([0-9]{1,3})";
    // string[] ip = Regex.Split(lines[i], (@"(\(|\[)(#{1,3}\.)(#{1,3}\.)([0-9]{1,3}\.)([0-9]{1,3})(\)|\])"));
    // get a handle on the current message
    Outlook.MailItem message = (Outlook.MailItem)this.OutlookItem;
    // use the property accessor to retrieve the header
    string headers = string.Empty;
    try
    {
        headers = (string)message.PropertyAccessor.GetProperty(PidTagTransportMessageHeaders);
    }
    catch
    {
    }
    // if getting the internet headers is successful, put into textbox
    string[] lines = Regex.Split(headers, "\r\n");
    Regex regexObj = new Regex(mypattern);
    for (int i = 0; i < lines.Length; i++)
    {
        MatchCollection matches = regexObj.Matches(lines[i]);
    }
    // eventually write the found IP array into textBox1.Text
    textBox1.Text = headers;
}
}
}
Any help or suggestions?
3 Answers
Change each # to \d. In a .NET regex, # is a literal character, not a digit placeholder:
string mypattern = @"(\d{1,3}\.)(\d{1,3}\.)(\d{1,3}\.)(\d{1,3})";
Note that a more accurate IPv4 address capture regular expression would be something like:
\b(?:(?:25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])\.){3}(?:25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])\b
...or at least add word boundaries...
\b(\d{1,3}\.)(\d{1,3}\.)(\d{1,3}\.)(\d{1,3})\b
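The loop in the question also throws away each MatchCollection, so nothing would be stored even with a working pattern. A minimal sketch of collecting the values, using the stricter IPv4 pattern above; the class name HeaderIpExtractor, the method ExtractIPv4, and the sample header line are made up for illustration:

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class HeaderIpExtractor
{
    // Stricter IPv4 pattern from the answer: each octet is limited to 0-255.
    private static readonly Regex IPv4 = new Regex(
        @"\b(?:(?:25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])\.){3}" +
        @"(?:25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])\b");

    // Returns every IPv4-looking token in the text, in order of appearance.
    public static List<string> ExtractIPv4(string headers)
    {
        var found = new List<string>();
        if (string.IsNullOrEmpty(headers)) return found;

        foreach (Match m in IPv4.Matches(headers))
        {
            found.Add(m.Value); // keep the match instead of discarding it
        }
        return found;
    }

    public static void Main()
    {
        // Hypothetical header line, for illustration only.
        string sample = "Received: from mail.example.com ([192.0.2.10]) by mx.example.net (10.1.2.3)";
        Console.WriteLine(string.Join(Environment.NewLine, ExtractIPv4(sample)));
    }
}
```

Because Regex.Matches scans the whole string, the headers do not need to be split into lines first; in the question's handler the result could be written out with something like textBox1.Text = string.Join(Environment.NewLine, HeaderIpExtractor.ExtractIPv4(headers)).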
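For a simple IPv6 match (full eight-group form only, no :: compression) I like the following; add RegexOptions.IgnoreCase if the addresses may use lowercase hex digits: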
(?<![:.\w])(?:[A-F0-9]{1,4}:){7}[A-F0-9]{1,4}(?![:.\w])
Comments:
[...] \d and [0-9]? If not, you really should be consistent.
\d matches any digit; [0-9] is just a subset.
[...] [0-9] throughout seems like a better idea.
Use the IPAddress.Parse method; do not reinvent the wheel.
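As a rough sketch of the IPAddress suggestion: a loose regex can find dotted-quad candidates in the header text, and System.Net.IPAddress.TryParse can then decide which of them are real addresses. The class name IpCandidateValidator and the method ParseIPv4Candidates are hypothetical, chosen only for this example:

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Net.Sockets;
using System.Text.RegularExpressions;

public static class IpCandidateValidator
{
    // Loose pattern: only locates dotted-quad candidates; range checks are left to IPAddress.
    private static readonly Regex Candidate =
        new Regex(@"\b[0-9]{1,3}(?:\.[0-9]{1,3}){3}\b");

    // Returns the candidates that IPAddress.TryParse accepts as IPv4 addresses.
    public static List<IPAddress> ParseIPv4Candidates(string text)
    {
        var result = new List<IPAddress>();
        foreach (Match m in Candidate.Matches(text ?? string.Empty))
        {
            IPAddress address;
            if (IPAddress.TryParse(m.Value, out address) &&
                address.AddressFamily == AddressFamily.InterNetwork)
            {
                result.Add(address);
            }
        }
        return result;
    }
}
```

TryParse is used rather than Parse so that a candidate such as 999.1.1.1 is skipped instead of raising a FormatException. IPAddress parsing can be more permissive than a strict regex in some cases, so it is worth checking against real headers.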
If you're trying to match IPv4 addresses, try this beast; it should be fairly close to what an actual IPv4 address can be. The enclosing \b anchors mark word boundaries, so you can remove them and tweak the pattern to your heart's content to fit your header format:
\b(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\b
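Since this pattern wraps each octet in its own capture group, the four numbers can be read back individually through Match.Groups. A small sketch under that assumption; the names OctetReader and FirstAddressOctets are made up for this example:

```csharp
using System;
using System.Text.RegularExpressions;

public static class OctetReader
{
    // One capturing group per octet, same alternatives as the pattern above.
    private const string Octet = @"(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)";

    private static readonly Regex IPv4 = new Regex(
        @"\b" + Octet + @"\." + Octet + @"\." + Octet + @"\." + Octet + @"\b");

    // Returns the four octets of the first address found, or null if there is none.
    public static byte[] FirstAddressOctets(string text)
    {
        Match m = IPv4.Match(text ?? string.Empty);
        if (!m.Success) return null;

        var octets = new byte[4];
        for (int i = 0; i < 4; i++)
        {
            // Groups[0] is the whole match; groups 1 to 4 are the octets.
            octets[i] = byte.Parse(m.Groups[i + 1].Value);
        }
        return octets;
    }
}
```

If the individual octets are not needed, the groups can be made non-capturing with (?:...) and Match.Value used directly.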
Comment: [...] headers variable contains?