3
\$\begingroup\$

I wrote the following regular expression for parsing an nginx log line:

log_1 = '1.169.137.128 - - [29/jun/2017:07:10:50 +0300] "GET /api/v2/banner/1717161 http/1.1" 200 2116 "-" "Slotovod" "-" "1498709450-2118016444-4709-10027411" "712e90144abee9" 0.199'

My test cases (https://regex101.com/r/Eyhxod/1)

lineformat = re.compile(r"""(?P<ipaddress>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}) - - \[(?P<dateandtime>\d{2}\/[a-z]{3}\/\d{4}:\d{2}:\d{2}:\d{2} (\+|\-)\d{4})\] \"GET (?P<url>.+?(?=\ http\/1.1")) http\/1.1" \d{3} \d+ "-" (?P<http_user_agent>.+?(?=\ )) "-" "(?P<x_forwaded_for>(.+?))" "(?P<http_xb_user>(.+?))" (?P<request_time>[+-]?([0-9]*[.])?[0-9]+)""",re.IGNORECASE)

Output:

data = re.search(lineformat, log_1)
data.groupdict()
{'ipaddress': '1.169.137.128',
 'dateandtime': '29/jun/2017:07:10:50 +0300',
 'url': '/api/v2/banner/1717161',
 'http_user_agent': '"Slotovod"',
 'x_forwaded_for': '1498709450-2118016444-4709-10027411',
 'http_xb_user': '712e90144abee9',
 'request_time': '0.199'}

I believe I should make it more robust against edge cases and broken log lines. I am also considering splitting the long expression into smaller ones. Any advice on best practices is appreciated.

asked Mar 2, 2020 at 11:04
\$\endgroup\$
1
  • 1
    \$\begingroup\$ Isn't there a commonly known Python module for parsing these log files? You are for sure not the first person in the world to try this. \$\endgroup\$ Commented Mar 5, 2020 at 6:29

1 Answer

2
\$\begingroup\$

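At the very least, use verbose mode (re.VERBOSE) so the pattern can be spread over several lines and you can see the whole thing at once. In verbose mode, unescaped whitespace in the pattern is ignored, so remember to include every literal space explicitly, for example as \s+ or as an escaped space (\ ).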

lineformat = re.compile(r"""
 (?P<ipaddress>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\s+
 -\s+
 -\s+
 \[(?P<dateandtime>\d{2}\/[a-z]{3}\/\d{4}:\d{2}:\d{2}:\d{2}\ (\+|\-)\d{4})\]\s+
 \"GET\ (?P<url>.+?(?=\ http\/1.1"))\ http\/1.1"\s+
 \d{3}\s+
 \d+\s+
 "-"\s+
 (?P<http_user_agent>.+?(?=\ ))\s+
 "-"\s+
 "(?P<x_forwaded_for>(.+?))"\s+
 "(?P<http_xb_user>(.+?))"\s+
 (?P<request_time>[+-]?([0-9]*[.])?[0-9]+)
 """,
 re.IGNORECASE | re.VERBOSE)
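
If you also want to split the expression up, one possible sketch (certainly not the only way) is to keep each field as its own fragment and join the fragments with \s+. The names FIELDS, LINE_RE and parse_line are made up for illustration; the fragments themselves are taken from the pattern in the question. Returning None for a line that does not match lets the caller skip or count broken lines instead of failing on data.groupdict().

```python
import re

# One fragment per log field, in order. The list name is illustrative;
# the fragments are taken from the pattern in the question.
FIELDS = [
    r"(?P<ipaddress>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})",
    r"-",
    r"-",
    r"\[(?P<dateandtime>\d{2}\/[a-z]{3}\/\d{4}:\d{2}:\d{2}:\d{2} (\+|\-)\d{4})\]",
    r'\"GET (?P<url>.+?(?=\ http\/1.1")) http\/1.1"',
    r"\d{3}",  # status code
    r"\d+",    # response size
    r'"-"',
    r"(?P<http_user_agent>.+?(?=\ ))",
    r'"-"',
    r'"(?P<x_forwaded_for>(.+?))"',
    r'"(?P<http_xb_user>(.+?))"',
    r"(?P<request_time>[+-]?([0-9]*[.])?[0-9]+)",
]

# Fields are separated by one or more whitespace characters.
LINE_RE = re.compile(r"\s+".join(FIELDS), re.IGNORECASE)


def parse_line(line):
    """Return the named fields as a dict, or None if the line does not match."""
    match = LINE_RE.search(line)
    return match.groupdict() if match else None


log_1 = ('1.169.137.128 - - [29/jun/2017:07:10:50 +0300] '
         '"GET /api/v2/banner/1717161 http/1.1" 200 2116 "-" "Slotovod" "-" '
         '"1498709450-2118016444-4709-10027411" "712e90144abee9" 0.199')

print(parse_line(log_1)["url"])      # /api/v2/banner/1717161
print(parse_line("truncated line"))  # None
```

Because this version is compiled without re.VERBOSE, the plain spaces inside the fragments are matched literally.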
answered Mar 5, 2020 at 6:07
\$\endgroup\$
