I wrote a program that reads a pcap file and parses its HTTP traffic to generate a dictionary containing the HTTP headers of each request and response.
My code does the following:
- Uses tcpflow to reassemble the TCP segments
- Reads the files generated by tcpflow and checks whether each one is related to HTTP
- If a file contains HTTP traffic, reads it and generates a corresponding dictionary that contains the HTTP header fields
I tested my code with multiple test cases, but I don't have much experience with Python, so could anyone review it for me, please?
import os
from os import listdir
from os.path import isfile, join
from StringIO import StringIO
import mimetools

def getFields(headers):
    fields={}
    i=1
    for header in headers:
        if len(header)==0:
            continue
        # if this line is complement for the previous line
        if header.find(" ")==0 or
        header.find("\t")==0:
            continue
        if len(header.split(":"))>=2:
            key = header.split(":")[0].strip()
            # if the key has multiple values such as cookie
            if fields.has_key(key):
                fields[key]=fields[key]+" "+header[header.find(":")+1:].strip()
            else:
                fields[key]=header[header.find(":")+1:].strip()
                while headers[i].find(" ")==0 or
                headers[i].find("\t")==0 :
                    fields[key]=fields[key]+" "+headers[i].strip()
                    i=i+1
                # end of the while loop
            # end of the else
        else:
            # else for [if len(header.split(":"))>=2: ]
            print "ERROR: RFC VIOLATION"
    # end of the for loop
    return fields

def main():
    # you have to write it in the terminal "cd /home/user/Desktop/empty-dir"
    os.system("tcpflow -r /home/user/Desktop/12.pcap -v")
    for f in listdir("/home/user/Desktop/empty-dir"):
        if f.find("80")==19 or f.find("80")==41:
            with open("/home/user/Desktop/empty-dir"+f) as fh:
                fields={}
                content=fh.read() #to test you could replace it with content="any custom http header"
                if content.find("\r\n\r\n")==-1:
                    print "ERROR: RFC VIOLATION"
                    return
                headerSection=content.split("\r\n\r\n")[0]
                headerLines=headerSection.split("\r\n")
                firstLine=headerLines[0]
                firstLineFields=firstLine.split(" ")
                if len(headerLines)>1:
                    fields=getFields(headerLines[1:])
                if len(firstLineFields)>=3:
                    if firstLine.find("HTTP")==0:
                        fields["Version"]=firstLineFields[0]
                        fields["Status-code"]=firstLineFields[1]
                        fields["Status-desc"]=" ".join(firstLineFields[2:])
                    else:
                        fields["Method"]=firstLineFields[0]
                        fields["URL"]=firstLineFields[1]
                        fields["Version"]=firstLineFields[2]
                else:
                    print "ERROR: RFC VIOLATION"
                    continue
                print fields
                print "__________________"
    return 0

if __name__ == '__main__':
    main()
2 Answers
Newlines and indentation tell the interpreter where statements terminate and where blocks end, so you have to be very careful with them.
For example, in your if condition you can't have a newline between the conditions unless the whole expression is wrapped in parentheses.
if header.find(" ")==0 or
header.find("\t")==0:
    continue
This code will raise a syntax error because the condition is split across two lines.
Python statements are terminated by newlines. It should read like this:
if header.find(" ")==0 or header.find("\t")==0:
    continue
The same applies to this piece of code:
while headers[i].find(" ")==0 or
headers[i].find("\t")==0 :
    fields[key]=fields[key]+" "+headers[i].strip()
    i=i+1
It should read:
while headers[i].find(" ")==0 or headers[i].find("\t")==0:
    fields[key]=fields[key]+" "+headers[i].strip()
    i=i+1
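If you would rather keep the condition on two lines, Python does accept a line break inside parentheses. A minimal sketch, assuming a hypothetical helper named `is_continuation` that is not part of the original code:

```python
def is_continuation(header):
    """Return True if a header line starts with a space or a tab.

    The helper name is hypothetical; it only illustrates the syntax.
    """
    # The parentheses let the condition span two lines without a syntax error.
    if (header.find(" ") == 0 or
            header.find("\t") == 0):
        return True
    return False
```

The same parenthesized form should work for the `while` condition as well.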
- When I wrote my code in gedit the indentation was correct, but when I copy/pasted the code here the indentation changed. – Raghda Hraiz, Jul 22, 2014 at 19:12
- @RaghdaHraiz Edit your question code, but I think you still had issues with the scope of variables. – Malachi, Jul 22, 2014 at 19:14
- Regarding the while statement, it should be inside the loop and in the else statement. The else that prints the error message belongs to this if statement: `if len(header.split(":"))>=2:` @Malachi – Raghda Hraiz, Jul 22, 2014 at 19:15
- I edited the code indentation and added comments. Could you check it now, @Malachi? – Raghda Hraiz, Jul 22, 2014 at 19:30
- @RaghdaHraiz: Please be aware that code edits based on answers are normally disallowed, but this is a somewhat different case. If someone mentions other changes that you weren't aware of, then the original code must stay intact. – Jamal, Jul 22, 2014 at 19:36
A few brief comments:
- Use four spaces for each indentation level
- Use a space around each operator (`==`, `>=`, ...)
- Use the `in` operator instead of the `has_key` method
- Use `subprocess.Popen` instead of `os.system`
- Use `x.startswith(y)` (returns a boolean directly) instead of `x.find(y) == 0`
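As a rough illustration of the `in` and `startswith` points, here is a small Python 3 sketch. The helper name `collect_fields` and the sample header lines are made up for the example and are not part of the original program; folded continuation lines are simply skipped here to keep it short.

```python
def collect_fields(header_lines):
    """Build a dict of header fields from 'Name: value' lines."""
    fields = {}
    for line in header_lines:
        # startswith() accepts a tuple and returns a boolean directly.
        if not line or line.startswith((" ", "\t")):
            continue
        key, sep, value = line.partition(":")
        if not sep:
            continue  # not a 'Name: value' line
        key = key.strip()
        value = value.strip()
        # `in` replaces the deprecated dict.has_key().
        if key in fields:
            fields[key] = fields[key] + " " + value
        else:
            fields[key] = value
    return fields


sample = ["Host: example.com", "Cookie: a=1", "Cookie: b=2", "\tignored"]
print(collect_fields(sample))
```

Using `partition(":")` is a choice made for this sketch; it splits on the first colon only, so values that contain colons (such as dates) stay intact.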
A few longer comments:
- I'm not sure what logic you need to implement regarding filenames, but I recommend having a look at the `fnmatch` module.
- For the parsing of the HTTP fields, you might want to use a regular expression.
- Also, a comment to make clear when a request or a response is being parsed would make the code more readable.
- Rename the `i` variable to make clear what it is being used for (is `headers[i]` supposed to be the same as `header`?).
- Do not reinvent the wheel unless you need to. Check if there's an HTTP parsing library around already.
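The filename and regular-expression suggestions could look roughly like this in Python 3. This is only a sketch under assumptions: the patterns assume tcpflow's usual `AAA.BBB.CCC.DDD.PPPPP-AAA.BBB.CCC.DDD.PPPPP` file naming with zero-padded ports, and `is_http_flow`, `parse_header_lines`, and `HEADER_RE` are hypothetical names that do not appear in the original code.

```python
import re
from fnmatch import fnmatch

# Hypothetical pattern: a field name, a colon, then the value.
HEADER_RE = re.compile(r"^([^:\s]+)\s*:\s*(.*)$")


def is_http_flow(filename):
    """Return True if port 80 is the source or destination port.

    Assumes tcpflow-style names such as
    '192.168.001.002.00080-010.000.000.001.54321'.
    """
    return fnmatch(filename, "*.00080-*") or fnmatch(filename, "*.00080")


def parse_header_lines(lines):
    """Parse header lines into a dict, joining folded continuation lines."""
    fields = {}
    last_key = None
    for line in lines:
        if not line:
            continue
        if line.startswith((" ", "\t")):
            # Continuation of the previous header's value.
            if last_key is not None:
                fields[last_key] += " " + line.strip()
            continue
        match = HEADER_RE.match(line)
        if match is None:
            raise ValueError("malformed header line: %r" % line)
        last_key, value = match.group(1), match.group(2).strip()
        if last_key in fields:
            fields[last_key] += " " + value
        else:
            fields[last_key] = value
    return fields
```

Tracking `last_key` instead of a separate index also sidesteps the question of what `i` refers to: a continuation line is always attached to the most recently seen header.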
- Why is the `in` operator better than `has_key`, and why is `subprocess.Popen` better than `os.system`? @jcollado – Raghda Hraiz, Jul 23, 2014 at 9:22
- According to the documentation, `has_key` has been deprecated (`in` is more generic and can be used with user-defined classes that implement the `__contains__` method) and `os.system` is not as powerful as `subprocess.Popen`. There's a section in the subprocess documentation about how to use `subprocess.Popen` instead of `os.system`. – jcollado, Jul 23, 2014 at 10:38
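Both points can be sketched in Python 3. This is an illustration under assumptions rather than the approach used in the question: `subprocess.run` is a convenience wrapper around `Popen`, and `build_tcpflow_command`, `run_tcpflow`, and `HeaderSet` are hypothetical names. tcpflow is only invoked if you call `run_tcpflow` yourself.

```python
import subprocess


def build_tcpflow_command(pcap_path):
    """Return the argument list for the tcpflow call used in the question."""
    return ["tcpflow", "-r", pcap_path, "-v"]


def run_tcpflow(pcap_path, output_dir):
    """Run tcpflow inside output_dir and return its exit status.

    Requires tcpflow to be installed; raises FileNotFoundError otherwise.
    """
    # cwd= replaces the manual "cd" step; a list avoids shell quoting issues.
    completed = subprocess.run(build_tcpflow_command(pcap_path), cwd=output_dir)
    return completed.returncode


class HeaderSet:
    """Hypothetical container showing that `in` works via __contains__."""

    def __init__(self, names):
        self._names = {name.lower() for name in names}

    def __contains__(self, name):
        # Called by the `in` operator; header names compare case-insensitively.
        return name.lower() in self._names
```

With a class like `HeaderSet`, an expression such as `"host" in HeaderSet(["Host"])` goes through `__contains__`, which is something `has_key` never offered for user-defined types.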
- ... `main()` function, this is not C, just put everything that your `main()` function does under the `if __name__ == '__main__':` it works like C.
- ... `main()` function is not entirely a bad idea, I think.
- ... `a = 2; b = 3; c = a + b;` if I won't need `a` and `b` anymore, just need `c` to be equal to `5` ...