2

Level: Beginner. I am currently working on sniffers in Python using raw sockets, and I have a general question about the format specifiers used with unpack() from the struct module, which unpacks data according to the format string it is given. I have seen many sniffer programs use unpack() to decode packet information from its raw byte form. For example, the following code can be used to extract the Ethernet header information:

ethHeader = struct.unpack("!6s6s2s", ethernetHeader)

Here ethernetHeader is a variable that contains the actual Ethernet header data captured earlier from a raw socket. My question is: how can one know which format specifiers to use for a header? How can I know in advance whether the Ethernet addresses are in string format or in some other format? Is there any documentation for this? I read the Python docs for unpack() but didn't find any information. Similarly, for IP addresses the code is something like this:

ipAddresses = struct.unpack("!12s4s4s", IPAddresses)

Here IPAddresses is a variable that contains the actual IP address information captured earlier from a raw socket. Once again, how can I know that I have to use strings as format specifiers (!12s4s4s)? Thanks.

asked Dec 25, 2013 at 0:32
8
  • 2
    Did you consider the possibility to use scapy? It will simplify your life a lot. Commented Dec 25, 2013 at 0:38
  • 2
    The internet and knowing what you're doing. In the case of Ethernet, reading the 802.3 spec would be ideal, but Wikipedia will do in a pinch. Commented Dec 25, 2013 at 0:41
  • @Faust I shall try scapy later. I am new to Python and I am learning things on my own, slowly. :) Commented Dec 25, 2013 at 0:55
  • @hobbs I tried searching the internet, and I am still doing so. Unfortunately I have found nothing new yet. Commented Dec 25, 2013 at 1:02
  • 1
    @eyquem how do you expect either of those things to help with sniffing? Commented Dec 25, 2013 at 5:15

2 Answers

5

Thanks to J.F. Sebastian for the hint. I finally figured it out and will take some time to explain it here. Normally we have to look at the C struct for each header to see which C type is used for each field of the packet's headers. We can then use the table of format characters in the struct module documentation to find which format specifier represents which C type. For example, the struct for the IP header is given below:

struct ipheader {
 unsigned char ip_hl:4, ip_v:4; /* this means that each member is 4 bits */
 unsigned char ip_tos;
 unsigned short int ip_len;
 unsigned short int ip_id;
 unsigned short int ip_off;
 unsigned char ip_ttl;
 unsigned char ip_p;
 unsigned short int ip_sum;
 unsigned int ip_src;
 unsigned int ip_dst;
}; 

For example, unsigned char is represented by 'B', unsigned short by 'H', and unsigned int by 'I'. We can use this mapping to work out which format specifiers struct.unpack() needs in order to extract the field values of an IP header. For the IP header it becomes the following:

struct.unpack('!BBHHHBBHII')

However, you will notice that most programs use struct.unpack('!BBHHHBBH4s4s').
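To make the two format strings concrete, here is a minimal Python 3 sketch that unpacks the same 20 bytes both ways. The header bytes and the variable names (sample_header, as_ints, as_bytes) are made up for illustration; they do not come from a real capture. It also shows that fields sharing a byte or a 16-bit word (version and header length, flags and fragment offset) are unpacked as one value and then split with shifts and masks, which is why the format string has fewer letters than the header has fields.

```python
import struct

# Hypothetical 20-byte IPv4 header without options; all values are invented.
sample_header = bytes([
    0x45, 0x00,              # version=4, header length=5 words; TOS=0
    0x00, 0x54,              # total length = 84
    0x1c, 0x46,              # identification
    0x40, 0x00,              # flags (DF set) + fragment offset
    0x40, 0x01,              # TTL=64, protocol=1 (ICMP)
    0x00, 0x00,              # checksum (left as zero in this sketch)
    0xc0, 0xa8, 0x7e, 0x02,  # source address 192.168.126.2
    0x0a, 0x00, 0x00, 0x01,  # destination address 10.0.0.1
])

as_ints = struct.unpack('!BBHHHBBHII', sample_header)
as_bytes = struct.unpack('!BBHHHBBH4s4s', sample_header)

# The first byte holds two 4-bit fields, so it is split by hand.
version = as_ints[0] >> 4
header_len_words = as_ints[0] & 0x0F

# Flags (3 bits) and fragment offset (13 bits) share one 16-bit field.
flags = as_ints[4] >> 13
fragment_offset = as_ints[4] & 0x1FFF

print(as_ints[8], as_ints[9])    # addresses as plain integers
print(as_bytes[8], as_bytes[9])  # addresses as 4-byte strings
```

Every field except the last two comes out identical in both tuples; only the addresses differ in type.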

So the question arises: why is 's' used instead of 'I' for unsigned int ip_src and unsigned int ip_dst in struct.unpack()? The reason is that if 'I' is used as the format specifier, unpack() returns each IP address as an integer (e.g. 3232267778), which you then have to convert to dotted form yourself (e.g. 192.168.126.2). The sniffer programs available on the internet usually just call socket.inet_ntoa() to obtain the dotted address, and that function accepts a packed 4-byte string, not an integer. Using '4s' therefore gives a result that can be fed directly to socket.inet_ntoa().

The same applies to the Ethernet header. We use 's' instead of 'B' in struct.unpack() because we need a string that can later be passed to binascii.hexlify() to get the MAC address in its usual hexadecimal form.
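The conversions described above can be sketched as follows (Python 3). The MAC address is a made-up value, and the names packed_ip, ip_as_int and packed_mac are only illustrative. The sketch also shows that an integer obtained with 'I' is still usable, as long as it is packed back into four network-order bytes first.

```python
import binascii
import socket
import struct

packed_ip = b'\xc0\xa8\x7e\x02'  # what '4s' returns for an address field
ip_as_int = 3232267778           # what 'I' returns for the same four bytes

# inet_ntoa() takes the packed 4-byte form directly.
dotted_from_bytes = socket.inet_ntoa(packed_ip)

# An integer has to be packed back into 4 network-order bytes first.
dotted_from_int = socket.inet_ntoa(struct.pack('!I', ip_as_int))

# Hypothetical MAC address, in the form '6s' returns it.
packed_mac = b'\x00\x1a\x2b\x3c\x4d\x5e'
mac_hex = binascii.hexlify(packed_mac).decode('ascii')
mac_text = ':'.join(mac_hex[i:i + 2] for i in range(0, 12, 2))

print(dotted_from_bytes, dotted_from_int, mac_text)
```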

answered Dec 27, 2013 at 12:17

2 Comments

Hello, why doesn't the ipheader struct contain fields like the fragment flag bits or the unused bits in the header? And how do you know how to format the header? There are at least 14 fields in the IP header, but only 10 letters in the format string.
Should be >, not !. IPv4 packet headers use BE byte order.
2

struct.unpack allows you to convert a sequence of bytes that contains C types specified in the format (the first argument) into corresponding Python objects (integer, float, string).

It is generic.
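As a small illustration of how generic it is, the sketch below (Python 3) packs an unsigned 16-bit integer, a 32-bit float and a 3-byte string into bytes and unpacks them again. The values and the names raw, number, ratio and label are arbitrary and have nothing to do with networking.

```python
import struct

# 'H' = unsigned short, 'f' = float, '3s' = 3-byte string; '!' = network order.
raw = struct.pack('!Hf3s', 513, 1.5, b'abc')

# unpack() always returns a tuple, one item per format character.
number, ratio, label = struct.unpack('!Hf3s', raw)

print(len(raw), number, ratio, label)
```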

how can one know which format specifier to use for a header? How can I know in advance that the ethernet addresses are in string format or in some other format? Is there any documentation for this too. I read python docs related to unpack() but didn't find any info.

The struct module knows nothing about the formats your application might need. They are specific to your application; in this case that means the TCP/IP suite, its protocols, sniffers, and networking. Read about them to figure out which C types to expect in ethernetHeader, IPAddresses, etc., and then build the appropriate format string using the table of format characters in the struct documentation.
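One way to sanity-check a format string built from a protocol description is to compare struct.calcsize() with the header length the protocol defines (14 bytes for an Ethernet header, 20 for an IPv4 header without options). The sketch below is only an illustration: the constant names are made up, and 'H' for the EtherType is one possible choice next to the '2s' used in the question. It also shows that '!' and '>' decode the same bytes identically, since network byte order is big-endian.

```python
import struct

ETH_FORMAT = '!6s6sH'        # two MAC addresses and a 16-bit EtherType
IP_FORMAT = '!BBHHHBBH4s4s'  # IPv4 header without options

# calcsize() reports how many bytes a format string consumes.
eth_size = struct.calcsize(ETH_FORMAT)
ip_size = struct.calcsize(IP_FORMAT)

# '!' (network) and '>' (big-endian) give the same result for the same bytes.
sample = b'\x08\x00'  # EtherType value used for IPv4
network_order = struct.unpack('!H', sample)
big_endian = struct.unpack('>H', sample)

print(eth_size, ip_size, network_order, big_endian)
```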

answered Dec 25, 2013 at 21:02

2 Comments

minor note, a cite from docs, about prefix !: The form '!' is available for those poor souls who claim they can’t remember whether network byte order is big-endian or little-endian.
@J.F. Sebastian The C types in the headers will surely help in selecting the format specifiers for unpack(), but this method doesn't answer my question completely. For example, the struct for the Ethernet header uses unsigned char for the MAC addresses, so according to your answer and the table you mentioned they should map to 'B' rather than 's'. In the end I figured out for myself why it is 's' instead of 'B' for the Ethernet header and 's' instead of 'I' for the IP addresses in Python's unpack(). I am selecting your answer because it was a constructive attempt.
