Splitting a CAN bus log in .asc format

Question 1

I've written a quick script for a coworker to split a large CAN log into smaller chunks. (If you're not familiar with CAN, it's a communication protocol used by the ECUs in many cars.) I know where to split because I've inserted dummy CAN messages (with ID 0x00) at the start of each section, and one at the end of testing (which may be somewhere in the middle of the log) to tell me when to stop reading.

The log is in .asc or .csv format, and can be several gigabytes in size. Currently I can process a 1.5GB file in about 40 seconds, but I'm sure that can be improved. I'm looking more for advice on how to speed this up than to make it more Pythonic, but of course criticism is welcome in both areas.

Note: titles is a dictionary mapping section identifiers (keys of the form AT<req>_<obj>, as built in create_title below) to the string that is prepended to the filename before saving. I can add the code that generates it, but I don't believe it's relevant here.

def split_asc_file(target_file, target_dir, titles):
    import os
    import time
    start_time = time.time()
    if not os.path.isdir(target_dir):
        os.mkdir(target_dir)
    os.chdir(target_dir)
    section = None
    def create_title(message_string):
        req_num = int(message_string[0:8])
        obj_num = int(message_string[8:16])
        if req_num == 0 and obj_num == 0:
            print "Splitting completed in {} seconds".format(time.time() - start_time)
            quit() # final test has been executed
        else:
            at = "AT{}_{}".format(req_num, obj_num)
            title_prefix = titles[at]
            title_string = "{}_{}.asc".format(title_prefix, at)
            return title_string
    def can_traffic_only(f):
        # iterate only over lines that contain messages
        for line in f:
            if len(line.split()) == 14:
                yield line
    with open(target_file) as log:
        print "Opening {}...".format(target_file)
        for message in can_traffic_only(log):
            values = message.split()
            can_id = values[2]
            can_data = "".join(values[6:])
            if can_id == "0":
                if section:
                    section.close()
                title = create_title(can_data)
                if title:
                    print "Creating {}".format(title)
                    section = open(title, "w")
            else:
                if section:
                    section.write(message)
    print "Splitting completed in {} seconds".format(time.time() - start_time)
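To make the marker scheme concrete, here is a minimal sketch of how a dummy message's payload would map to a file name. It is an illustration only: the titles entry, its prefix and the payload are made-up values, and this standalone variant takes titles as a parameter and returns None for the end marker (rather than calling quit()) so that it can run in isolation.

```python
def create_title(message_string, titles):
    """Return the output file name for a marker payload, or None for the end marker."""
    req_num = int(message_string[0:8])    # first four data bytes, as digits
    obj_num = int(message_string[8:16])   # last four data bytes, as digits
    if req_num == 0 and obj_num == 0:
        return None                       # end-of-testing marker
    at = "AT{}_{}".format(req_num, obj_num)
    return "{}_{}.asc".format(titles[at], at)


# Hypothetical mapping and payload, for illustration only.
titles = {"AT1_2": "brake_test"}
payload = "0000000100000002"              # data bytes 00 00 00 01 00 00 00 02, joined

title = create_title(payload, titles)
end = create_title("0" * 16, titles)
```

With these assumed values, title is "brake_test_AT1_2.asc" and end is None.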

Comment 1

In can_traffic_only you split each line to count its parts, and then the main loop splits the same line again. can_traffic_only could yield the list of parts so that the second split is eliminated.

Comment 2

I originally had it this way, but I realized I needed the complete message for the write here: section.write(message)

Comment 3

Oops, I missed that, but of course you could return both the original line and its parts.
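A minimal sketch of that idea, assuming the same 14-field layout as in the question: the generator yields the raw line together with its already-split fields, so the caller can write the former and inspect the latter without splitting again. The sample lines are invented for illustration.

```python
def can_traffic_only(lines):
    """Yield (raw_line, fields) for every line that looks like a CAN message."""
    for line in lines:
        fields = line.split()
        if len(fields) == 14:
            yield line, fields


# Hypothetical log lines, for illustration only.
sample = [
    "date Mon Jun 20 10:00:00 2016\n",                 # header line, skipped
    "0.001 1 0 Rx d 8 00 00 00 01 00 00 00 02\n",      # dummy marker (ID 0)
    "0.002 1 1A0 Rx d 8 11 22 33 44 55 66 77 88\n",    # ordinary message
]

results = list(can_traffic_only(sample))
marker_line, marker_fields = results[0]
marker_payload = "".join(marker_fields[6:])
```

The caller would then loop with `for message, values in can_traffic_only(log):` and keep using section.write(message) unchanged.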

Answer 1

As I read your code, you skip messages until the first dummy one, which marks the start of the first section, and then you repeat the following cycle:

Extract the title information from the dummy message;
Open a file to receive the messages of this section;
Write the relevant messages until the next dummy one.

Reorganizing your code to follow this layout more closely lets you remove the if section tests, which are executed for every line and may be slowing things down a bit.

You can also remove your if title test, since create_title never returns anything other than a string of more than 5 characters. I guess it was used earlier to detect the end of the tests, though, so I'll reuse it for that.

By combining that with proposals by @ferada, you can end up with:

import os
import time

def create_title(message_string, titles):
    req_num = int(message_string[0:8])
    obj_num = int(message_string[8:16])
    if not req_num and not obj_num:
        return  # end-of-testing marker: no title
    at = "AT{}_{}".format(req_num, obj_num)
    title_prefix = titles[at]
    return "{}_{}.asc".format(title_prefix, at)

def split_asc_file(target_file, target_dir, titles):
    if not os.path.isdir(target_dir):
        os.makedirs(target_dir)
    os.chdir(target_dir)
    with open(target_file) as log:
        print 'Opening', target_file
        # Bootstrap: skip everything before the first dummy message
        for message in log:
            data = message.split()
            if len(data) == 14 and data[2] == "0":
                break
        else:
            return  # no dummy message in the log: nothing to split
        while True:
            # Using message rather than reusing data here; see next comment
            data = message.split()[6:]
            title = create_title(''.join(data), titles)
            if title is None:
                break
            with open(title, 'w') as section:
                print 'Created', title
                for message in log:
                    # Knowing the input format, you should be able to extract
                    # the same information as the next two ifs by analyzing
                    # message rather than splitting it, as ferada suggested
                    data = message.split()
                    if len(data) == 14:
                        if data[2] == "0":
                            break
                        section.write(message)
                else:
                    return  # log ended without a closing dummy message

if __name__ == '__main__':
    start_time = time.time()
    split_asc_file(..,..,..) #Whatever
    print "Splitting completed in {} seconds".format(time.time() - start_time)

The workflow I proposed also lets you open the section file with a with statement, which is preferred in Python. I also changed mkdir to makedirs, just in case the parent directories don't exist yet.
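As a small illustration of the makedirs point, here is a sketch under stated assumptions: section_path is a hypothetical helper, and the directory and file names are made up. os.makedirs creates any missing parent directories, whereas os.mkdir would raise an error if the parent did not exist. Joining the directory and the title also avoids os.chdir, which otherwise changes how a relative target_file is resolved.

```python
import os
import tempfile


def section_path(target_dir, title):
    """Create target_dir (and any missing parents) and return the full path for title."""
    if not os.path.isdir(target_dir):
        os.makedirs(target_dir)
    return os.path.join(target_dir, title)


# Hypothetical locations, for illustration only.
base = tempfile.mkdtemp()
nested = os.path.join(base, "logs", "split")    # neither level exists yet

path = section_path(nested, "brake_test_AT1_2.asc")
with open(path, "w") as section:                # closed automatically on exit
    section.write("0.002 1 1A0 Rx d 8 11 22 33 44 55 66 77 88\n")
```

Whether dropping os.chdir is worthwhile depends on how the script is called; it is offered here as an option, not as part of the answer above.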

Answer 2

I think that's close to the best you're going to get with Python.

I'd suggest running a profiler and optimising based on what it shows; e.g. I can imagine that counting the number of spaces in can_traffic_only, instead of calling split and allocating all the resulting strings, would be a bit faster.
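A rough sketch of the space-counting idea follows. It rests on an assumption the thread does not confirm: that fields are separated by exactly one space. looks_like_message is a hypothetical helper, and the sample lines are invented. If the log pads columns with several spaces, the count no longer matches what split() reports, as the last sample shows.

```python
def looks_like_message(line, expected_fields=14):
    """Cheap field-count check; assumes exactly one space between fields."""
    return line.strip().count(" ") == expected_fields - 1


# Hypothetical log lines, for illustration only.
message = "0.002 1 1A0 Rx d 8 11 22 33 44 55 66 77 88\n"
header = "date Mon Jun 20 10:00:00 2016\n"
padded = "0.002  1 1A0 Rx d 8 11 22 33 44 55 66 77 88\n"   # two spaces after the timestamp

agrees_on_message = looks_like_message(message) == (len(message.split()) == 14)
agrees_on_header = looks_like_message(header) == (len(header.split()) == 14)
agrees_on_padded = looks_like_message(padded) == (len(padded.split()) == 14)
```

Here the first two comparisons agree and the third does not, so the real file's layout needs checking before swapping this in. To see whether it pays off at all, the script can be run under the standard profiler, e.g. `python -m cProfile -s cumulative your_script.py` (the script name is a placeholder).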

Building can_data can be delayed until the condition of the if block is true, but again, whether that helps depends on how often that's the case.
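One way to picture this, as a sketch rather than a drop-in replacement: the payload string is joined only for lines whose ID is "0", so ordinary messages skip that work. classify is a hypothetical helper and the sample lines are invented.

```python
def classify(lines):
    """Yield ("marker", payload) for dummy messages and ("message", line) otherwise."""
    for line in lines:
        fields = line.split()
        if len(fields) != 14:
            continue                              # not a CAN message line
        if fields[2] == "0":
            # The payload is only built for marker lines.
            yield "marker", "".join(fields[6:])
        else:
            yield "message", line


# Hypothetical log lines, for illustration only.
sample = [
    "0.001 1 0 Rx d 8 00 00 00 01 00 00 00 02\n",
    "0.002 1 1A0 Rx d 8 11 22 33 44 55 66 77 88\n",
    "0.003 1 0 Rx d 8 00 00 00 00 00 00 00 00\n",
]
events = list(classify(sample))
```

Since markers are rare compared with ordinary traffic, nearly every line would avoid the join, though the gain is probably small next to the cost of split itself.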

If nothing else helps, you could inline can_traffic_only and see if that makes a difference.

Accepted Answer · score 3 · 2016-06-26 09:48:27Z: the answer above that restructures the loop around the dummy messages and switches to os.makedirs.
