A source Code Counter

Question 1

This is from my project codecount, which I use personally and which is very similar to cloc.exe or SLOCCount. The part I am questioning is where I work out whether I am inside a comment block, which has deep nesting: I basically replace the commented sections with an empty string. At some point I would also like to handle fixed-format files (RPGLE, CLP, etc.), where comments need to have characters in a specific position.

Any review/comments/suggestions would be appreciated.

import hashlib
import logging

def scanfile(self, filename):
    """ The heart of the codecounter, Scans a file to identify
        and collect the metrics based on the classification. """
    strblock = None
    endblock = None
    inblock = False
    endcode = False
    sha256 = hashlib.sha256()
    if(filename.size == 0):
        logging.info("Skipping File : " + filename.name)
        return
    # Identify language
    for l in self.langs:
        if(filename.extension in self.langs[l]["ext"]):
            filename.lang = l
            break
    # Unknown files don't need to be processed
    if(filename.lang is None):
        logging.info("Skipping File : " + filename.name)
        return
    # Using the with file opening in order to ensure no GC issues.
    with open(filename.path + "/" + filename.name,
              encoding="utf-8", errors='ignore') as fp:
        for line in fp:
            sha256.update(line.encode("utf-8"))
            filename.lines += 1
            line = line.strip()
            identified = False
            if(line == ""):
                logging.info(" blak " + str(filename.lines))
                filename.blanks += 1
                continue
            if(endcode):
                filename.comments += 1
                continue
            # Check to see if it is a block or was an opening block
            # ex1 = "/* */ if x;" = Code, not inblock
            # ex2 = "*/ if x; /*" = Code, inblock
            # ex3 = " if x; /*" = Code, inblock
            # ex4 = "/* */ if x; /* */ .." = Code, not inblock
            # ex5 = "*/" = Comment, not inblock
            # ex6 = "/* */" = Comment, not inblock
            # ex7 = "/*" = Comment, inblock
            # Two scenarios,
            # 1 - comments removed, code remains
            # 2 - Comments removed but block is open
            if(not inblock):
                for token in self.langs[filename.lang]["bcomm"]:
                    strblock = token
                    endblock = self.langs[filename.lang]["bcomm"][token]
                    while token in line:
                        # If a block has started then check for an exit
                        if(endblock in line):
                            line = line.replace(
                                line[line.find(strblock):
                                     line.find(endblock)
                                     + len(endblock)], "", 1)
                        else:
                            line = line.replace(
                                line[line.find(strblock):], "", 1)
                            inblock = True  # left open
            else:
                # Continue until the block ends... when left open
                if(endblock in line):
                    inblock = False  # End the block
                    line = line.replace(
                        line[:line.find(endblock) + len(endblock)],
                        "").strip()
                else:
                    line = ""
            # From the block but no hidden code made it out the back....
            # (compare with ==; "is" tests identity, not string equality)
            if(line == ""):
                logging.info(" bloc " + str(filename.lines) + line)
                filename.comments += 1
                continue
            # Check line comment designators
            for token in self.langs[filename.lang]["comment"]:
                if(line.startswith(token)):
                    logging.info(" line " + str(filename.lines) + line)
                    filename.comments += 1
                    identified = True
                    break
            if(identified):
                continue
            # If not a blank or comment it must be code
            logging.info(" code " + str(filename.lines) + line)
            filename.code += 1
            # Check for the ending of code statements
            for end in self.langs[filename.lang]["endcode"]:
                if(line == end):
                    endcode = True
    # Store the hash of this file for comparison to others
    logging.info("Total "
                 + " " + str(filename.blanks)
                 + " " + str(filename.comments)
                 + " " + str(filename.code))
    filename.sha256 = sha256.digest()
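
The block-comment handling in the loop above can also be read as a small two-state machine that walks across one line. The following is only a minimal sketch of that idea, not codecount's actual code: `strip_block_comments` is a hypothetical helper, it assumes a single start/end token pair, and it does not recognise comment tokens that appear inside string literals.

```python
def strip_block_comments(line, in_block, start="/*", end="*/"):
    """Return (code_left_on_line, still_in_block) for one source line.

    Hypothetical helper: handles one start/end token pair and ignores
    the possibility of comment tokens inside string literals.
    """
    remaining = []
    pos = 0
    while pos < len(line):
        if in_block:
            close = line.find(end, pos)
            if close == -1:
                pos = len(line)              # comment runs past this line
            else:
                pos = close + len(end)       # skip to just after the end token
                in_block = False
        else:
            opened = line.find(start, pos)
            if opened == -1:
                remaining.append(line[pos:])  # rest of the line is code
                pos = len(line)
            else:
                remaining.append(line[pos:opened])  # code before the comment
                pos = opened + len(start)
                in_block = True
    return "".join(remaining).strip(), in_block


# The cases listed in the comments of scanfile:
print(strip_block_comments("/* */ if x;", False))
print(strip_block_comments("*/ if x; /*", True))
print(strip_block_comments("/*", False))
```

A line would then count as a comment when the returned text is empty and as code otherwise, with the returned flag carried over to the next line.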

Question 4

Instead of having to update the number of lines each iteration through the file, you could use the enumerate function and only assign the line number once:

with open('test_file.txt', 'r') as f:
    for line_num, line in enumerate(f):
        # Insert parsing code here
        pass
    # Assign line amount here. A NameError error will be thrown
    # if this code is run on a file with no lines. If this errors,
    # assign 0.
    try:
        filename.lines = line_num + 1
    except NameError:
        filename.lines = 0
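
A possible variation that avoids the try/except is to give the counter a default before the loop and let enumerate start at 1. This is a minimal sketch; `count_lines` is a hypothetical stand-alone function, and the open() arguments are borrowed from the question's code.

```python
def count_lines(path):
    """Return the number of lines in a file, 0 if it is empty."""
    line_num = 0  # keeps its value if the loop body never runs
    with open(path, encoding="utf-8", errors="ignore") as fp:
        for line_num, line in enumerate(fp, start=1):
            pass  # per-line parsing would go here
    return line_num
```

Inside scanfile the same pattern would end with `filename.lines = line_num` after the loop.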

Also, when building file paths, consider os.path.join. It joins the components with the correct separator for the current OS (the output below is from Windows):

>>> import os
>>> os.path.join('some_path', 'to_my', 'file.txt')
'some_path\\to_my\\file.txt'
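
Applied to the question's code, the `filename.path + "/" + filename.name` expression could become a single os.path.join call. A small sketch with made-up directory and file names standing in for those two attributes:

```python
import os

# Hypothetical values standing in for filename.path and filename.name
directory = "some_path"
name = "file.txt"

full_path = os.path.join(directory, name)
print(full_path)  # separator depends on the OS the script runs on
```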

Question 5

On the enumerate, I thought about that but figured it would be wrong if the file happens to have no lines. I think you're right about os.path.join; I should switch to that, as I got bitten by it once before. Thanks.

Question 6

If the code is run on a file with no lines, line_num will be undefined. I will update accordingly.

Sahand · Accepted Answer · 2014-04-16 02:53:47Z

A couple of points:

In Python there is no need to wrap the condition of an if statement in parentheses. For example,

if(filename.size == 0):

Can be replaced with

if filename.size == 0:

In fact, the latter is the preferred style.

Also, using str.format is generally preferable to concatenating strings with +. For example,

logging.info("Total "
             + " " + str(filename.blanks)
             + " " + str(filename.comments)
             + " " + str(filename.code))

Can be replaced with:

log_message = 'Total {blanks} {comments} {code}'.format(
    blanks=filename.blanks,
    comments=filename.comments,
    code=filename.code)
logging.info(log_message)

This has a few advantages: it is more readable, it is easier to internationalize (one string token instead of several), it uses descriptive named placeholders, and it makes formatting easier to add later. For example, you can change {comments} to {comments:05d} to print the value as a fixed-width number of width 5, zero-padded on the left if necessary.
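
To illustrate the padding, here is a short sketch with made-up counts (the numbers are not from any real run):

```python
# Hypothetical counts, used only to show the effect of the format spec
blanks, comments, code = 3, 12, 240

message = "Total {blanks} {comments:05d} {code}".format(
    blanks=blanks, comments=comments, code=code)
print(message)  # Total 3 00012 240
```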

FWIW, you can also use 'Total {fname.blanks} {fname.comments} {fname.code}'.format(fname=filename).
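
A quick sketch of that attribute lookup, using a SimpleNamespace with made-up values as a stand-in for the question's file object:

```python
from types import SimpleNamespace

# Stand-in for the object passed to scanfile; values are hypothetical
filename = SimpleNamespace(blanks=3, comments=12, code=240)

message = "Total {fname.blanks} {fname.comments} {fname.code}".format(
    fname=filename)
print(message)  # Total 3 12 240
```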
