7
\$\begingroup\$

This is from my project codecount, which I use personally and which is very similar to cloc.exe or SLOCCount. The part I am questioning is where I work out whether I am inside a comment block; it is deeply nested, and I am basically replacing the commented sections with an empty string. At some point I would also like to handle fixed-format files (RPGLE, CLP, etc.), where comments are marked by characters in specific positions.

Any review/comments/suggestions would be appreciated.

def scanfile(self, filename):
    """ The heart of the codecounter, Scans a file to identify
    and collect the metrics based on the classification. """
    strblock = None
    endblock = None
    inblock = False
    endcode = False
    sha256 = hashlib.sha256()
    if(filename.size == 0):
        logging.info("Skipping File : " + filename.name)
        return
    # Identify language
    for l in self.langs:
        if(filename.extension in self.langs[l]["ext"]):
            filename.lang = l
            break
    # Unknown files don't need processed
    if(filename.lang is None):
        logging.info("Skipping File : " + filename.name)
        return
    # Using the with file opening in order to ensure no GC issues.
    with open(filename.path + "/" + filename.name,
              encoding="utf-8", errors='ignore') as fp:
        for line in fp:
            sha256.update(line.encode("utf-8"))
            filename.lines += 1
            line = line.strip()
            identified = False
            if(line == ""):
                logging.info(" blak " + str(filename.lines))
                filename.blanks += 1
                continue
            if(endcode):
                filename.comments += 1
                continue
            # Check to see if it is a block or was an opening block
            # ex1 = "/* */ if x;" = Code, not inblock
            # ex2 = "*/ if x; /*" = Code, inblock
            # ex3 = " if x; /*" = Code, inblock
            # ex4 = "/* */ if x; /* */ .." = Code, not inblock
            # ex4 = "*/" = Comment, not inblock
            # ex5 = "/* */" = Comment, not inblock
            # ex6 = "/*" = Comment, inblock
            # Two scenarios,
            # 1 - comments removed, code remains
            # 2 - Comments removed but block is open
            if(not inblock):
                for token in self.langs[filename.lang]["bcomm"]:
                    strblock = token
                    endblock = self.langs[filename.lang]["bcomm"][token]
                    while token in line:
                        # If a block has started then check for an exit
                        if(endblock in line):
                            line = line.replace(
                                line[line.find(strblock):
                                     line.find(endblock)
                                     + len(endblock)], "", 1)
                        else:
                            line = line.replace(
                                line[line.find(strblock):], "", 1)
                            inblock = True # left open
            else:
                # Continue until the block ends... when left open
                if(endblock in line):
                    inblock = False # End the block
                    line = line.replace(
                        line[:line.find(endblock) + len(endblock)],
                        "").strip()
                else:
                    line = ""
            # From the block but no hidden code made it out the back....
            if(line is ""):
                logging.info(" bloc " + str(filename.lines) + line)
                filename.comments += 1
                continue
            # Check line comment designators
            for token in self.langs[filename.lang]["comment"]:
                if(line.startswith(token)):
                    logging.info(" line " + str(filename.lines) + line)
                    filename.comments += 1
                    identified = True
                    break
            if(identified):
                continue
            # If not a blank or comment it must be code
            logging.info(" code " + str(filename.lines) + line)
            filename.code += 1
            # Check for the ending of code statements
            for end in self.langs[filename.lang]["endcode"]:
                if(line == end):
                    endcode = True
    # Store the hash of this file for comparison to others
    logging.info("Total "
                 + " " + str(filename.blanks)
                 + " " + str(filename.comments)
                 + " " + str(filename.code))
    filename.sha256 = sha256.digest()
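
One possible way to flatten the nested block-comment handling is to move it into a small helper that walks a single line from left to right and carries the "inside a block" state across lines. The sketch below is only an illustration, not part of codecount: the name strip_block_comments and its default delimiters are assumptions, and it does not try to recognise comment tokens inside string literals.

```python
def strip_block_comments(line, inblock, start="/*", end="*/"):
    """Remove block-comment text from one line.

    Hypothetical helper, not part of the original code.
    Returns (remaining_text, still_in_block).
    """
    kept = []
    pos = 0
    while pos < len(line):
        if inblock:
            close = line.find(end, pos)
            if close == -1:
                pos = len(line)           # rest of the line is comment
            else:
                pos = close + len(end)    # skip past the closing token
                inblock = False
        else:
            opened = line.find(start, pos)
            if opened == -1:
                kept.append(line[pos:])   # no more comments on this line
                pos = len(line)
            else:
                kept.append(line[pos:opened])
                pos = opened + len(start)
                inblock = True
    return "".join(kept).strip(), inblock


# The cases listed in the comments of scanfile:
print(strip_block_comments("/* */ if x;", False))  # ('if x;', False)
print(strip_block_comments("*/ if x; /*", True))   # ('if x;', True)
print(strip_block_comments("/*", False))           # ('', True)
```

With a helper like this, the caller only has to check whether the returned text is empty (a comment line) or not (a code line), and keep the returned flag for the next line.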
asked Apr 16, 2014 at 2:24
\$\endgroup\$

2 Answers

10
\$\begingroup\$

A couple of points:

In Python there is no need to place parentheses after if. For example,

if(filename.size == 0):

Can be replaced with

if filename.size == 0:

In fact, the latter is the preferred style.

Also, using format is preferred to using + to append strings. For example,

logging.info("Total "
             + " " + str(filename.blanks)
             + " " + str(filename.comments)
             + " " + str(filename.code))

Can be replaced with:

log_message = 'Total {blanks} {comments} {code}'.format(
    blanks=filename.blanks,
    comments=filename.comments,
    code=filename.code)
logging.info(log_message)

This has a few advantages: it is more readable, it is easier to internationalize (a single string token instead of several), it has descriptive named placeholders, and it makes formatting easier if you decide to add it at some point. For example, you can change {comments} to {comments:05d} to format it as a fixed-width number with width 5 and zeros padded on the left if necessary.
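
As a quick illustration of the padding described above, here is a minimal sketch. The SimpleNamespace object and its counts are made-up stand-ins for the question's filename object, used only so the snippet runs on its own.

```python
from types import SimpleNamespace

# Hypothetical stand-in for the file object used in scanfile
filename = SimpleNamespace(blanks=3, comments=12, code=140)

# Named placeholders, no padding
plain = 'Total {blanks} {comments} {code}'.format(
    blanks=filename.blanks,
    comments=filename.comments,
    code=filename.code)

# Same message with each count zero-padded to width 5
padded = 'Total {blanks:05d} {comments:05d} {code:05d}'.format(
    blanks=filename.blanks,
    comments=filename.comments,
    code=filename.code)

print(plain)   # Total 3 12 140
print(padded)  # Total 00003 00012 00140
```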

answered Apr 16, 2014 at 2:53
\$\endgroup\$
1
  • 2
    \$\begingroup\$ FWIW, you can also use 'Total {fname.blanks} {fname.comments} {fname.code}'.format(fname=filename). \$\endgroup\$ Commented Apr 16, 2014 at 4:42
7
\$\begingroup\$

Instead of having to update the number of lines each iteration through the file, you could use the enumerate function and only assign the line number once:

with open('test_file.txt', 'r') as f:
    for line_num, line in enumerate(f):
        # Insert parsing code here
        pass
    # Assign line amount here. A NameError error will be thrown
    # if this code is run on a file with no lines. If this errors,
    # assign 0.
    try:
        filename.lines = line_num + 1
    except NameError:
        filename.lines = 0
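
A possible variation that avoids the try/except is to give line_num a default before the loop and let enumerate count from 1. This is a sketch under the assumption that the counting is wrapped in a helper; count_lines is a made-up name, and io.StringIO stands in for a real file.

```python
import io


def count_lines(fp):
    """Return the number of lines in an open text file (0 if it is empty)."""
    line_num = 0  # keeps its value when the loop body never runs
    for line_num, line in enumerate(fp, start=1):
        pass  # parsing code would go here
    return line_num


print(count_lines(io.StringIO("a\nb\nc\n")))  # 3
print(count_lines(io.StringIO("")))           # 0
```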

Also, when creating file paths, look into using os.path.join. It joins the components with the separator appropriate for the current OS (the output below is from Windows):

>>> import os
>>> os.path.join('some_path', 'to_my', 'file.txt')
'some_path\\to_my\\file.txt'
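
To see how the result depends on the platform, the standard-library modules ntpath and posixpath (the Windows and POSIX implementations behind os.path) can be called directly. This is only a sketch; the directory and file names in the last lines are invented to mirror the filename.path + "/" + filename.name expression from the question.

```python
import ntpath
import os
import posixpath

parts = ('some_path', 'to_my', 'file.txt')

print(ntpath.join(*parts))     # some_path\to_my\file.txt
print(posixpath.join(*parts))  # some_path/to_my/file.txt

# os.path.join picks whichever implementation matches the running OS.
# Hypothetical values standing in for filename.path and filename.name:
directory, name = 'src', 'main.py'
full_path = os.path.join(directory, name)
```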
answered Apr 21, 2014 at 15:32
\$\endgroup\$
2
  • \$\begingroup\$ On the enumerate, I thought about that but thought it would be wrong if the file happens to have no lines? I think you're right on the os.path.join; I should switch to that, as I got bitten by that once before. Thanks. \$\endgroup\$ Commented Apr 22, 2014 at 15:02
  • \$\begingroup\$ If the code is run on a file with no lines, line_num will be undefined. I will update accordingly. \$\endgroup\$ Commented Apr 22, 2014 at 15:22