I just completed my first real application (command line app). It's simple, but I had no prior knowledge of Python. Everything was hit and miss, with many Python books and help from those in this community to guide me. Now, I am trying to see where I can refine my code and make things more efficient, as well as become a more efficient Python programmer. I would appreciate any feedback on what I could change, add or eliminate.
#!c:\Python27
""" This is a simple command line tool used to extract .tgz and .tar files
and perform 3 different type of searches (single file w/ user string,
multiple file w/ user string (converted into ascii, hex), and multiple
file with re pattern) of .log and .txt files. It also performs ascii,
hex, and octal conversion"""
import cmd
import os
import sys, tarfile
import time
import datetime
import re
import string
import fileinput
import os.path
class SQAST(cmd.Cmd):
from time import strftime
#os.system("clear")
print strftime("%Y-%m-%d %H:%M:%S")
print " *****************************"
print " * Company Name *"
print " *****************************"
prompt = 'sqa@gilbarco >> '
intro = " Security Tool "
doc_header = 'Attributes'
misc_header = 'misc_header'
undoc_header = 'Commands'
ruler = '-'
COMMANDS = ['ascii', 'files', 'help', 'hex', 'path', 'print', 'exit', 'search']
def do_prompt(self, line):
"Change the interactive prompt"
self.prompt = line + ': '
def do_hex(self, line):
text=raw_input("Enter string to convert: ") #Add exception error handling
print "HEX: " + text.encode('hex')
def do_octal(self, line):
text=raw_input("Enter string to convert: ") #Add exception error handling
condint = re.compile(r'[0-9]+')
if condint.match(text):
convtint = int(text)
print "OCTAL: " + oct(convtint)
else:
print "YOU ENTERED AN INVALID CHARACTER/s!"
def do_ascii(self, line):
text=raw_input("Enter string to convert: ").strip() #Add exception error handling
print "ASCI: " + ' '.join(str(ord(char)) for char in text)
def do_files(self, line):
path=os.getcwd() #Set log path here
dirlist=os.listdir(path)
for fname in dirlist:
if fname.endswith(('.log','.txt')):
print '\n'
print fname
def do_movefiles(self, line):
defaultFolder = "Log_Text_Files"
if not defaultFolder.endswith(':') and not os.path.exists('c:\\Extracted \Log_Text_Files'):
os.mkdir('C:\\Extracted\\Log_Text_Files')
print 'ALL FILES WILL BE MOVED TO c:\Extracted\Log_Text_Files!\n'
chPath = raw_input('\nEnter root folder to move all .log and .txt files from ... (eg. c:\Extracted\NameofFolder):\n')
try:
os.chdir(chPath)
except:
print "YOU ENTERED INVALID DATA OR AN INVALID PATH ... TRY AGAIN!!!"
def print_tgzLogs (arg, dir, files):
for file in files:
path = os.path.join (dir, file)
path = os.path.normcase (path)
if path.endswith(".txt") or path.endswith(".log"):
if os.path.exists('C:\\Extracted\\Log_Text_Files\\%s' % file):
os.remove('C:\\Extracted\\Log_Text_Files\\%s' % file)
os.rename(path, 'C:\\Extracted\\Log_Text_Files\\%s' % file)
print path
os.path.walk(os.getcwd(), print_tgzLogs, 0)
def do_path(self, line):
print "Current Path: " + os.getcwd()
print " "
print "(Eg. Windows path: c:\Python, Linux path: /var/opt/crindbios)"
print " "
changePath = raw_input("Enter new path: ")
try:
os.chdir(changePath)
except:
print "YOU ENTERED AN INVALID PATH ... exiting path!"
#os.chdir(changePath)
print "Current Path: " + os.getcwd()
def do_fileExtract(self, line):
print "If a CRC error occurs, this usually means that the archived file or some parts thereof is corrupt!\n"
defaultFolder = "Extracted"
if not defaultFolder.endswith(':') and not os.path.exists('c:\\Extracted'):
os.mkdir('c:\\Extracted')
raw_input("PLACE .tgz FILES in c:\Extracted AT THIS TIME!!! PRESS ENTER WHEN FINISHED!")
else:
pass
def extract(tar_url, extract_path='.'):
print tar_url
tar = tarfile.open(tar_url, 'r')
for item in tar:
tar.extract(item, extract_path)
if item.name.find(".tgz") != -1 or item.name.find(".tar") != -1:
extract(item.name, "./" + item.name[:item.name.rfind('/')])
userpath = "Extracted"
directory = os.path.join("c:\\", userpath)
os.chdir(directory)
path=os.getcwd() #Set log path here
dirlist=os.listdir(path)
files = [fname for fname in os.listdir(path)
if fname.endswith(('.tgz','.tar'))]
for item in enumerate(files):
print "%d- %s" % item
try:
idx = int(raw_input("\nEnter the file's number:\n"))
except ValueError:
print "You fail at typing numbers."
try:
chosen = files[idx]
except IndexError:
print "Try a number in range next time."
file_extensions = ('tar', 'tgz')
# Edit this according to the archive types you want to extract. Keep in
# mind that these should be extractable by the tarfile module.
def FileExtension(file_name):
"""Return the file extension of file
'file' should be a string. It can be either the full path of
the file or just its name (or any string as long it contains
the file extension.)
Examples:
input (file) --> 'abc.tar'
return value --> 'tar'
"""
match = re.compile(r"^.*[.](?P<ext>\w+)$",
re.VERBOSE|re.IGNORECASE).match(file_name)
if match: # if match != None:
ext = match.group('ext')
return ext
else:
return '' # there is no file extension to file_name
def AppropriateFolderName(folder_name, parent_fullpath):
"""Return a folder name such that it can be safely created in
parent_fullpath without replacing any existing folder in it.
Check if a folder named folder_name exists in parent_fullpath. If no,
return folder_name (without changing, because it can be safely created
without replacing any already existing folder). If yes, append an
appropriate number to the folder_name such that this new folder_name
can be safely created in the folder parent_fullpath.
Examples:
folder_name = 'untitled folder'
return value = 'untitled folder' (if no such folder already exists
in parent_fullpath.)
folder_name = 'untitled folder'
return value = 'untitled folder 1' (if a folder named 'untitled folder'
already exists but no folder named
'untitled folder 1' exists in
parent_fullpath.)
folder_name = 'untitled folder'
return value = 'untitled folder 2' (if folders named 'untitled folder'
and 'untitled folder 1' both
already exist but no folder named
'untitled folder 2' exists in
parent_fullpath.)
"""
if os.path.exists(os.path.join(parent_fullpath,folder_name)):
match = re.compile(r'^(?P<name>.*)[ ](?P<num>\d+)$').match(folder_name)
if match: # if match != None:
name = match.group('name')
number = match.group('num')
new_folder_name = '%s %d' %(name, int(number)+1)
return AppropriateFolderName(new_folder_name,
parent_fullpath)
# Recursively call itself so that it can be check whether a
# folder named new_folder_name already exists in parent_fullpath
# or not.
else:
new_folder_name = '%s 1' %folder_name
return AppropriateFolderName(new_folder_name, parent_fullpath)
# Recursively call itself so that it can be check whether a
# folder named new_folder_name already exists in parent_fullpath
# or not.
else:
return folder_name
def Extract(tarfile_fullpath, delete_tar_file=True):
"""Extract the tarfile_fullpath to an appropriate* folder of the same
name as the tar file (without an extension) and return the path
of this folder.
If delete_tar_file is True, it will delete the tar file after
its extraction; if False, it won`t. Default value is True as you
would normally want to delete the (nested) tar files after
extraction. Pass a False, if you don`t want to delete the
tar file (after its extraction) you are passing.
"""
tarfile_name = os.path.basename(tarfile_fullpath)
parent_dir = os.path.dirname(tarfile_fullpath)
extract_folder_name = AppropriateFolderName(tarfile_name[:\
-1*len(FileExtension(tarfile_name))-1], parent_dir)
# (the slicing is to remove the extension (.tar) from the file name.)
# Get a folder name (from the function AppropriateFolderName)
# in which the contents of the tar file can be extracted,
# so that it doesn't replace an already existing folder.
extract_folder_fullpath = os.path.join(parent_dir,
extract_folder_name)
# The full path to this new folder.
try:
tar = tarfile.open(tarfile_fullpath)
tar.extractall(extract_folder_fullpath)
tar.close()
if delete_tar_file:
os.remove(tarfile_fullpath)
return extract_folder_name
except Exception as e:
# Exceptions can occur while opening a damaged tar file.
print 'Error occured while extracting %s\n'\
'Reason: %s' %(tarfile_fullpath, e)
return
def WalkTreeAndExtract(parent_dir):
"""Recursively descend the directory tree rooted at parent_dir
and extract each tar file on the way down (recursively).
"""
try:
dir_contents = os.listdir(parent_dir)
except OSError as e:
# Exception can occur if trying to open some folder whose
# permissions this program does not have.
print 'Error occured. Could not open folder %s\n'\
'Reason: %s' %(parent_dir, e)
return
for content in dir_contents:
content_fullpath = os.path.join(parent_dir, content)
if os.path.isdir(content_fullpath):
# If content is a folder, walk it down completely.
WalkTreeAndExtract(content_fullpath)
elif os.path.isfile(content_fullpath):
# If content is a file, check if it is a tar file.
# If so, extract its contents to a new folder.
if FileExtension(content_fullpath) in file_extensions:
extract_folder_name = Extract(content_fullpath)
if extract_folder_name: # if extract_folder_name != None:
dir_contents.append(extract_folder_name)
# Append the newly extracted folder to dir_contents
# so that it can be later searched for more tar files
# to extract.
else:
# Unknown file type.
print 'Skipping %s. <Neither file nor folder>' % content_fullpath
if __name__ == '__main__':
tarfile_fullpath = chosen # pass the path of your tar file here.
extract_folder_name = Extract(tarfile_fullpath, False)
# tarfile_fullpath is extracted to extract_folder_name. Now descend
# down its directory structure and extract all other tar files
# (recursively).
extract_folder_fullpath = os.path.join(os.path.dirname(tarfile_fullpath),
extract_folder_name)
WalkTreeAndExtract(extract_folder_fullpath)
# If you want to extract all tar files in a dir, just execute the above
# line and nothing else.
"""try:
extract(chosen)
print 'Done'
except:
print ' '"""
def complete_files(self, text, line, begidx, endix):
if not text:
completions = self.COMMANDS[:]
else:
completions = [f
for f in self.COMMANDS
if f.startswith(text)]
return completions
def do_search(self, line):
print '\nCurrent dir: '+ os.getcwd()
print '\nALL LOG FILES WILL BE CREATED IN: c:\Temp_log_files!'
userpath = raw_input("\nPlease enter a path to search (only enter folder name, eg. SQA\log): ")
try:
directory = os.path.join("c:\\",userpath)
os.chdir(directory)
except:
print "Path name does not exist!!!"
print "\n SEARCHES ARE CASE SENSITIVE"
print " "
line = "[1]Single File [2]Multiple Files [3]STATIC HEX"
col1 = line[0:14]
col2 = line[15:32]
col3 = line[33:46]
print " " + col1 + " " + col2 + " " + col3
print '\n'
print "\nCurrent Dir: " + os.getcwd()
searchType = raw_input("\nSelect type of search: ")
if searchType == '1':
print "\nFile result1.log will be ceated in c:\Temp_log_files."
print "\nFILES:\n"
#self.do_files(89)
files = [fname for fname in os.listdir(os.getcwd())
if fname.endswith(('.txt','.log'))]
for item in enumerate(files):
print "%d- %s" % item
try:
idx = int(raw_input("\nEnter the file's number:\n"))
except ValueError:
print "You fail at typing numbers."
try:
chosen = files[idx]
except IndexError:
print "Try a number in range next time."
paths = "c:\\Temp_log_files\\result1.log"
temp = file(paths, "w")
#logfile = raw_input("\nEnter filename to search (eg. name.log): ")
logfile = chosen
try:
fiLe = open(logfile, "r")
except:
print "Filename does not exist!!!"
userString = raw_input("\nEnter a string name to search: ")
for i,line in enumerate(fiLe.readlines()):
if userString in line:
ln = str(i)
print "String: " + userString
print "File: " + os.path.join(directory,logfile)
print "Line: " + ln
temp.write('\nLine:' + ln + '\nString:' + userString + '\nFile: ' + os.path.join(directory,logfile) + '\n')
break
else:
print "%s NOT in %s" % (userString, logfile)
fiLe.close()
elif searchType =='2':
print "\nDirectory to be searched: " + directory
print "\nFile result2.log will be created in: c:\Temp_log_files."
paths = "c:\\Temp_log_files\\result2.log"
temp = file(paths, "w")
userstring = raw_input("Enter a string name to search: ")
userStrHEX = userstring.encode('hex')
userStrASCII = ''.join(str(ord(char)) for char in userstring)
regex = re.compile(r"(%s|%s|%s)" % ( re.escape( userstring ), re.escape( userStrHEX ), re.escape( userStrASCII )))
goby = raw_input("Press Enter to begin search (search ignores whitespace)!\n")
for root,dirname, files in os.walk(directory):
for file1 in files:
if file1.endswith(".log") or file1.endswith(".txt"):
f=open(os.path.join(root, file1))
for i,line in enumerate(f.readlines()):
result = regex.search(line)
if result:
ln = str(i)
pathnm = os.path.join(root,file1)
template = "\nLine: {0}\nFile: {1}\nString Type: {2}\n\n"
output = template.format(ln, pathnm, result.group())
print output
temp.write(output)
break
else:
print "String Not Found in: " + os.path.join(root,file1)
temp.write("\nString Not Found: " + os.path.join(root,file1) + "\n")
f.close()
re.purge()
elif searchType =='3':
print "\nDirectory to be searched: " + directory
print "\nFile result3.log will be created in: c:\Temp_log_files."
#directory = os.path.join("c:\\","SQA_log")
paths = "c:\\Temp_log_files\\result3.log"
temp = file(paths, "w")
regex = re.compile(r'(?:3\d){6}')
#with file('c:\\Temp_log_file\\result3.log', 'w') as logfile:
for root,dirname, files in os.walk(directory):
for afile in files:
if afile.endswith(".log") or afile.endswith(".txt"):
f=open(os.path.join(root,afile))
for i, line in enumerate(f):
searchedstr = regex.findall(line)
ln = str(i)
for word in searchedstr:
print "\nString found: " + word
print "Line: " + ln
print "File: " + os.path.join(root,afile)
print " "
#logfile = open('result3.log', 'a')
temp.write('\nLine:' + ln + '\nString:' + word + '\nFile: ' + os.path.join(root,afile) + '\n')
f.close()
re.purge()
def do_exit(self, line):
return True
if __name__ == '__main__':
SQAST().cmdloop()
-
\$\begingroup\$ There's something wrong with your indentation. This is not valid python code. \$\endgroup\$itsadok– itsadok2011年07月11日 12:45:34 +00:00Commented Jul 11, 2011 at 12:45
-
1\$\begingroup\$ Five level of nested blocks is a scary thing, too. \$\endgroup\$Evpok– Evpok2011年07月11日 12:51:17 +00:00Commented Jul 11, 2011 at 12:51
-
\$\begingroup\$ @itsadok - Maybe I added more or less spaces when adding and making adjustments to abide by code rules on this forum. However, I have no problems with indentation when running the code. \$\endgroup\$user706808– user7068082011年07月11日 12:53:17 +00:00Commented Jul 11, 2011 at 12:53
-
\$\begingroup\$ I always sort the result of os.walk(), since this makes your programm use the same ordering each time you run it. \$\endgroup\$guettli– guettli2011年07月11日 13:06:12 +00:00Commented Jul 11, 2011 at 13:06
-
3\$\begingroup\$ @user706808: "I have no problems with indentation when running the code". That's irrelevant. The code -- as posted in the question -- cannot possibly work. Please fix it rather than say that some other copy happens to work. We can't see the other copy, can we? \$\endgroup\$S.Lott– S.Lott2011年07月11日 16:44:34 +00:00Commented Jul 11, 2011 at 16:44
2 Answers 2
Your indentation is wrong in the posted code. Please edit your question to (a) paste the code then (b) select all the code and (c) click the
{}
button to get it all indented correctly.A single large class which does everything is a poor design. Each individual "command" should be a separate class.
Your overall command-line processing loop (
SQAST.cmdloop()
) should be a separate class which relies on instances of the other command classes. Each of the variousdo_this()
methods would then create an instance of one of the other classes, and then execute a method of that other class.Yes, it seems like a lot of extra classes. However, breaking the individual commands into separate classes allows you to reuse the commands in other applications without copying and pasting code. Also, making separate classes for each command makes the shared information among the commands absolutely explicit.
The stuff done at the class level inside the
SQAST
class (and outside any of the method functions) is often a Very Bad Idea. Use__init__
for this kind of thing.Try to avoid using
filename.endswith('.txt')
. It's slightly better to useos.splitext
to split the name and the extension and then compare the results.base, ext = os.path.splitext(filename)
. Yes, it's an extra line of code, but it makes your script slightly more portable to operating systems that might have peculiar file name rules (like Open VMS, for example).
Well... first thing I'd do right off is get out an axe (buzz-saw or some other violent tool) and hack this sucker into two programs. A log parser/hex-converter is just plain weird.
Once that's done, learn to use the python profiler tools, assuming you actually need to improve performance. That will give you some idea where your programs are spending all their time. (I'm assuming you did the chopping in half thing here. ;)
Here's a quick head-start. I wrote this little util when I had to profile a small ftp file gathering utility. Just copy the code and drop it into a file named ShowProf.py:
"""
Simple little harness to dump Python profiler data.
Basic usage:
python -m cProfile -o SomeProgram.prof SomeProgram.py -some_args_for_SomeProgram
ShowProf.py SomeProgram.prof | less
or for Windows folks who don't install unix commands on their systems...
ShowProf.py SomeProgram.prof | more
"""
import pstats
import sys
p = pstats.Stats(sys.argv[1])
p.strip_dirs().sort_stats(-1).print_stats()
Also, take a little time to work through the Python Tutorial in the help files that come with Python. And do a little module browsing. One of the common mistakes newcomers to Python make is to start reinventing the wheel. There are many very useful modules that make Python programming a breeze and a joy.
Explore related questions
See similar questions with these tags.