Command line tool for extracting, searching, and converting

Question 1

I just completed my first real application (command line app). It's simple, but I had no prior knowledge of Python. Everything was hit and miss, with many Python books and help from those in this community to guide me. Now, I am trying to see where I can refine my code and make things more efficient, as well as become a more efficient Python programmer. I would appreciate any feedback on what I could change, add or eliminate.

 #!c:\Python27
 """ This is a simple command line tool used to extract .tgz and .tar files
 and perform 3 different type of searches (single file w/ user string,
 multiple file w/ user string (converted into ascii, hex), and multiple
 file with re pattern) of .log and .txt files. It also performs ascii,
 hex, and octal conversion"""
 import cmd
 import os
 import sys, tarfile
 import time
 import datetime
 import re
 import string
 import fileinput
 import os.path
 class SQAST(cmd.Cmd):
 from time import strftime
 #os.system("clear")
 print strftime("%Y-%m-%d %H:%M:%S")
 print " *****************************"
 print " * Company Name *"
 print " *****************************"
 prompt = 'sqa@gilbarco >> '
 intro = " Security Tool " 
 doc_header = 'Attributes'
 misc_header = 'misc_header'
 undoc_header = 'Commands'
 ruler = '-'
 COMMANDS = ['ascii', 'files', 'help', 'hex', 'path', 'print', 'exit', 'search']
 def do_prompt(self, line):
 "Change the interactive prompt"
 self.prompt = line + ': '
 def do_hex(self, line):
 text=raw_input("Enter string to convert: ") #Add exception error handling
 print "HEX: " + text.encode('hex')
 def do_octal(self, line):
 text=raw_input("Enter string to convert: ") #Add exception error handling
 condint = re.compile(r'[0-9]+')
 if condint.match(text):
 convtint = int(text)
 print "OCTAL: " + oct(convtint)
 else:
 print "YOU ENTERED AN INVALID CHARACTER/s!"
 def do_ascii(self, line):
 text=raw_input("Enter string to convert: ").strip() #Add exception error handling
 print "ASCI: " + ' '.join(str(ord(char)) for char in text) 
 def do_files(self, line):
 path=os.getcwd() #Set log path here
 dirlist=os.listdir(path)
 for fname in dirlist:
 if fname.endswith(('.log','.txt')):
 print '\n'
 print fname
 def do_movefiles(self, line):
 defaultFolder = "Log_Text_Files"
 if not defaultFolder.endswith(':') and not os.path.exists('c:\\Extracted \Log_Text_Files'):
 os.mkdir('C:\\Extracted\\Log_Text_Files')
 print 'ALL FILES WILL BE MOVED TO c:\Extracted\Log_Text_Files!\n'
 chPath = raw_input('\nEnter root folder to move all .log and .txt files from ... (eg. c:\Extracted\NameofFolder):\n')
 try:
 os.chdir(chPath)
 except:
 print "YOU ENTERED INVALID DATA OR AN INVALID PATH ... TRY AGAIN!!!"
 def print_tgzLogs (arg, dir, files):
 for file in files:
 path = os.path.join (dir, file)
 path = os.path.normcase (path)
 if path.endswith(".txt") or path.endswith(".log"):
 if os.path.exists('C:\\Extracted\\Log_Text_Files\\%s' % file):
 os.remove('C:\\Extracted\\Log_Text_Files\\%s' % file)
 os.rename(path, 'C:\\Extracted\\Log_Text_Files\\%s' % file)
 print path
 os.path.walk(os.getcwd(), print_tgzLogs, 0)
 def do_path(self, line):
 print "Current Path: " + os.getcwd()
 print " "
 print "(Eg. Windows path: c:\Python, Linux path: /var/opt/crindbios)"
 print " "
 changePath = raw_input("Enter new path: ")
 try:
 os.chdir(changePath)
 except:
 print "YOU ENTERED AN INVALID PATH ... exiting path!" 
 #os.chdir(changePath)
 print "Current Path: " + os.getcwd()
 def do_fileExtract(self, line):
 print "If a CRC error occurs, this usually means that the archived file or some parts thereof is corrupt!\n"
 defaultFolder = "Extracted"
 if not defaultFolder.endswith(':') and not os.path.exists('c:\\Extracted'):
 os.mkdir('c:\\Extracted')
 raw_input("PLACE .tgz FILES in c:\Extracted AT THIS TIME!!! PRESS ENTER WHEN FINISHED!")
 else:
 pass
 def extract(tar_url, extract_path='.'):
 print tar_url
 tar = tarfile.open(tar_url, 'r')
 for item in tar:
 tar.extract(item, extract_path)
 if item.name.find(".tgz") != -1 or item.name.find(".tar") != -1:
 extract(item.name, "./" + item.name[:item.name.rfind('/')])
 userpath = "Extracted"
 directory = os.path.join("c:\\", userpath)
 os.chdir(directory)
 path=os.getcwd() #Set log path here
 dirlist=os.listdir(path)
 files = [fname for fname in os.listdir(path) 
 if fname.endswith(('.tgz','.tar'))]
 for item in enumerate(files):
 print "%d- %s" % item
 try:
 idx = int(raw_input("\nEnter the file's number:\n"))
 except ValueError:
 print "You fail at typing numbers."
 try:
 chosen = files[idx]
 except IndexError:
 print "Try a number in range next time."
 file_extensions = ('tar', 'tgz')
 # Edit this according to the archive types you want to extract. Keep in
 # mind that these should be extractable by the tarfile module.
 def FileExtension(file_name):
 """Return the file extension of file
 'file' should be a string. It can be either the full path of
 the file or just its name (or any string as long it contains
 the file extension.)
 Examples:
 input (file) --> 'abc.tar'
 return value --> 'tar'
 """
 match = re.compile(r"^.*[.](?P<ext>\w+)$",
 re.VERBOSE|re.IGNORECASE).match(file_name)
 if match: # if match != None:
 ext = match.group('ext')
 return ext
 else:
 return '' # there is no file extension to file_name
 def AppropriateFolderName(folder_name, parent_fullpath):
 """Return a folder name such that it can be safely created in
 parent_fullpath without replacing any existing folder in it.
 Check if a folder named folder_name exists in parent_fullpath. If no,
 return folder_name (without changing, because it can be safely created 
 without replacing any already existing folder). If yes, append an
 appropriate number to the folder_name such that this new folder_name
 can be safely created in the folder parent_fullpath.
 Examples:
 folder_name = 'untitled folder'
 return value = 'untitled folder' (if no such folder already exists
 in parent_fullpath.)
 folder_name = 'untitled folder'
 return value = 'untitled folder 1' (if a folder named 'untitled folder'
 already exists but no folder named
 'untitled folder 1' exists in
 parent_fullpath.)
 folder_name = 'untitled folder'
 return value = 'untitled folder 2' (if folders named 'untitled folder'
 and 'untitled folder 1' both
 already exist but no folder named
 'untitled folder 2' exists in
 parent_fullpath.)
 """
 if os.path.exists(os.path.join(parent_fullpath,folder_name)):
 match = re.compile(r'^(?P<name>.*)[ ](?P<num>\d+)$').match(folder_name)
 if match: # if match != None:
 name = match.group('name')
 number = match.group('num')
 new_folder_name = '%s %d' %(name, int(number)+1)
 return AppropriateFolderName(new_folder_name,
 parent_fullpath)
 # Recursively call itself so that it can be check whether a
 # folder named new_folder_name already exists in parent_fullpath
 # or not.
 else:
 new_folder_name = '%s 1' %folder_name
 return AppropriateFolderName(new_folder_name, parent_fullpath)
 # Recursively call itself so that it can be check whether a
 # folder named new_folder_name already exists in parent_fullpath
 # or not.
 else:
 return folder_name
 def Extract(tarfile_fullpath, delete_tar_file=True):
 """Extract the tarfile_fullpath to an appropriate* folder of the same
 name as the tar file (without an extension) and return the path
 of this folder.
 If delete_tar_file is True, it will delete the tar file after
 its extraction; if False, it won`t. Default value is True as you
 would normally want to delete the (nested) tar files after
 extraction. Pass a False, if you don`t want to delete the
 tar file (after its extraction) you are passing.
 """
 tarfile_name = os.path.basename(tarfile_fullpath)
 parent_dir = os.path.dirname(tarfile_fullpath)
 extract_folder_name = AppropriateFolderName(tarfile_name[:\
 -1*len(FileExtension(tarfile_name))-1], parent_dir)
 # (the slicing is to remove the extension (.tar) from the file name.)
 # Get a folder name (from the function AppropriateFolderName)
 # in which the contents of the tar file can be extracted,
 # so that it doesn't replace an already existing folder.
 extract_folder_fullpath = os.path.join(parent_dir,
 extract_folder_name)
 # The full path to this new folder.
 try:
 tar = tarfile.open(tarfile_fullpath)
 tar.extractall(extract_folder_fullpath)
 tar.close()
 if delete_tar_file:
 os.remove(tarfile_fullpath)
 return extract_folder_name
 except Exception as e:
 # Exceptions can occur while opening a damaged tar file.
 print 'Error occured while extracting %s\n'\
 'Reason: %s' %(tarfile_fullpath, e)
 return
 def WalkTreeAndExtract(parent_dir):
 """Recursively descend the directory tree rooted at parent_dir
 and extract each tar file on the way down (recursively).
 """
 try:
 dir_contents = os.listdir(parent_dir)
 except OSError as e:
 # Exception can occur if trying to open some folder whose
 # permissions this program does not have.
 print 'Error occured. Could not open folder %s\n'\
 'Reason: %s' %(parent_dir, e)
 return
 for content in dir_contents:
 content_fullpath = os.path.join(parent_dir, content)
 if os.path.isdir(content_fullpath):
 # If content is a folder, walk it down completely.
 WalkTreeAndExtract(content_fullpath)
 elif os.path.isfile(content_fullpath):
 # If content is a file, check if it is a tar file.
 # If so, extract its contents to a new folder.
 if FileExtension(content_fullpath) in file_extensions:
 extract_folder_name = Extract(content_fullpath)
 if extract_folder_name: # if extract_folder_name != None:
 dir_contents.append(extract_folder_name)
 # Append the newly extracted folder to dir_contents
 # so that it can be later searched for more tar files
 # to extract.
 else:
 # Unknown file type.
 print 'Skipping %s. <Neither file nor folder>' % content_fullpath
 if __name__ == '__main__':
 tarfile_fullpath = chosen # pass the path of your tar file here.
 extract_folder_name = Extract(tarfile_fullpath, False)
 # tarfile_fullpath is extracted to extract_folder_name. Now descend
 # down its directory structure and extract all other tar files
 # (recursively).
 extract_folder_fullpath = os.path.join(os.path.dirname(tarfile_fullpath),
 extract_folder_name)
 WalkTreeAndExtract(extract_folder_fullpath)
 # If you want to extract all tar files in a dir, just execute the above
 # line and nothing else.
 """try:
 extract(chosen)
 print 'Done'
 except:
 print ' '"""
 def complete_files(self, text, line, begidx, endix):
 if not text:
 completions = self.COMMANDS[:]
 else:
 completions = [f
 for f in self.COMMANDS
 if f.startswith(text)]
 return completions
 def do_search(self, line):
 print '\nCurrent dir: '+ os.getcwd()
 print '\nALL LOG FILES WILL BE CREATED IN: c:\Temp_log_files!'
 userpath = raw_input("\nPlease enter a path to search (only enter folder name, eg. SQA\log): ")
 try:
 directory = os.path.join("c:\\",userpath)
 os.chdir(directory)
 except:
 print "Path name does not exist!!!"
 print "\n SEARCHES ARE CASE SENSITIVE"
 print " "
 line = "[1]Single File [2]Multiple Files [3]STATIC HEX"
 col1 = line[0:14]
 col2 = line[15:32]
 col3 = line[33:46]
 print " " + col1 + " " + col2 + " " + col3
 print '\n'
 print "\nCurrent Dir: " + os.getcwd()
 searchType = raw_input("\nSelect type of search: ")
 if searchType == '1':
 print "\nFile result1.log will be ceated in c:\Temp_log_files."
 print "\nFILES:\n"
 #self.do_files(89)
 files = [fname for fname in os.listdir(os.getcwd()) 
 if fname.endswith(('.txt','.log'))]
 for item in enumerate(files):
 print "%d- %s" % item
 try:
 idx = int(raw_input("\nEnter the file's number:\n"))
 except ValueError:
 print "You fail at typing numbers."
 try:
 chosen = files[idx]
 except IndexError:
 print "Try a number in range next time."
 paths = "c:\\Temp_log_files\\result1.log"
 temp = file(paths, "w")
 #logfile = raw_input("\nEnter filename to search (eg. name.log): ")
 logfile = chosen
 try: 
 fiLe = open(logfile, "r")
 except:
 print "Filename does not exist!!!"
 userString = raw_input("\nEnter a string name to search: ")
 for i,line in enumerate(fiLe.readlines()):
 if userString in line:
 ln = str(i)
 print "String: " + userString
 print "File: " + os.path.join(directory,logfile)
 print "Line: " + ln
 temp.write('\nLine:' + ln + '\nString:' + userString + '\nFile: ' + os.path.join(directory,logfile) + '\n')
 break
 else:
 print "%s NOT in %s" % (userString, logfile)
 fiLe.close()
 elif searchType =='2':
 print "\nDirectory to be searched: " + directory
 print "\nFile result2.log will be created in: c:\Temp_log_files."
 paths = "c:\\Temp_log_files\\result2.log"
 temp = file(paths, "w")
 userstring = raw_input("Enter a string name to search: ")
 userStrHEX = userstring.encode('hex')
 userStrASCII = ''.join(str(ord(char)) for char in userstring)
 regex = re.compile(r"(%s|%s|%s)" % ( re.escape( userstring ), re.escape( userStrHEX ), re.escape( userStrASCII )))
 goby = raw_input("Press Enter to begin search (search ignores whitespace)!\n")
 for root,dirname, files in os.walk(directory):
 for file1 in files:
 if file1.endswith(".log") or file1.endswith(".txt"):
 f=open(os.path.join(root, file1))
 for i,line in enumerate(f.readlines()):
 result = regex.search(line)
 if result:
 ln = str(i)
 pathnm = os.path.join(root,file1)
 template = "\nLine: {0}\nFile: {1}\nString Type: {2}\n\n"
 output = template.format(ln, pathnm, result.group())
 print output
 temp.write(output)
 break 
 else:
 print "String Not Found in: " + os.path.join(root,file1)
 temp.write("\nString Not Found: " + os.path.join(root,file1) + "\n")
 f.close()
 re.purge()
 elif searchType =='3':
 print "\nDirectory to be searched: " + directory
 print "\nFile result3.log will be created in: c:\Temp_log_files."
 #directory = os.path.join("c:\\","SQA_log")
 paths = "c:\\Temp_log_files\\result3.log"
 temp = file(paths, "w")
 regex = re.compile(r'(?:3\d){6}')
 #with file('c:\\Temp_log_file\\result3.log', 'w') as logfile:
 for root,dirname, files in os.walk(directory):
 for afile in files:
 if afile.endswith(".log") or afile.endswith(".txt"):
 f=open(os.path.join(root,afile))
 for i, line in enumerate(f):
 searchedstr = regex.findall(line)
 ln = str(i)
 for word in searchedstr:
 print "\nString found: " + word
 print "Line: " + ln
 print "File: " + os.path.join(root,afile)
 print " "
 #logfile = open('result3.log', 'a')
 temp.write('\nLine:' + ln + '\nString:' + word + '\nFile: ' + os.path.join(root,afile) + '\n')
 f.close()
 re.purge() 
def do_exit(self, line):
 return True
if __name__ == '__main__':
 SQAST().cmdloop()

Question 2

There's something wrong with your indentation. This is not valid python code.

Question 3

Five level of nested blocks is a scary thing, too.

Question 4

@itsadok - Maybe I added more or less spaces when adding and making adjustments to abide by code rules on this forum. However, I have no problems with indentation when running the code.

Question 5

I always sort the result of os.walk(), since this makes your programm use the same ordering each time you run it.

Question 6

@user706808: "I have no problems with indentation when running the code". That's irrelevant. The code -- as posted in the question -- cannot possibly work. Please fix it rather than say that some other copy happens to work. We can't see the other copy, can we?

Question 7

Your indentation is wrong in the posted code. Please edit your question to (a) paste the code then (b) select all the code and (c) click the {} button to get it all indented correctly.
A single large class which does everything is a poor design. Each individual "command" should be a separate class.
Your overall command-line processing loop (SQAST.cmdloop()) should be a separate class which relies on instances of the other command classes. Each of the various do_this() methods would then create an instance of one of the other classes, and then execute a method of that other class.

Yes, it seems like a lot of extra classes. However, breaking the individual commands into separate classes allows you to reuse the commands in other applications without copying and pasting code. Also, making separate classes for each command makes the shared information among the commands absolutely explicit.
The stuff done at the class level inside the SQAST class (and outside any of the method functions) is often a Very Bad Idea. Use __init__ for this kind of thing.
Try to avoid using filename.endswith('.txt'). It's slightly better to use os.splitext to split the name and the extension and then compare the results. base, ext = os.path.splitext(filename). Yes, it's an extra line of code, but it makes your script slightly more portable to operating systems that might have peculiar file name rules (like Open VMS, for example).

Question 8

Well... first thing I'd do right off is get out an axe (buzz-saw or some other violent tool) and hack this sucker into two programs. A log parser/hex-converter is just plain weird.

Once that's done, learn to use the python profiler tools, assuming you actually need to improve performance. That will give you some idea where your programs are spending all their time. (I'm assuming you did the chopping in half thing here. ;)

Here's a quick head-start. I wrote this little util when I had to profile a small ftp file gathering utility. Just copy the code and drop it into a file named ShowProf.py:

"""
Simple little harness to dump Python profiler data.
Basic usage:
 python -m cProfile -o SomeProgram.prof SomeProgram.py -some_args_for_SomeProgram
 ShowProf.py SomeProgram.prof | less
or for Windows folks who don't install unix commands on their systems...
 ShowProf.py SomeProgram.prof | more
"""
import pstats
import sys
p = pstats.Stats(sys.argv[1])
p.strip_dirs().sort_stats(-1).print_stats()

Also, take a little time to work through the Python Tutorial in the help files that come with Python. And do a little module browsing. One of the common mistakes newcomers to Python make is to start reinventing the wheel. There are many very useful modules that make Python programming a breeze and a joy.

S.Lott S.LottS.Lott 7803 silver badges6 bronze badges · Accepted Answer · 2011-07-11 14:08:25Z

Your indentation is wrong in the posted code. Please edit your question to (a) paste the code then (b) select all the code and (c) click the {} button to get it all indented correctly.
A single large class which does everything is a poor design. Each individual "command" should be a separate class.
Your overall command-line processing loop (SQAST.cmdloop()) should be a separate class which relies on instances of the other command classes. Each of the various do_this() methods would then create an instance of one of the other classes, and then execute a method of that other class.

Yes, it seems like a lot of extra classes. However, breaking the individual commands into separate classes allows you to reuse the commands in other applications without copying and pasting code. Also, making separate classes for each command makes the shared information among the commands absolutely explicit.
The stuff done at the class level inside the SQAST class (and outside any of the method functions) is often a Very Bad Idea. Use __init__ for this kind of thing.
Try to avoid using filename.endswith('.txt'). It's slightly better to use os.splitext to split the name and the extension and then compare the results. base, ext = os.path.splitext(filename). Yes, it's an extra line of code, but it makes your script slightly more portable to operating systems that might have peculiar file name rules (like Open VMS, for example).

Stack Exchange Network

Command line tool for extracting, searching, and converting

2 Answers 2

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Hot Network Questions

Command line tool for extracting, searching, and converting

2 Answers 2

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Related

Hot Network Questions