I have made a program in Python to do a filter of log files from a server. What I would like is some guidance on how this code could be optimized, streamlined or 'short-handed' so my 'pythonic' programming becomes better. I don't program for work, I do it on a requirement basis so I'm not good by any means.
Requirement:
- The code I have has functions. I would like to know, in regards to OOP, how I should best use this in a class object.
- I would also like to know if you have any other alternatives to the coding I have used (like the libraries, any sorting or filtering algorithms etc.).
I need assistance with:
- Optimizing in both coding and performance efficiency.
- Advice on how this would turn into a class object.
- Any better solution to how I've done it.
I am using Tkinter for a GUI which is where the folder path is retrieved.
#==================================================================
### OS and python based Imports
import os
### Office, Excel and other file imports
import glob, csv, xlwt, operator
### GUI based imports
import errno, Tkinter, tkFileDialog
from Tkinter import *
#==================================================================
'''Define Command Functions'''
# ======== Select a directory:
def folder_select():
fname = tkFileDialog.askdirectory(parent=root,initialdir="/",title='Please select a directory')
if not fname == "":
folderPath.delete(0, END) ## This is to erase anything in the "browse" bar
folderPath.insert(0, fname) ## and use what has been selected from the selection window
else:
folderPath.insert(0, fname) ## This just inserts what has been selected from the selection window
# ======== Make Directories
def makedirs(d):
print("Making Directory")
try:
os.makedirs(d)
except OSError as e:
# If the file already exists, and is a directory
if e.errno == errno.EEXIST and os.path.isdir(d):
created = False
# It's some other error, or the existing file is not a directory
else:
raise
else:
created = True
return created
# ======= try find file names that end with the passed through extension
def get_valid_filenames(directory, extensions):
print("Finding "+extensions+" files...")
for filename in os.listdir(directory):
if filename.lower().endswith(extensions):
yield filename
# ====== Get rid of the crap the email uses
def remove_garbage(to_dir, fileToSearch, fname):
print("Removing email garbage from "+fname)
xDate = "" ## This is to see if there is a file with this date already
for line in fileToSearch:
delimiter = line.replace(" - ", ',') ## Replace all " - " strings with "," for csv standards
date = delimiter[:10] ## Make the date be any text of the first 10 characters
time = delimiter[11:] ## make the time be the rest of them after 11 (because there is a " " between them
# The original date is mm/dd/yyyy and it's changing to dd-mm-yyyy
day = date[3:5]
month = date[:2]
year = date[6:]
date = day+"-"+month+"-"+year
delimiter = date+","+time ## now make the delimited values match csv standards
to_file = os.path.join(to_dir, date+".csv") ## Create the csv's based on dates
if line.find('\x00') > 0: ## Avoid any null characters for the sort function
delimiter = line.replace('\x00', '')
if not xDate == date: ## this is where we find out if there is a previously used date matches the current date
if not os.path.exists(to_file): ## if the file doesn't exist create it
fileToCreate = open(to_file, 'w')
fileToCreate.writelines(delimiter)
xDate = date
else: ## if it does exist append to it
fileToCreate = open(to_file, 'a+')
fileToCreate.writelines(delimiter)
xDate = date
else: ## if the previous date does match the current date - the file is open so just add to it
fileToCreate.writelines(delimiter)
# ======= I want to sort by time
def sort_by_columns(to_dir):
said_it = False
for the_file in get_valid_filenames(to_dir, ".csv"):
filename = os.path.join(to_dir, the_file)
read_csv_file = csv.reader(open(filename, 'rb'))
if said_it == False:
print("Sorting CSV files by time")
said_it = True
sort = sorted(read_csv_file,key=operator.itemgetter(1)) ## This sorts the file
with open(filename,'w') as f: ## This writes it back
writer = csv.writer(f)
writer.writerows(sort)
print("Making Excel Workbook")
make_workbook(to_dir) ## while all this is in progress add all the data to the workbook
#======= Create XLS workbook with a new sheet for each day of the logs
def make_workbook(to_dir):
wb = xlwt.Workbook()
csv_files = os.path.join(to_dir, "*.csv")
for filename in glob.glob(csv_files): ## Break everything down so we can use variables for names that relate the the file
(f_path, f_name) = os.path.split(filename)
(f_short_name, f_extension) = os.path.splitext(f_name)
ws = wb.add_sheet(f_short_name)
csvReader = csv.reader(open(filename, 'rb'))
for rowx, row in enumerate(csvReader):
for colx, value in enumerate(row):
cwidth = ws.col(colx).width
if (len(value)*300) > cwidth:
ws.col(colx).width = (len(value)*300) ##(Modify column width to match biggest data in that column)
ws.write(rowx, colx, value)
final_workbook = os.path.join(to_dir, "logs.xls")
wb.save(final_workbook)
print("Excel workbook saved.")
# ======= clean up the remaining junk
def clean_up_files(to_dir):
print("Cleaning Repository")
for fname in os.listdir(to_dir):
files_left = os.path.join(to_dir, fname)
if not fname.lower().endswith(".xls"):
try:
os.remove(files_left) ## Remove all remaining files that were created by this script
except OSError, e: ## if failed, report it back to the user
pass
#======= this is the main function that calls the rest
def confirm_it(directory):
count = 0
to_dir = os.path.join(directory, "Compiled")
for fname in get_valid_filenames(directory, ".txt"):
logs = os.path.join(directory, fname) ## where logs are stored
logFile = open(logs).readlines() ## allow them to be modified
# This just recompiles a list of logs based on dates
for line in logFile:
dates = re.compile(r'\d\d\/\w\w\/')
if dates.findall(line):
if not os.path.exists(to_dir):
makedirs(to_dir)
search_dir = os.path.join(to_dir, "logs"+str(count))
fileToSearch = open(search_dir, 'a+').writelines(line)
#======= Creating CSVs with proper entries
fileToSearch = open(search_dir, 'r')
remove_garbage(to_dir, fileToSearch, fname)
count = count+1
fileToSearch.close()
sort_by_columns(to_dir)
clean_up_files(to_dir)
print("\n\n==== DONE!!!! ====")
def confirm_it_wrapper():
confirm_it(directory=folderPath.get())
Log Source:
I imagine you would need some source of the logs for it to work. The easiest source information would be:
03/28/2014 20:52:33 - 12 - logs - error sending log report via email
Just copy that a few times into a text file and change the time to see what it does. You could also make multiple text files with multiple days (remember the date is in American format - mm/dd/yyyy).
Please remember this is just for me to improve my style of coding. I have the code working completely and it goes as fast as it can go (I would imaging). I just want to know what methods I can use to improve my coding style.
2 Answers 2
Raising useful errors
If you have something like this:
raise
You're doing something wrong. Just calling raise
with no additional useful info is not helpful. Preferably, you should do one of the following:
Raise a built-in error:
raise BuiltInError
Raise a built-in error, with an error message:
raise BuiltInError("Blah foobar")
Define your own custom error, and raise it with a custom message:
class MyCustomError(Exception): pass raise MyCustomError("fooblah bar")
I don't recommend this though, as chances are, you can use a built-in error, rather than needing to define your own.
This will add quite a bit of clarity to the user of your program. Instead of the program just crashing, it will give them a useful message, telling them what went wrong.
Properly opening files
Just opening a file like this:
f = open("path/to/file", "mode")
And calling f.close
, is not how this should be done. file.close
doesn't properly close a file, and since opened files are unmanaged resources, you generally want to close files properly once you're done using them.
To do this, you need to use a context manager with the with
statement, like this:
with open("path/to/file", "mode") as f:
...
This eliminates the need to call file.close
, and the file is properly closed. This means that resources are properly freed.
Style
Please. Never use "separator comments" like this:
#==================================================================
It's a horrible code smell, and it's completely unnecessary. If you separate your top-level code/functions/classes with two blank lines, you're fine.
Your variable naming is also very inconsistent as well.
- Variable names should be in
snake_case
. - Constants should be in
UPPER_SNAKE_CASE
. - Functions should also be in
snake_case
. - Classes should be in
PascalCase
.
You should also have whitespace between mathematical, binary, comparison, and assignment operators. This will add additional clarity, and readability to your code.
It's also best practice to avoid wildcard imports like this:
from Tkinter import *
I'd recommend that you read PEP8, Python's official style guide.
Docstrings
You should also be using docstrings. This means that a comment above a function, like this:
# ======== Make Directories
Should become a docstring. A typical docstring usually has a structure like this:
def my_func( ... ):
"""Small description of the function.
Longer, more detailed description of the function.
Keyword arguments:
argument -- Argument description.
"""
...
Docstrings can also be used on classes, and at the top of a code file.
-
2\$\begingroup\$ Actually calling
raise
after an exception was caught will re-raise the original exception as is. OP is using it correctly here by trying to check for a particular exception to handle it differently, and if it's not found allowing the original exception to raise. \$\endgroup\$SuperBiasedMan– SuperBiasedMan2015年09月10日 09:43:48 +00:00Commented Sep 10, 2015 at 9:43 -
\$\begingroup\$ Also, the exception base class is called
Exception
, notBuiltInError
. (edit: now I got what you meant, but apparently, it is misleading) \$\endgroup\$mkrieger1– mkrieger12015年09月10日 13:03:01 +00:00Commented Sep 10, 2015 at 13:03 -
\$\begingroup\$
file.close
does properly close a file. The advantage of using a context manager (with
) is that the file is guaranteed to be closed even if an exception occurs. \$\endgroup\$mkrieger1– mkrieger12015年09月10日 13:05:23 +00:00Commented Sep 10, 2015 at 13:05 -
\$\begingroup\$ @mkrieger1 If it doesn't close the file on an error, then it means it doesn't properly close the file. Just on an error. \$\endgroup\$Ethan Bierlein– Ethan Bierlein2015年09月10日 13:54:19 +00:00Commented Sep 10, 2015 at 13:54
Duplicated code:
if not fname == "":
folderPath.delete(0, END) ## This is to erase anything in the "browse" bar
folderPath.insert(0, fname) ## and use what has been selected from the selection window
else:
folderPath.insert(0, fname)
You have folderPath.insert(0, fname)
in both the if
and the else
, at the end of each. Remove duplicated code like this and place it after the if
/else
:
if not fname == "":
folderPath.delete(0, END) ## This is to erase anything in the "browse" bar
folderPath.insert(0, fname)
PEP8
Your code does not follow PEP8 standards. This is the recommended style guideline for Python, and most Python programmers follow it.
Naming:
d
is not a descriptive variable name (def makedirs(d):
), not is f
(with open(filename,'w') as f:
). You use good names in most other places, so please be consistent.
Comparison to Boolean Literals:
You do not need to compare a value to False
or True
: if said_it == False:
. Just use if not said_if:
. Again, you do this in other places in your code.
Addition:
count = count+1
can be count += 1
.
-
\$\begingroup\$ Yea, I guess the problem is the coding languages I was taught were like C++ where everything is elongated so I switch back and forth between what I did and how pythonic style works \$\endgroup\$CFNZ_Techie– CFNZ_Techie2015年09月10日 02:37:28 +00:00Commented Sep 10, 2015 at 2:37
Explore related questions
See similar questions with these tags.