Filtering log files from a server

Question 1

I have made a program in Python to do a filter of log files from a server. What I would like is some guidance on how this code could be optimized, streamlined or 'short-handed' so my 'pythonic' programming becomes better. I don't program for work, I do it on a requirement basis so I'm not good by any means.

Requirement:

The code I have has functions. I would like to know, in regards to OOP, how I should best use this in a class object.
I would also like to know if you have any other alternatives to the coding I have used (like the libraries, any sorting or filtering algorithms etc.).

I need assistance with:

Optimizing in both coding and performance efficiency.
Advice on how this would turn into a class object.
Any better solution to how I've done it.

I am using Tkinter for a GUI which is where the folder path is retrieved.

#==================================================================
### OS and python based Imports
import os
### Office, Excel and other file imports
import glob, csv, xlwt, operator
### GUI based imports
import errno, Tkinter, tkFileDialog
from Tkinter import *
#==================================================================
'''Define Command Functions'''
# ======== Select a directory:
def folder_select():
 fname = tkFileDialog.askdirectory(parent=root,initialdir="/",title='Please select a directory') 
 if not fname == "": 
 folderPath.delete(0, END) ## This is to erase anything in the "browse" bar
 folderPath.insert(0, fname) ## and use what has been selected from the selection window
 else: 
 folderPath.insert(0, fname) ## This just inserts what has been selected from the selection window
# ======== Make Directories
def makedirs(d):
 print("Making Directory")
 try:
 os.makedirs(d)
 except OSError as e:
 # If the file already exists, and is a directory
 if e.errno == errno.EEXIST and os.path.isdir(d):
 created = False
 # It's some other error, or the existing file is not a directory
 else:
 raise
 else:
 created = True
 return created
# ======= try find file names that end with the passed through extension
def get_valid_filenames(directory, extensions):
 print("Finding "+extensions+" files...")
 for filename in os.listdir(directory):
 if filename.lower().endswith(extensions):
 yield filename
# ====== Get rid of the crap the email uses
def remove_garbage(to_dir, fileToSearch, fname): 
 print("Removing email garbage from "+fname)
 xDate = "" ## This is to see if there is a file with this date already
 for line in fileToSearch:
 delimiter = line.replace(" - ", ',') ## Replace all " - " strings with "," for csv standards
 date = delimiter[:10] ## Make the date be any text of the first 10 characters
 time = delimiter[11:] ## make the time be the rest of them after 11 (because there is a " " between them
 # The original date is mm/dd/yyyy and it's changing to dd-mm-yyyy
 day = date[3:5]
 month = date[:2]
 year = date[6:]
 date = day+"-"+month+"-"+year
 delimiter = date+","+time ## now make the delimited values match csv standards
 to_file = os.path.join(to_dir, date+".csv") ## Create the csv's based on dates 
 if line.find('\x00') > 0: ## Avoid any null characters for the sort function
 delimiter = line.replace('\x00', '')
 if not xDate == date: ## this is where we find out if there is a previously used date matches the current date
 if not os.path.exists(to_file): ## if the file doesn't exist create it
 fileToCreate = open(to_file, 'w')
 fileToCreate.writelines(delimiter)
 xDate = date
 else: ## if it does exist append to it
 fileToCreate = open(to_file, 'a+')
 fileToCreate.writelines(delimiter)
 xDate = date
 else: ## if the previous date does match the current date - the file is open so just add to it
 fileToCreate.writelines(delimiter)
# ======= I want to sort by time 
def sort_by_columns(to_dir):
 said_it = False
 for the_file in get_valid_filenames(to_dir, ".csv"):
 filename = os.path.join(to_dir, the_file)
 read_csv_file = csv.reader(open(filename, 'rb'))
 if said_it == False:
 print("Sorting CSV files by time")
 said_it = True
 sort = sorted(read_csv_file,key=operator.itemgetter(1)) ## This sorts the file
 with open(filename,'w') as f: ## This writes it back
 writer = csv.writer(f)
 writer.writerows(sort)
 print("Making Excel Workbook") 
 make_workbook(to_dir) ## while all this is in progress add all the data to the workbook
#======= Create XLS workbook with a new sheet for each day of the logs
def make_workbook(to_dir):
 wb = xlwt.Workbook()
 csv_files = os.path.join(to_dir, "*.csv")
 for filename in glob.glob(csv_files): ## Break everything down so we can use variables for names that relate the the file
 (f_path, f_name) = os.path.split(filename)
 (f_short_name, f_extension) = os.path.splitext(f_name)
 ws = wb.add_sheet(f_short_name)
 csvReader = csv.reader(open(filename, 'rb'))
 for rowx, row in enumerate(csvReader):
 for colx, value in enumerate(row):
 cwidth = ws.col(colx).width
 if (len(value)*300) > cwidth: 
 ws.col(colx).width = (len(value)*300) ##(Modify column width to match biggest data in that column)
 ws.write(rowx, colx, value)
 final_workbook = os.path.join(to_dir, "logs.xls") 
 wb.save(final_workbook)
 print("Excel workbook saved.")
# ======= clean up the remaining junk
def clean_up_files(to_dir):
 print("Cleaning Repository")
 for fname in os.listdir(to_dir):
 files_left = os.path.join(to_dir, fname)
 if not fname.lower().endswith(".xls"):
 try:
 os.remove(files_left) ## Remove all remaining files that were created by this script
 except OSError, e: ## if failed, report it back to the user
 pass 
#======= this is the main function that calls the rest 
def confirm_it(directory):
 count = 0
 to_dir = os.path.join(directory, "Compiled")
 for fname in get_valid_filenames(directory, ".txt"):
 logs = os.path.join(directory, fname) ## where logs are stored
 logFile = open(logs).readlines() ## allow them to be modified 
 # This just recompiles a list of logs based on dates
 for line in logFile:
 dates = re.compile(r'\d\d\/\w\w\/')
 if dates.findall(line): 
 if not os.path.exists(to_dir):
 makedirs(to_dir)
 search_dir = os.path.join(to_dir, "logs"+str(count))
 fileToSearch = open(search_dir, 'a+').writelines(line)
 #======= Creating CSVs with proper entries
 fileToSearch = open(search_dir, 'r')
 remove_garbage(to_dir, fileToSearch, fname)
 count = count+1
 fileToSearch.close()
 sort_by_columns(to_dir)
 clean_up_files(to_dir)
 print("\n\n==== DONE!!!! ====")
def confirm_it_wrapper():
 confirm_it(directory=folderPath.get())

Log Source:

I imagine you would need some source of the logs for it to work. The easiest source information would be:

03/28/2014 20:52:33 - 12 - logs - error sending log report via email

Just copy that a few times into a text file and change the time to see what it does. You could also make multiple text files with multiple days (remember the date is in American format - mm/dd/yyyy).

Please remember this is just for me to improve my style of coding. I have the code working completely and it goes as fast as it can go (I would imaging). I just want to know what methods I can use to improve my coding style.

Question 2

Raising useful errors

If you have something like this:

raise

You're doing something wrong. Just calling raise with no additional useful info is not helpful. Preferably, you should do one of the following:

Raise a built-in error:
```
raise BuiltInError
```
Raise a built-in error, with an error message:
```
raise BuiltInError("Blah foobar")
```
Define your own custom error, and raise it with a custom message:
```
class MyCustomError(Exception): pass
raise MyCustomError("fooblah bar")
```
I don't recommend this though, as chances are, you can use a built-in error, rather than needing to define your own.

This will add quite a bit of clarity to the user of your program. Instead of the program just crashing, it will give them a useful message, telling them what went wrong.

Properly opening files

Just opening a file like this:

f = open("path/to/file", "mode")

And calling f.close, is not how this should be done. file.close doesn't properly close a file, and since opened files are unmanaged resources, you generally want to close files properly once you're done using them.

To do this, you need to use a context manager with the with statement, like this:

with open("path/to/file", "mode") as f:
 ...

This eliminates the need to call file.close, and the file is properly closed. This means that resources are properly freed.

Style

Please. Never use "separator comments" like this:

#==================================================================

It's a horrible code smell, and it's completely unnecessary. If you separate your top-level code/functions/classes with two blank lines, you're fine.

Your variable naming is also very inconsistent as well.

Variable names should be in snake_case.
Constants should be in UPPER_SNAKE_CASE.
Functions should also be in snake_case.
Classes should be in PascalCase.

You should also have whitespace between mathematical, binary, comparison, and assignment operators. This will add additional clarity, and readability to your code.

It's also best practice to avoid wildcard imports like this:

from Tkinter import *

I'd recommend that you read PEP8, Python's official style guide.

Docstrings

You should also be using docstrings. This means that a comment above a function, like this:

# ======== Make Directories

Should become a docstring. A typical docstring usually has a structure like this:

def my_func( ... ):
 """Small description of the function.
 Longer, more detailed description of the function.
 Keyword arguments:
 argument -- Argument description.
 """
 ...

Docstrings can also be used on classes, and at the top of a code file.

Question 3

Actually calling raise after an exception was caught will re-raise the original exception as is. OP is using it correctly here by trying to check for a particular exception to handle it differently, and if it's not found allowing the original exception to raise.

Question 4

Also, the exception base class is called Exception, not BuiltInError. (edit: now I got what you meant, but apparently, it is misleading)

Question 5

file.close does properly close a file. The advantage of using a context manager (with) is that the file is guaranteed to be closed even if an exception occurs.

Question 6

@mkrieger1 If it doesn't close the file on an error, then it means it doesn't properly close the file. Just on an error.

Question 7

Duplicated code:

if not fname == "": 
 folderPath.delete(0, END) ## This is to erase anything in the "browse" bar
 folderPath.insert(0, fname) ## and use what has been selected from the selection window
else: 
 folderPath.insert(0, fname)

You have folderPath.insert(0, fname) in both the if and the else, at the end of each. Remove duplicated code like this and place it after the if/else:

if not fname == "": 
 folderPath.delete(0, END) ## This is to erase anything in the "browse" bar
 
folderPath.insert(0, fname)

PEP8

Your code does not follow PEP8 standards. This is the recommended style guideline for Python, and most Python programmers follow it.

Naming:

d is not a descriptive variable name (def makedirs(d):), not is f (with open(filename,'w') as f:). You use good names in most other places, so please be consistent.

Comparison to Boolean Literals:

You do not need to compare a value to False or True: if said_it == False:. Just use if not said_if:. Again, you do this in other places in your code.

Addition:

count = count+1 can be count += 1.

Question 8

Yea, I guess the problem is the coding languages I was taught were like C++ where everything is elongated so I switch back and forth between what I did and how pythonic style works

score 10 · Accepted Answer · 2015-09-10 01:55:22Z

Raising useful errors

If you have something like this:

raise

You're doing something wrong. Just calling raise with no additional useful info is not helpful. Preferably, you should do one of the following:

Raise a built-in error:
```
raise BuiltInError
```
Raise a built-in error, with an error message:
```
raise BuiltInError("Blah foobar")
```
Define your own custom error, and raise it with a custom message:
```
class MyCustomError(Exception): pass
raise MyCustomError("fooblah bar")
```
I don't recommend this though, as chances are, you can use a built-in error, rather than needing to define your own.

This will add quite a bit of clarity to the user of your program. Instead of the program just crashing, it will give them a useful message, telling them what went wrong.

Properly opening files

Just opening a file like this:

f = open("path/to/file", "mode")

And calling f.close, is not how this should be done. file.close doesn't properly close a file, and since opened files are unmanaged resources, you generally want to close files properly once you're done using them.

To do this, you need to use a context manager with the with statement, like this:

with open("path/to/file", "mode") as f:
 ...

This eliminates the need to call file.close, and the file is properly closed. This means that resources are properly freed.

Style

Please. Never use "separator comments" like this:

#==================================================================

It's a horrible code smell, and it's completely unnecessary. If you separate your top-level code/functions/classes with two blank lines, you're fine.

Your variable naming is also very inconsistent as well.

Variable names should be in snake_case.
Constants should be in UPPER_SNAKE_CASE.
Functions should also be in snake_case.
Classes should be in PascalCase.

You should also have whitespace between mathematical, binary, comparison, and assignment operators. This will add additional clarity, and readability to your code.

It's also best practice to avoid wildcard imports like this:

from Tkinter import *

I'd recommend that you read PEP8, Python's official style guide.

Docstrings

You should also be using docstrings. This means that a comment above a function, like this:

# ======== Make Directories

Should become a docstring. A typical docstring usually has a structure like this:

def my_func( ... ):
 """Small description of the function.
 Longer, more detailed description of the function.
 Keyword arguments:
 argument -- Argument description.
 """
 ...

Docstrings can also be used on classes, and at the top of a code file.

Actually calling raise after an exception was caught will re-raise the original exception as is. OP is using it correctly here by trying to check for a particular exception to handle it differently, and if it's not found allowing the original exception to raise.
Also, the exception base class is called Exception, not BuiltInError. (edit: now I got what you meant, but apparently, it is misleading)
file.close does properly close a file. The advantage of using a context manager (with) is that the file is guaranteed to be closed even if an exception occurs.
@mkrieger1 If it doesn't close the file on an error, then it means it doesn't properly close the file. Just on an error.

Stack Exchange Network

Filtering log files from a server

2 Answers 2

Raising useful errors

Properly opening files

Style

Docstrings

Duplicated code:

PEP8

Naming:

Comparison to Boolean Literals:

Addition:

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Filtering log files from a server

2 Answers 2

Raising useful errors

Properly opening files

Style

Docstrings

Duplicated code:

PEP8

Naming:

Comparison to Boolean Literals:

Addition:

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions