I implemented a URL shortener with Python and a database. Please review the database part of the code. The idea is to create an MD5 hash of the URL in question and then store it in a database. The hex digits of the MD5 hash is then encoded with a large base encoding, i.e. base 62 encoding or base 128 encoding to do away with the long length of the hash. The rest of the code can be found on https://github.com/hirok19/URLShortener.
import pysqlite3
import hashlib
class DatabaseConnector:
cursor=None
conn=None
def __init__(self,path):
try:
self.conn = pysqlite3.connect(path)
self.cursor=self.conn.cursor()
except:
print("Unable to initialize database")
def createTable(self):
try:
self.cursor.execute("""CREATE TABLE IF NOT EXISTS urlTable (id,url)""")
except Exception as e:
print("Unable to create table")
print(e)
def insertURL(self,URL):
key=self.getKey(URL)
#print(type(key))
insert_stmt = ("INSERT INTO urlTable (id,url) VALUES ('{0}','{1}');")
statement=insert_stmt.format(key,URL)
#print(statement)
try:
if(self.getURLFromKey(key) is None):
self.cursor.execute(statement)
self.conn.commit()
print("Data inserted!")
else:
print("Data already present")
except Exception as e:
print("Unable to insert data")
print(e)
return None
return key
def getKey(self,URL):
hashedKey=hashlib.md5(URL.encode())
return hashedKey.hexdigest()
def getURLFromKey(self,key):
insert_stmt=("SELECT url from urlTable where id='{0}';")
statement=insert_stmt.format(key)
#print(statement)
try:
self.cursor.execute(statement)
rows = self.cursor.fetchall()
if(len(rows)>0):
return rows[0][0]
else:
return None
except Exception as e:
print("Unable to fetch URL")
print(e)
return None
2 Answers 2
Documentation
The PEP 8 style guide recommends adding docstrings for classes and functions. For example, the docstring for the class should summarize its purpose:
class DatabaseConnector:
"""
URL shortener with a database.
Create an MD5 hash of the URL and then store it in a database.
"""
Comments
Remove all commented-out code to reduce clutter:
#print(type(key))
Layout
The code uses inconsistent horizontal and vertical whitespace. The black program can be used to automatically reformat the code with consistency.
Parentheses around simple if
conditions:
if(self.getURLFromKey(key) is None):
are usually omitted:
if self.getURLFromKey(key) is None:
Simpler
In the getURLFromKey
function, these statements:
if(len(rows)>0):
return rows[0][0]
else:
return None
are more commonly simplified as:
if len(rows):
return rows[0][0]
return None
There is no need to compare len
against 0, and there is
no need for the else
.
Naming
PEP 8 recommends snake_case for function and variable names. For example:
getKey
would be get_key
hashedKey
would be hashed_key
DbC
Design by Contract asks that you write correct code by creating small, testable functions which compose in a certain way. They either produce an output value that conforms to the contract, or they raise an exception so it is as though the computation never even started, and no one will try to treat some sentinel error value as though it were a correct answer.
For example, if we disregard complex numbers, and wish to compute \2ドル \sqrt{n}\$, we might do this:
import math
def double(n):
return 2 * n
def root(n):
assert n >= 0
return math.sqrt(n)
n = -9
print(double(root(n)) # blows up with AssertionError
error "handlers" that do not handle
except:
print("Unable to initialize database")
Better to just let the __init__
constructor raise
,
producing an informative diagnostic stack trace.
As written, this seems to resemble code authored
in a language like C which lacks exceptions,
forcing app code to make tedious errno
checks
and print a diagnostic when a check fails.
Languages that offer exceptions spare you from that,
and make it more likely that composition of functions
will compute a correct result.
"Correct" includes throwing a fatal exception
where appropriate.
BTW, avoid "bare except", as it interferes with important functionality like CTRL/C. Prefer the usage that you later employed:
except Exception as e:
print("Unable to create table")
print(e)
Though that one doesn't "handle" the
lack of a table, so createTable()
shouldn't be using a try
/ except
either.
design of Public API
In insertURL()
and getURLFromKey()
the
error handlers actually do something.
They return None
in the error case.
But now any calling code is burdened
with remembering the return type is str | None
and checking for the sentinel.
That happens properly at this call site:
if(self.getURLFromKey(key) is None):
An alternative approach would be to unconditionally
INSERT, and handle a duplicate entry by having
an except
clause return key
.
BTW, this isn't C or Java, so prefer no parens:
if self.getURLFromKey(key) is None:
relational integrity
CREATE TABLE ... urlTable (id, url)
All of this brings us to the topic of PK or unique index. Prefer this:
CREATE TABLE ... urlTable (id PRIMARY KEY, url)
Now instead of remembering to do an app level check on each INSERT, we can let the RDBMS backend enforce the integrity constraint.
It also wouldn't be a bad idea to reveal
Author's Intent by putting UNIQUE INDEX on
the url
column.
Though in this codebase it should make little practical
difference, given that we rely on MD5 being
collision-free.
The hash function induces a 1:1 relationship.
Explore related questions
See similar questions with these tags.