7
\$\begingroup\$

This is the second round of reviews. The first round can be found in this question.

This is a project I have been working on. This is one of my first experiences with Python and OOP as a whole. I have written a GUI that handles the inputs for these classes, but I will ask for a separate review for that, since the question would be rather bulky when including both.

The goal of this program is to create standard SQL (SQL server) queries for everyday use. The rationale behind this is that we regularly need similar queries, and would like to prevent common mistakes in them. The focus on this question is on the Python code however.

The information about the tables and their relation to each-other is provided by a JSON file, of which I have attached a mock-up version.

The code consists of three parts:

  • A universe class which handles the JSON file and creates the context of the tables.

  • A query class, which handles the specifications of which tables to include, which columns to take, how to join each table and optional where statements.

  • A PyQT GUI that handles the inputs. This is excluded in this post and will be posted separately for another review. It can be found here on Github

The JSON:

{
 "graph": {
 "table1": {
 "tag": ["table1"],
 "DBHandle": ["tables.table1"],
 "Priority": [1],
 "Columns": ["a", "b", "c"],
 "Joins": {
 "table2": ["on table2.a = table1.a", "inner"],
 "table3": ["on table1.c = table3.c", "inner"]
 }
 },
 "table2": {
 "tag": ["table2"],
 "DBHandle": ["tables.table2"],
 "Priority": [2],
 "Columns": ["a", "d", "e"],
 "Joins": {
 "table3": ["on table2.d=table3.d and table2.e = table3.e", "inner"]
 }
 },
 "table3": {
 "tag": ["table3"],
 "DBHandle": ["tables.table3"],
 "Priority": [4],
 "Columns": ["c", "d", "e"],
 "Joins": []
 }
 },
 "presets": {
 "non empty b": {
 "table": ["table1"],
 "where": ["table1.b is not null"]
 }
 }
}

The reviewed Python code:

# -*- coding: utf-8 -*-
"""
Created on Thu Aug 3 14:33:44 2017
@author: jdubbeldam
"""
from json import loads
class Universe:
 """
 The Universe is a context for the Query class. It contains the information
 of the available Database tables and their relation to eachother. This
 information is stored in a JSON file.
 """
 def __init__(self, filename):
 """
 Reads the JSON and separates the information in a presets dictionary and
 a graph dictionary. The latter contains the information of the nodes in
 the universe/graph, including relational information.
 """
 with open(filename, encoding='utf-8') as file:
 self.json = loads(str(file.read()))
 self.presets = self.json['presets']
 self.json = self.json['graph']
 self.tables = self.json.keys()
 self.connections = self.get_edges()
 def get_edges(self):
 """
 Creates a dictionary with for each node a list of nodes that join on
 that node.
 """
 edges = {}
 for table in self.tables:
 edges[table] = []
 try:
 edges[table] += [connected_tables
 for connected_tables in self.json[table]['Joins']]
 except AttributeError:
 pass
 for node in edges:
 for connected_node in edges[node]:
 if node not in edges[connected_node]:
 edges[connected_node].append(node)
 return edges
 def shortest_path(self, start, end, path_argument=None):
 """
 Calculates the shortest path in a graph, using the dictionary created
 in getEgdes. Adapted from https://www.python.org/doc/essays/graphs/.
 """
 if path_argument is None:
 old_path = []
 else:
 old_path = path_argument
 path = old_path + [start]
 if start == end:
 return path
 if start not in self.connections:
 return None
 shortest = None
 for node in self.connections[start]:
 if node not in path:
 newpath = self.shortest_path(node, end, path)
 if newpath:
 if not shortest or len(newpath) < len(shortest):
 shortest = newpath
 return shortest
 def join_paths(self, nodes):
 """
 Extension of shortest_path to work with multiple nodes to be connected.
 The nodes are sorted based on the priority, which is taken from the JSON.
 shortest_path is called on the first two nodes, then iteratively on each
 additional node and one of the existing nodes returned by shortest_path,
 selecting the one that takes the fewest steps.
 """
 sorted_nodes = sorted([[self.json[node]['Priority'][0], node] for node in nodes])
 paths = []
 paths.append(self.shortest_path(sorted_nodes[0][1], sorted_nodes[1][1]))
 for next_node_index in range(len(sorted_nodes) - 2):
 shortest = None
 flat_paths = [item for sublist in paths for item in sublist]
 old_path = len(flat_paths)
 for connected_path in flat_paths:
 newpath = self.shortest_path(connected_path,
 sorted_nodes[next_node_index+2][1],
 flat_paths)
 if newpath:
 if not shortest or len(newpath[old_path:]) < len(shortest):
 shortest = newpath[old_path:]
 paths.append(shortest)
 return paths
class Query:
 """
 Query contains the functions that allow us to build an SQL query based on
 a universe object. It maintains lists with the names of activated tables
 and, if applicable, which of their columns in a dictionary. Implicit tables
 are tables that are called, only to bridge joins from one table to another.
 Since they are not explicitly called, we don't want their columns in the query.
 how_to_join is a dictionary that allows setting joins (left, right, inner, full)
 other than the defaults imported from the JSON.
 """
 core = 'select\n\n{columns}\n\nfrom {joins}\n\n where {where}'
 def __init__(self, universum):
 self.graph = universum
 self.active_tables = []
 self.active_columns = {}
 self.implicit_tables = []
 self.join_strings = {}
 for i in self.graph.tables:
 self.join_strings[i] = self.graph.json[i]['Joins']
 self.how_to_join = {}
 self.where = []
 def add_tables(self, tablename):
 """
 Sets given tablename to active. GUI ensures that only valid names
 will be given.
 """
 if tablename not in self.active_tables:
 self.active_tables.append(tablename)
 self.active_columns[tablename] = []
 def add_columns(self, table, column):
 """
 Sets given columnname from table to active. GUI ensures that only valid names
 will be given.
 """
 if column not in self.active_columns[table]:
 self.active_columns[table].append(column)
 def add_where(self, string):
 """
 Adds any string to a list to be input as where statement. This could be
 vulnerable for SQL injection, but the scope of this project is in-house
 usage, and the generated SQL query isn't directly passed to the server.
 """
 self.where.append(string)
 def find_joins(self):
 """
 Calls the join_paths function from Universe class. Figures out which joins
 are needed and which tables need to be implicitly added. Returns a list
 of tuples with tablenames to be joined.
 """
 tags = [self.graph.json[table]['tag'][0]
 for table in self.active_tables]
 join_paths = self.graph.join_paths(tags)
 join_sets = [(table1, table2)
 for join_edge in join_paths
 for table1, table2 in zip(join_edge[:-1], join_edge[1:])]
 for sublist in join_paths:
 for item in sublist:
 if item not in self.active_tables:
 self.add_tables(item)
 self.implicit_tables.append(item)
 return join_sets
 def generate_join_statement(self, table_tuple):
 """
 Creates the join statement for a given tuple of tablenames. The second
 entry in the tuple is always the table that is joined. Since the string
 is stored in a dictionary with one specific combination of the two table
 names, the try statement checks which way around it needs to be. how contains
 the default way to join. Unless otherwise specified, this is used to generate
 the join string.
 """
 added_table = table_tuple[1]
 try:
 on_string, how = self.graph.json[table_tuple[0]]['Joins'][table_tuple[1]]
 except TypeError:
 table_tuple = (table_tuple[1], table_tuple[0])
 on_string, how = self.graph.json[table_tuple[0]]['Joins'][table_tuple[1]]
 if table_tuple not in self.how_to_join:
 self.how_to_join[table_tuple] = how
 join_string = (self.how_to_join[table_tuple]
 + ' join '
 + self.graph.json[added_table]['DBHandle'][0]
 + ' '
 + self.graph.json[added_table]['tag'][0]
 + '\n')
 return join_string + on_string
 def generate_select_statement(self, table):
 """
 Creates the column specification. If no columns of an active table are
 specified, it assumes all the columns are wanted.
 """
 if not self.active_columns[table]:
 self.active_columns[table] = ['*']
 return ',\n'.join([(self.graph.json[table]['tag'][0]
 + '.'
 + i)
 for i in self.active_columns[table]])
 def compile_query(self):
 """
 Handles compilation of the query. If there are more than one activated
 table, joins need to be handled. First the required joins are found, then
 the strings that handle this are generated. The column statement is created.
 If there is no where statement specified, '1=1' is added. The relevent
 statements are added into the core query and returned.
 """
 if len(self.active_tables) == 1:
 base_table = self.active_tables[0]
 join_statement = []
 else:
 joins = self.find_joins()
 base_table = joins[0][0]
 join_statement = [self.generate_join_statement(i) for i in joins]
 join_statement = ([self.graph.json[base_table]['DBHandle'][0]
 + ' '
 + self.graph.json[base_table]['tag'][0]]
 + join_statement)
 completed_join_statement = '\n\n'.join(join_statement)
 column_statement = [self.generate_select_statement(table)
 for table in self.active_tables
 if table not in self.implicit_tables]
 completed_column_statement = ',\n'.join(column_statement)
 if self.where:
 where_statement = '\nand '.join(self.where)
 else:
 where_statement = '1 = 1'
 query = Query.core.replace('{columns}', completed_column_statement)
 query = query.replace('{joins}', completed_join_statement)
 query = query.replace('{where}', where_statement)
 return query
if __name__ == "__main__":
 graph = Universe('example.JSON')
 query = Query(graph)
 query.addTables('table1')
 query.addTables('table2')
 query.addTables('table3')
 print(query.compileQuery())
asked Aug 14, 2017 at 13:05
\$\endgroup\$

3 Answers 3

5
\$\begingroup\$

I have been refactoring this code myself as well in the meanwhile, so I thought I'd post some of the insights I have gained myself.

Class inheritance

Instead of passing a Universe instance when creating a Query, by making Query a subclass of Universe, I was able to reduce the amount of information that was stored in both classes. This makes accessing the attributes and methods of Universe in Query's methods shorter as well.

Query.join_strings does nothing

 self.join_strings = {}
 for i in self.graph.tables:
 self.join_strings[i] = self.graph.json[i]['Joins']

self.join_strings is defined, but used nowhere else. Also the use of i is bad (was an oversight).

Indirectly still iterating over .keys()

 self.json = self.json['graph']
 self.tables = self.json.keys()

in Universe.__init__() stores the keys (tablenames). This is only used to iterate later:

 edges = {}
 for table in self.tables:
 edges[table] = []
 try:
 edges[table] += [connected_tables
 for connected_tables in self.json[table]['Joins']]
 except AttributeError:
 pass

We might as well have iterated over self.json. However, for naming purposes, I prefer the following:

 self.tables = self.json['graph']

Since that improves the naming, and removes the need to keep the json attribute around. So we can turn that into a regular variable without the self.

Expand the add_* methods to also allow for removing of that item.

This is mostly relevant with the GUI in mind. It contained a bit of a workaround to be able to remove tables and columns from the Query.

So I added an argument to the add_* methods to be able to set to remove instead.

def add_tables(self, tablename, add_or_remove=True):
 """
 Toggles active setting of given tablename. GUI ensures that only valid names
 will be given.
 """
 if add_or_remove:
 if tablename not in self.active_tables:
 self.active_tables.append(tablename)
 self.active_columns[tablename] = []
 else:
 self.active_tables.remove(tablename)
answered Aug 15, 2017 at 8:19
\$\endgroup\$
2
  • 1
    \$\begingroup\$ So, without looking at your code, what does add_or_remove = True mean? Does it add or remove? I would make it remove=False, or vastly preferred if possible, just have a remove method... \$\endgroup\$ Commented Aug 15, 2017 at 20:15
  • \$\begingroup\$ @Graipher good point. Turning it into a separate method is definitely possible. It just means that I move the if statement to the GUI. \$\endgroup\$ Commented Aug 15, 2017 at 20:26
4
\$\begingroup\$

Since I noticed if __name__ == "__main__":, I assume you are executing the python file from commandline. If so, you could also add

#!/usr/bin/env python

at the very top of your file, and make the file executable (chmod a+x) so that you can simply execute with ./filename.py in *nix cli.


When you define Query.core query, you should not hardbind this. It would not be extensible in current scenario; in the sense that if you wish to provide INSERT or DELETE clauses to your generator.

answered Aug 14, 2017 at 20:34
\$\endgroup\$
1
  • 1
    \$\begingroup\$ object is not needed anymore as a base class in Python 3 (which this question is tagged with) as it is implied for any class (pretty much like you don't specify metaclass=type either). \$\endgroup\$ Commented Aug 21, 2017 at 11:39
1
\$\begingroup\$

I found that the method names query.method_name() were misspelled as object.methodName(). That is they were camelCase syntax, so the example above will not run. After changing those to PEP8 format object.add_tables() syntax, the application runs.

def main():
 """
 Creates an example query
 """
 file = 'example.JSON'
 query = Query(file)
 query.add_tables('table1')
 query.add_tables('table2')
 query.add_tables('table3')
 print(query.compile_query())

I need more information on the universe.uni initialization file in order to flesh out this application completely and get it working.

Stephen Rauch
4,31412 gold badges24 silver badges36 bronze badges
answered Jul 29, 2019 at 1:38
\$\endgroup\$

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.