SQL queries in a dataframe

Question 1

I want to extract the table names and column names from SQL queries stored in a dataframe. The dataframe looks like this:

Date        Query
29-03-2019  SELECT * FROM table WHERE ..
30-03-2019  SELECT * FROM ... JOIN ... ON ...WHERE ..
....        ....
20-05-2019  SELECT ...

I run the following functions over that dataframe to get the table names from the queries:

import re

import sqlparse
from sqlparse.tokens import Keyword, DML

# isSelect and getWord are helper functions that are not shown in this post.


def getTableKey(parsed):
    findFrom = False
    wordKey = set(
        [
            "FROM",
            "JOIN",
            "LEFT JOIN",
            "INNER JOIN",
            "RIGHT JOIN",
            "OUTER JOIN",
            "FULL JOIN",
        ]
    )
    for word in parsed.tokens:
        # Descend into grouped tokens (parentheses, identifier lists, ...).
        if word.is_group:
            yield from getTableKey(word)
        if findFrom:
            if isSelect(word):
                yield from getTableKey(word)
            elif word.ttype is Keyword:
                # Any other keyword ends the current FROM/JOIN section.
                # (The original had a bare `StopIteration` here, which is a no-op.)
                findFrom = False
            else:
                yield word
        if word.ttype is Keyword and word.value.upper() in wordKey:
            findFrom = True


def getTableName(sql):
    # Strips everything up to and including the first dot, e.g. a schema prefix.
    tableReg = re.compile(r"^.+?(?<=[.])")
    tableName = []
    query = sqlparse.parse(sql)
    for word in query:
        if word.get_type() != "UNKNOWN":
            stream = getTableKey(word)
            table = set(getWord(stream))
            for item in table:
                tabl = tableReg.sub("", item)
                tableName.append(tabl)
    return tableName
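
As an illustration of the prefix-stripping step in getTableName, here is a minimal sketch of what the tableReg pattern does on its own. The sample identifiers are made up for the example and do not come from the real data:

```python
import re

# Same pattern as tableReg in getTableName.
table_reg = re.compile(r"^.+?(?<=[.])")

# Hypothetical identifiers, chosen only to show the behaviour.
samples = ["orders", "sales.orders", "warehouse.sales.orders"]

# The lazy match stops at the first dot, so only one qualifier is removed.
stripped = [table_reg.sub("", name) for name in samples]
print(stripped)  # ['orders', 'orders', 'sales.orders']
```

A name without a dot is left unchanged, and a name with two qualifiers loses only the first one.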

I also run a function to get the column names from the queries:

def getKeyword(parsed):
    kataKeyword = set(["WHERE", "ORDER BY", "ON", "GROUP BY", "HAVING", "AND", "OR"])
    from_seen = False
    for item in parsed.tokens:
        # Descend into grouped tokens (parentheses, comparisons, ...).
        if item.is_group:
            yield from getKeyword(item)
        if from_seen:
            if isSelect(item):
                yield from getKeyword(item)
            elif item.ttype is Keyword:
                # Any other keyword ends the current clause.
                # (The original had a bare `StopIteration` here, which is a no-op.)
                from_seen = False
            else:
                yield item
        if item.ttype is Keyword and item.value.upper() in kataKeyword:
            from_seen = True


def getAttribute(sql):
    # Removes the sort direction (asc/desc) from the collected names.
    attReg = re.compile(r"asc|desc", re.IGNORECASE)
    namaAtt = []
    kueri = sqlparse.parse(sql)
    for kata in kueri:
        if kata.get_type() != "UNKNOWN":
            stream = getKeyword(kata)
            table = set(getWord(stream))
            for item in table:
                tabl = attReg.sub("", item)
                namaAtt.append(tabl)
    return namaAtt

As this is my first attempt, I would like feedback on what I've tried, particularly because my code runs slowly on a large file.

Answer 1

These changes will not speed up your code, but they do improve it:

- Follow Python naming conventions: getAttribute -> get_attribute (https://visualgit.readthedocs.io/en/latest/pages/naming_convention.html).
- Create sets with a set literal: my_set = {1, 2, 3}.
- Compile tableReg = re.compile(r"^.+?(?<=[.])") once, instead of on every call.

The canonical reference for code style in Python is the official Style Guide for Python Code, widely known as PEP 8.
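
The three suggestions above might look like this when applied to the table-name code. This is only a sketch: the names TABLE_KEYWORDS, TABLE_PREFIX_RE, is_table_keyword and strip_table_prefix are illustrative choices, not names from the original post.

```python
import re

# Set literal instead of set([...]); built once at import time.
TABLE_KEYWORDS = {
    "FROM",
    "JOIN",
    "LEFT JOIN",
    "INNER JOIN",
    "RIGHT JOIN",
    "OUTER JOIN",
    "FULL JOIN",
}

# Compiled once at module level rather than inside the function.
TABLE_PREFIX_RE = re.compile(r"^.+?(?<=[.])")


def is_table_keyword(value):
    """Return True if value is a keyword that precedes a table name."""
    return value.upper() in TABLE_KEYWORDS


def strip_table_prefix(name):
    """Drop the leading qualifier (up to the first dot) from name."""
    return TABLE_PREFIX_RE.sub("", name)
```

For example, strip_table_prefix("sales.orders") returns "orders". The gain here is readability and consistency rather than speed.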

xander27 · Answer 1 · 2019-07-16 07:49:12Z
