3
\$\begingroup\$

I want to get table names and column names from queries in a dataframe. The dataframe is like this:

Date Query
29-03-2019 SELECT * FROM table WHERE ..
30-03-2019 SELECT * FROM ... JOIN ... ON ...WHERE ..
.... ....
20-05-2019 SELECT ...

and I run function to that dataframe to get tablename from the queries.

import sqlparse
from sqlparse.tokens import Keyword, DML
def getTableKey(parsed):
 findFrom = False
 wordKey = set(
 [
 "FROM",
 "JOIN",
 "LEFT JOIN",
 "INNER JOIN",
 "RIGHT JOIN",
 "OUTER JOIN",
 "FULL JOIN",
 ]
 )
 for word in parsed.tokens:
 if word.is_group:
 yield from getTableKey(word)
 if findFrom:
 if isSelect(word):
 yield from getTableKey(word)
 elif word.ttype is Keyword:
 findFrom = False
 StopIteration
 else:
 yield word
 if word.ttype is Keyword and word.value.upper() in wordKey:
 findFrom = True
def getTableName(sql):
 tableReg = re.compile(r"^.+?(?<=[.])")
 tableName = []
 query = sqlparse.parse(sql)
 for word in query:
 if word.get_type() != "UNKNOWN":
 stream = getTableKey(word)
 table = set(getWord(stream))
 for item in table:
 tabl = tableReg.sub("", item)
 tableName.append(tabl)
 return tableName

Also, I run function to get columnname from queries.

def getKeyword(parsed):
 kataKeyword = set(["WHERE", "ORDER BY", "ON", "GROUP BY", "HAVING", "AND", "OR"])
 from_seen = False
 for item in parsed.tokens:
 if item.is_group:
 yield from getKeyword(item)
 if from_seen:
 if isSelect(item):
 yield from getKeyword(item)
 elif item.ttype is Keyword:
 from_seen = False
 StopIteration
 else:
 yield item
 if item.ttype is Keyword and item.value.upper() in kataKeyword:
 from_seen = True
def getAttribute(sql):
 attReg = re.compile(r"asc|desc", re.IGNORECASE)
 namaAtt = []
 kueri = sqlparse.parse(sql)
 for kata in kueri:
 if kata.get_type() != "UNKNOWN":
 stream = getKeyword(kata)
 table = set(getWord(stream))
 for item in table:
 tabl = attReg.sub("", item)
 namaAtt.append(tabl)
 return namaAtt

But as this is my first try, I need an opinion about what I've tried, because my code runs slowly with a large file.

asked Jul 16, 2019 at 2:51
\$\endgroup\$

1 Answer 1

1
\$\begingroup\$

That will not speedup your code, but there are some code improvements:

  1. follow naming conventions getAttribute -> get_attribute https://visualgit.readthedocs.io/en/latest/pages/naming_convention.html

  2. You can create set using set literal my_set = {1, 2, 3}

  3. You can compile tableReg = re.compile(r"^.+?(?<=[.])") once

answered Jul 16, 2019 at 7:49
\$\endgroup\$
1
  • \$\begingroup\$ The canonical reference related to code style in Python is the official Style Guide for Python Code widely known as PEP8. \$\endgroup\$ Commented Jul 16, 2019 at 7:55

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.