I've written a script which creates a new database
, a table
within the database, insert some values
into it and fetch the values
in one go. The way I've created the script below to achieve what I just said seems not to be an ideal way as there are too many repetitions in there. To be specific this two functions create_database()
and connect()
are almost identical. Moreover, I had to use mycursor = conn.cursor()
twice within main function.
import mysql.connector
def create_database():
mydb = mysql.connector.connect(
host="localhost",
user="root",
passwd = "",
database=""
)
return mydb
def connect(databasename):
mydb = mysql.connector.connect(
host="localhost",
user="root",
passwd = "",
database=databasename
)
return mydb
def store_data(item_name,item_link):
mycursor.execute("INSERT INTO webdata (item_name,item_link) VALUES (%s,%s)", (item_name,item_link))
conn.commit()
if __name__ == '__main__':
db_name = "newdatabase"
conn = create_database()
mycursor = conn.cursor()
try:
mycursor.execute(f"CREATE DATABASE {db_name}")
except mysql.connector.errors.DatabaseError:
pass
conn = connect(db_name)
mycursor = conn.cursor()
mycursor.execute("DROP TABLE if exists webdata")
mycursor.execute("CREATE TABLE if not exists webdata (item_name VARCHAR(255), item_link VARCHAR(255))")
store_data("all questions","https://stackoverflow.com/questions/tagged/web-scraping")
mycursor.execute("SELECT * FROM webdata")
for item in mycursor.fetchall():
print(item)
What it prints (expected result):
('all questions', 'https://stackoverflow.com/questions/tagged/web-scraping')
2 Answers 2
From the title alone, this sounds like something that should not be done, particularly since this is not a lightweight database such as SQLite. Creating the database, its tables, columns, constraints etc. is not, and should not, be the job of the application - but rather an external setup script with administrative permissions. The application, then, should not have such permissions and should only be able to insert/select/update/delete.
For some reason I thought autocommit is enabled by default for the MySQL Python connector, but apparently it's not:
https://dev.mysql.com/doc/connector-python/en/connector-python-connectargs.html
So your explicit call to commit
is fine.
I can't tell from the documentation whether the cursor and connection objects support context management. Try putting both of those in with
statements, and if it complains that there's no __enter__
, switch to a try/finally
that guarantees cursor and connection closure.
-
1\$\begingroup\$ It seems you concluded in 2020 that context management is not supported by default. According to the bugtracker, the functionality has been added since then. However, it might be a good idea to double-check that. \$\endgroup\$2021年02月28日 19:44:54 +00:00Commented Feb 28, 2021 at 19:44
Python functions can have optional arguments with default values. If you adjust
the signature of connect()
to have a default, the create_database()
function will be unnecessary.
def connect(database_name = ''):
return mysql.connector.connect(
host = 'localhost',
user = 'root',
passwd = '',
database = database_name
)
Contrary to your question text, you don't have a main function; you just have
top-level code nested under an if-conditional. Top-level code (other than
constants or imports) is generally a bad idea because it is inflexible and not
easily tested or experimented with. Move that code into a proper
main()
function, and then just invoke it in the if-conditional. It's also not
a bad idea to prepare for the future of your script by including the
ability to handle simple command-line arguments -- often handy for debugging and
experimentation even if they are never part of an intended use case of the script.
import sys
def main(args):
...
if __name__ == '__main__':
main(sys.argv[1:])
After you make that change, the store_data()
function will be broken, because
it depends on having access to the global mycursor
variable. That's another
illustration of the problems with top-level code: it can camouflage dependencies
and, in some cases, create situations that are difficult to debug and disentangle if
the volume of such code grows large enough. Instead, the store_data()
function
should take the DB connection as an explicit argument.
A rigorous approach of putting all code inside of functions will seem like a small hassle at first; but in my experience, it nearly always pays off in the form of fewer bugs, greater flexibility, improved code readability, and various other benefits.