ALL,
I'm trying to implement MySQL communication from a Python script. Here is what I have:
    try:
        if append:
            self.conn = MySQLdb.connect(.....)
            self.cur = self.conn.cursor()
        else:
            self.conn = MySQLdb.connect(.....)
            self.cur = self.conn.cursor()
            self.conn.set_character_set('utf8;')
            self.cur.execute('SET NAMES utf8;')
            self.cur.execute('SET character_set_connection=utf8;')
            self.cur.execute('SET GLOBAL innodb_large_prefix=ON')
            self.cur.execute('SET GLOBAL innodb_file_format=barracuda')
            self.cur.execute('SET GLOBAL innodb_file_per_table=ON')
            # Database and table creation
Now my question is: should I run these utf8 and "SET ..." queries for every connection, or only when the database is created?
Thank you.
1 Answer
These different commands do different things. And you're not even doing all of the right ones.
First, if you're using either PyMySQL or a later version of MySQLdb, pass charset='utf8' (notice that's 'utf8' without a semicolon attached!) as an argument to connect. That makes your connection default to UTF-8 and also enables use_unicode mode, so you don't need set_character_set. This is the better solution. You will, obviously, need to pass it every time you open a connection, as it's an argument to connect.
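As a minimal sketch of that first option: the helper names utf8_connect_kwargs and open_connection below are made up for illustration, and the connect function is passed in so that it could be MySQLdb.connect or pymysql.connect. It only shows that the charset is set on every connect, not how your class should be structured.

```python
def utf8_connect_kwargs(**params):
    """Return a copy of the connection parameters with UTF-8 enforced."""
    kwargs = dict(params)
    # Note: 'utf8' with no trailing semicolon.
    kwargs["charset"] = "utf8"
    return kwargs


def open_connection(connect, **params):
    """Open a connection through the given connect callable.

    `connect` is whatever your driver exposes, e.g. MySQLdb.connect.
    """
    return connect(**utf8_connect_kwargs(**params))
```

With MySQLdb this would be called as open_connection(MySQLdb.connect, host='localhost', user='me', passwd='secret', db='mydb'), where all four values are placeholders, in both the append and the non-append branch.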
If your library does not accept the charset argument, then you will have to use set_character_set, and you should do so immediately after the connect, every time you connect. But again, don't include that trailing ; there.
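A rough sketch of that fallback, assuming the connection object has a set_character_set method as MySQLdb connections do. The function name connect_then_set_utf8 is hypothetical, and the connect callable is passed in rather than imported.

```python
def connect_then_set_utf8(connect, **params):
    """Connect without a charset argument, then switch to UTF-8 at once.

    Assumes the returned connection offers set_character_set().
    """
    conn = connect(**params)
    # Must happen right after connecting, on every new connection.
    conn.set_character_set("utf8")  # no trailing ';'
    return conn
```

Because the call is tied to the connection, it belongs in the same place as the connect itself, not in the database-creation branch.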
Second, if this is Python 2.x, once you enable use_unicode, all SQL strings and all string-valued SQL parameters should be unicode objects, not str objects. You will often get away with not doing that properly (basically, if they're pure ASCII), but you should not depend on that. This is true even for the global, pragma, etc. statements at startup: use u'...' Unicode literals for those too.
Meanwhile, those first two SET statements should already be part of what your database library does when you pass a charset argument to connect or call set_character_set. This is why older documentation sometimes says to pass init_command='SET NAMES utf8' if you can't pass charset='utf8'. So you should never need to run them yourself.
The other three SET queries, of course, have nothing to do with Unicode in the first place. They only affect the creation of new tables, and because they are SET GLOBAL they change server-wide settings rather than per-connection ones. So they need to be run before you create tables, but I have no idea whether you ever create and drop tables in subsequent connections, or only when the database is initially created.
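If it turns out that tables are only created when the database is first set up, one way to arrange this is to keep the three statements together and run them only in that branch. This is a sketch under that assumption; INNODB_SETUP and prepare_for_table_creation are illustrative names, and cur is any cursor with an execute method. The statements are written as u'...' literals, in line with the Python 2 advice above.

```python
# The three InnoDB statements from the question, as Unicode literals.
INNODB_SETUP = (
    u"SET GLOBAL innodb_large_prefix=ON",
    u"SET GLOBAL innodb_file_format=barracuda",
    u"SET GLOBAL innodb_file_per_table=ON",
)


def prepare_for_table_creation(cur):
    """Run the InnoDB settings once, just before creating tables."""
    for statement in INNODB_SETUP:
        cur.execute(statement)
```

In the question's code this would be called as prepare_for_table_creation(self.cur) in the else branch, right before the database and table creation.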