
ALL,

I'm trying to implement MySQL communication from a Python script. Here is what I have:

    try:
        if append:
            self.conn = MySQLdb.connect(.....)
            self.cur = self.conn.cursor()
        else:
            self.conn = MySQLdb.connect(.....)
            self.cur = self.conn.cursor()
            self.conn.set_character_set('utf8;')
            self.cur.execute('SET NAMES utf8;')
            self.cur.execute('SET character_set_connection=utf8;')
            self.cur.execute('SET GLOBAL innodb_large_prefix=ON')
            self.cur.execute('SET GLOBAL innodb_file_format=barracuda')
            self.cur.execute('SET GLOBAL innodb_file_per_table=ON')
            # Database and table creation

Now my question is: should I run the set_character_set call and these SET queries on every connection, or only when the database is created?

Thank you.

asked Jan 28, 2014 at 4:47
  • I don't think there's a character set named 'utf8;'. Commented Jan 28, 2014 at 4:52
  • @abarnert, well, it worked, so I didn't look closely at the syntax. But my reading is that I should execute those lines whether I'm creating the database or appending to it. Am I right? Commented Jan 28, 2014 at 6:01

1 Answer


These different commands do different things. And you're not even doing all of the right ones.

First, if you're using either PyMySQL or a later version of MySQLdb, pass charset='utf8' (notice that's 'utf8' without a semicolon attached!) as an argument to the connect call. That makes your connection default to UTF-8, also enables use_unicode mode, and means you don't need set_character_set. This is the better solution. Obviously, you will need to pass it every time you open a connection, since it's an argument to the connection.
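A minimal sketch of passing charset on every connect, assuming `connect` is MySQLdb.connect or pymysql.connect. The helper name open_utf8_connection is hypothetical, and the connect function is passed in only so the sketch can be exercised without a running server.

```python
def open_utf8_connection(connect, **params):
    """Open a connection whose character set defaults to UTF-8.

    Hypothetical helper: `connect` is assumed to be MySQLdb.connect or
    pymysql.connect; the remaining keyword arguments are passed through.
    """
    # charset is a per-connection argument, so it has to be supplied on
    # every connect, whether you are creating the database or appending.
    # The value is 'utf8' with no trailing semicolon.
    params.setdefault("charset", "utf8")
    return connect(**params)


# Usage (placeholder credentials, not taken from the question):
# import MySQLdb
# conn = open_utf8_connection(MySQLdb.connect, host="localhost",
#                             user="me", passwd="secret", db="mydb")
# cur = conn.cursor()
```

Because the helper is used for both the "append" and the "create" branches, neither branch can forget the charset.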

If your library does not accept the charset argument, then you will have to use set_character_set, and you should do so immediately after the connect, every time you connect. But again, don't include that trailing ; there.
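For a library without the charset argument, a sketch of the set_character_set fallback could look like this. It assumes a MySQLdb-style connection object with a set_character_set() method; the helper name is hypothetical.

```python
def open_connection_then_set_charset(connect, charset="utf8", **params):
    """Fallback for libraries whose connect() does not accept charset.

    Hypothetical helper; `connect` is assumed to return a MySQLdb-style
    connection object that has a set_character_set() method.
    """
    conn = connect(**params)
    # Switch the character set right away, before any other query runs,
    # and do it on every new connection.
    conn.set_character_set(charset)  # 'utf8', not 'utf8;'
    return conn
```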

Second, if this is Python 2.x, once you enable use_unicode, all SQL strings and all string-valued SQL parameters should be unicode objects, not str objects. You will often get away with not doing that properly (basically, if they're pure ASCII), but you should not depend on that. This is true even for the global, pragma, etc. statements at startup: use u'...' Unicode literals for those too.
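A sketch of keeping statements and parameters as text on Python 2, assuming your byte strings are UTF-8 encoded. The helper names are hypothetical; the code also runs on Python 3, because on Python 2.6+ `bytes` is simply an alias for `str`.

```python
def as_text(value, encoding="utf-8"):
    """Decode a byte string to a text (unicode) object; leave others alone.

    On Python 2, `bytes` is an alias for `str`, so plain str values become
    unicode objects; unicode, int, None, etc. pass through unchanged.
    """
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


def as_text_params(params, encoding="utf-8"):
    """Apply as_text to every value in a sequence of query parameters."""
    return tuple(as_text(p, encoding) for p in params)


# Statements are written as u'...' literals, even the startup ones.
SET_FILE_PER_TABLE = u"SET GLOBAL innodb_file_per_table=ON"

# Usage against a real cursor (table and column names are placeholders):
# cur.execute(SET_FILE_PER_TABLE)
# cur.execute(u"INSERT INTO people (name) VALUES (%s)",
#             as_text_params([b"caf\xc3\xa9"]))
```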

Meanwhile, those first two SET statements are effectively what your database library does for you when you pass a charset argument to connect or call set_character_set. This is why older documentation sometimes says to pass init_command='SET NAMES utf8' if you can't pass charset='utf8'. So you should never need to run them yourself.
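To make the two options concrete, here is a sketch that builds the keyword arguments for connect() either way. Both charset and init_command are connect arguments in MySQLdb and PyMySQL; the helper and its accepts_charset flag are hypothetical.

```python
def utf8_connect_kwargs(accepts_charset, **params):
    """Build connect() keyword arguments for a UTF-8 connection.

    Hypothetical helper. If the library accepts charset, that is all you
    need. Otherwise it falls back to the older init_command advice, which
    has SET NAMES run once, right after the connection is made.
    """
    if accepts_charset:
        params["charset"] = "utf8"
    else:
        params["init_command"] = "SET NAMES utf8"
    return params


# Usage (placeholder host):
# conn = MySQLdb.connect(**utf8_connect_kwargs(True, host="localhost"))
```

Either way, no hand-written SET NAMES or SET character_set_connection query is left in your own code.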

The other three SET queries, of course, have nothing to do with Unicode in the first place. They're all commands that only affect creation of new tables, but I have no idea if you ever create and drop tables in subsequent connections, or only when the database is initially created.
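A sketch of keeping those three statements on the table-creation path only, assuming a DB-API cursor; the constant and helper names are hypothetical. Since SET GLOBAL changes the server-wide value, the settings apply to tables created afterwards until the server restarts, so a connection that only appends rows can skip them.

```python
# The three server settings from the question, as Unicode literals.
TABLE_CREATION_SETTINGS = (
    u"SET GLOBAL innodb_large_prefix=ON",
    u"SET GLOBAL innodb_file_format=barracuda",
    u"SET GLOBAL innodb_file_per_table=ON",
)


def prepare_for_table_creation(cur):
    """Run the table-creation settings on a DB-API cursor (hypothetical helper)."""
    for statement in TABLE_CREATION_SETTINGS:
        cur.execute(statement)


# Usage: call it only in the branch that goes on to CREATE TABLE.
# if not append:
#     prepare_for_table_creation(self.cur)
#     # Database and table creation
```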

answered Jan 28, 2014 at 5:04

3 Comments

Thank you for the explanation. A couple of follow-ups: 1. How do I find out which MySQLdb version I have? 2. What is the minimum version required for your first point? 3. For compatibility's sake, should I just always use set_character_set? 4. The last three queries will simply execute even if no new tables are created, right?
One more thing, if you know: what is the default encoding for a MySQL connection on Windows (client and server both local, development version)?
And another one: where should I enable use_unicode? I'm using Windows XP for development right now. And, hopefully, the last one: the first two SET commands are redundant because they do what set_character_set already does, so they are not needed, correct?
