I am connecting to a MS SQL Server database through SQLAlchemy, using the pyodbc module. Everything appeared to be working fine until I started having problems with encodings: some of the non-ASCII characters are being replaced with '?'.

The DB has the collation 'Latin1_General_CI_AS' (I've also checked the specific fields, and they keep the same collation). I started passing the encoding 'latin1' in the call to create_engine, and that appears to work for Western European characters (like the French or Spanish é), but not for Eastern European characters. Specifically, I have a problem with the character ć.
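As a quick sanity check, independent of the database: ć (U+0107) simply has no code point in latin1 or cp1252, so any layer that encodes with a replacement fallback will turn it into '?', while é survives. A minimal sketch:

```python
# ć (U+0107) is unmappable in latin1 and cp1252, so a driver that
# substitutes unmappable characters produces '?':
print('ć'.encode('latin1', errors='replace'))   # b'?'
print('ć'.encode('cp1252', errors='replace'))   # b'?'

# é (U+00E9) exists in both encodings, so it survives:
print('é'.encode('latin1'))                     # b'\xe9'
```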

I have been trying other encodings listed in the Python documentation, specifically the Microsoft ones like cp1250 and cp1252, but I keep facing the same problem.

Does anyone know how to resolve these differences? Does the collation 'Latin1_General_CI_AS' have an equivalent among Python's encodings?
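For what it's worth, Python's latin1 (ISO-8859-1) and cp1252 are not the same codec: cp1252 reassigns the 0x80–0x9F range to printable characters. A small sketch illustrating the difference:

```python
# 0x9C is an unprintable control character in ISO-8859-1 (latin1),
# but the letter 'œ' in Windows-1252:
print(b'\x9c'.decode('cp1252'))        # œ
print(repr(b'\x9c'.decode('latin1')))  # '\x9c' (control character)
```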

The code for my current connection is the following:

import pyodbc
from sqlalchemy import create_engine

def connect():
    return pyodbc.connect('DSN=database;UID=uid;PWD=password')

engine = create_engine('mssql://', creator=connect, encoding='latin1')
connection = engine.connect()

Clarifications and comments:

  • This problem happens when retrieving information from the DB; I don't need to store anything.
  • At the beginning I didn't specify an encoding, and the result was that whenever a non-ASCII character was encountered in the DB, pyodbc raised a UnicodeDecodeError. I worked around that by using 'latin1' as the encoding, but that doesn't solve the problem for all characters.
  • I admit that the server is not on latin1; that comment was incorrect. I have checked both the database collation and the specific fields' collations, and everything appears to be 'Latin1_General_CI_AS'. How, then, can ć be stored at all? Maybe I'm not understanding collations correctly.
  • I corrected the question a little: specifically, I have tried more encodings than latin1, including cp1250 and cp1252 (which apparently is the one used by 'Latin1_General_CI_AS', according to MSDN).
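To illustrate the points above with a minimal sketch: the same raw byte decodes to different characters depending on the code page, which is why guessing latin1 (or cp1252) fails for ć while cp1250, the Central European code page, can represent it. The byte value here is just an illustrative example, not actual driver output:

```python
raw = b'\xe6'  # one example byte, as it might come back from the driver

print(raw.decode('cp1250'))  # ć  (Central European code page)
print(raw.decode('latin1'))  # æ
print(raw.decode('cp1252'))  # æ
```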

UPDATE:

OK, following these steps, I found that the encoding used by the DB appears to be cp1252: http://bytes.com/topic/sql-server/answers/142972-characters-encoding Anyway, that appears to have been a bad assumption, as reflected in the answers.

UPDATE 2: After configuring the ODBC driver properly, I don't need to specify the encoding in the Python code at all.
