I'm working with a Firebird 2.5 database and recently I caught a "wrong practice": using binary blobs (sub_type 0) to store big texts. Even though there isn't actually a problem with doing that, binary blobs don't store encoding information, and that can cause issues, especially when you're working with third-party libraries like node-firebird.
Our database is WIN1252 encoded, a legacy practice from more than 10 years ago. Since these are huge databases, converting to UTF8 is something we just can't tackle at the moment.
I found out that Firebird supports a different type of blob (sub_type 1), which is defined as text and supports proper encoding.
I started converting those binary blobs to text blobs with the proper encoding: I created another field, copied the data from one to the other, renamed the old field, and then gave the new field the old name. Should've been easy, right?
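For reference, the migration went roughly like this (MY_TABLE, MEMO_FIELD, MEMO_NEW and MEMO_OLD are placeholder names, not the real ones):

```sql
-- Add a new text-blob column using the new domain
ALTER TABLE MY_TABLE ADD MEMO_NEW D_BLOB;

-- Copy the existing binary-blob data across
UPDATE MY_TABLE SET MEMO_NEW = MEMO_FIELD;

-- Swap the names so the new column takes the old one's place
ALTER TABLE MY_TABLE ALTER COLUMN MEMO_FIELD TO MEMO_OLD;
ALTER TABLE MY_TABLE ALTER COLUMN MEMO_NEW TO MEMO_FIELD;
```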
Nope. Now, whenever we perform queries on that field using LIKE or CONTAINING, we get the following exception:

Cannot transliterate character between character sets.
I've come to understand that this is because some of the data stored in that field is considered UTF8, and some UTF8 characters can't be translated to WIN1252.
Just as a sanity check, I created a new table, added a field using the same domain as the field I described earlier, added some records manually, and performed the LIKE/CONTAINING queries: worked like a charm.
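The sanity check was essentially this (SANITY_TEST is a made-up name, and the literal is just some WIN1252-representable sample text):

```sql
-- Fresh table with a field on the same text-blob domain
CREATE TABLE SANITY_TEST (MEMO_FIELD D_BLOB);

INSERT INTO SANITY_TEST (MEMO_FIELD) VALUES ('Façade, plain WIN1252 text');

-- Both of these work fine against the freshly inserted data
SELECT * FROM SANITY_TEST WHERE MEMO_FIELD LIKE '%Façade%';
SELECT * FROM SANITY_TEST WHERE MEMO_FIELD CONTAINING 'Façade';
```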
What I want to know is how to properly transfer my data from one field to the other, encoding it to WIN1252, so that I can perform those types of queries again.
(We managed to work around the issue temporarily by casting the new field to the old field's domain when performing the LIKE/CONTAINING queries; thankfully, it started working again.)
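In case it helps anyone, the workaround looks something like this (Firebird 2.1 and later allow casting to a domain; MY_TABLE and MEMO_FIELD are placeholder names):

```sql
-- Casting the text blob back to the old binary-blob domain sidesteps
-- the transliteration step during the comparison
SELECT *
FROM MY_TABLE
WHERE CAST(MEMO_FIELD AS D_MEMO) CONTAINING 'search text';
```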
I don't know if this will help, but here's the DDL for both domains:
The new BLOB Text field
CREATE DOMAIN D_BLOB AS
BLOB SUB_TYPE 1 SEGMENT SIZE 80 CHARACTER SET WIN1252
COLLATE WIN1252;
The old BLOB Binary field
CREATE DOMAIN D_MEMO AS
BLOB SUB_TYPE 0 SEGMENT SIZE 80;
Answer
Never mind guys, the issue was that SOME characters in the field were badly encoded. They came from a poorly set-up connection to the database: in this case, someone used RUSSIAN_CHARSET to connect to our WIN1252 database and edited some triggers and procedures that contained string literals. Those strings became corrupted, and that corruption was inserted into the field, causing the issue.
When the field was a binary blob, there was no problem. But when we imposed a WIN1252 encoding on it, the corruption surfaced as the transliteration error.