I'm working with a Firebird 2.5 database and recently noticed a questionable practice in it: binary blobs (sub type 0) were being used to store large texts.

That isn't a problem in itself, but binary blobs don't carry any encoding information, which can cause issues, especially when working with third-party libraries such as node-firebird.

Our database is WIN1252 encoded, a legacy decision from more than 10 years ago. Since these are huge databases, converting to UTF8 is something we just can't tackle at the moment.

I found out that Firebird supports a different type of blob (sub type 1), which is defined as text and carries a character set.

I started converting those binary blobs to text blobs with the proper encoding: I created a new field, copied the data from the old field into it, renamed the old field, and then gave the new field the old name. Should've been easy, right?

Nope. Now, whenever we perform queries on that field using LIKE or CONTAINING, we get the following exception:

Cannot transliterate character between character sets.

I've come to understand that this is because some of the data stored in that field is considered UTF8 and some UTF8 characters can't be translated to WIN1252.
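To illustrate the kind of failure I mean, here is a minimal Python sketch. It assumes Python's `cp1252` codec behaves like Firebird's WIN1252 for this purpose, which may not hold byte for byte. The helper name `decodes_as_cp1252` is made up for the example. The point is that bytes which are harmless in a binary blob can have no character assigned once a charset is imposed on them:

```python
# Assumption: Python's "cp1252" codec is used as a stand-in for Firebird's WIN1252.

def decodes_as_cp1252(raw: bytes) -> bool:
    """Return True if every byte in raw maps to a character in Windows-1252."""
    try:
        raw.decode("cp1252")
        return True
    except UnicodeDecodeError:
        return False

# "Á" encoded as UTF-8 is the two bytes 0xC3 0x81.
utf8_bytes = "Á".encode("utf-8")

# Text that really is Windows-1252 decodes without trouble.
print(decodes_as_cp1252("café".encode("cp1252")))

# The UTF-8 bytes do not: 0x81 has no character assigned in Python's cp1252 table.
print(decodes_as_cp1252(utf8_bytes))
```

As raw bytes both values are equally valid. Only the strict decode step tells them apart.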

As a sanity check, I created a new table, added a field using the same domain as the field described above, inserted some records manually and ran the LIKE/CONTAINING queries. They worked like a charm.

What I want to know is how to properly transfer my data from one field to the other, encoding it as WIN1252, so that I can run those types of queries again.

(We managed to work around the issue temporarily by casting the new field to the same domain as the old one in the LIKE/CONTAINING queries. Thankfully, that got them working again.)

I don't know if this will help, but here's the DDL for both domains:

The new BLOB Text field

CREATE DOMAIN D_BLOB AS
BLOB SUB_TYPE 1 SEGMENT SIZE 80 CHARACTER SET WIN1252
COLLATE WIN1252;

The old BLOB Binary field

CREATE DOMAIN D_MEMO AS
BLOB SUB_TYPE 0 SEGMENT SIZE 80;
asked Sep 5 at 13:28
New contributor
Nickolas de Luca Alberton is a new contributor to this site. Take care in asking for clarification, commenting, and answering. Check out our Code of Conduct.

1 Answer

Never mind, guys. The issue was that some characters in the field were badly encoded. They came from a poorly set up connection to the database: someone had connected to our WIN1252 database using RUSSIAN_CHARSET and edited triggers and procedures that contained strings. Those strings became corrupted, and the corrupted text was then inserted into the field, causing the issue.

While the field was a binary blob, this caused no problem. Once we imposed a WIN1252 encoding on it, the badly encoded characters triggered the error.
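For anyone who needs to track down records like these, a rough Python sketch of one way to do it follows. It is not what we actually ran. It assumes the raw blob bytes have already been fetched as `(id, bytes)` pairs, uses Python's `cp1252` codec as a stand-in for WIN1252, and the names `find_bad_rows` and `sample_rows` are hypothetical:

```python
from typing import Iterable, Iterator, Tuple

def find_bad_rows(
    rows: Iterable[Tuple[int, bytes]], charset: str = "cp1252"
) -> Iterator[Tuple[int, int, int]]:
    """Yield (row_id, offset, byte_value) for the first undecodable byte of each bad row."""
    for row_id, raw in rows:
        try:
            raw.decode(charset)
        except UnicodeDecodeError as exc:
            # exc.start is the index of the first byte the codec could not map.
            yield row_id, exc.start, raw[exc.start]

# Hypothetical sample data standing in for blob contents read from the table.
sample_rows = [
    (1, b"plain ASCII text"),
    (2, "ação".encode("cp1252")),
    (3, b"ok until here \x81 then more"),
]

bad = list(find_bad_rows(sample_rows))
print(bad)
```

Note that this only flags bytes that have no character at all in the target charset. Text that was written through the wrong connection charset but happens to use bytes that are valid in WIN1252 would decode cleanly to the wrong letters, so a check like this can miss it.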

answered 2 days ago