PostgreSQL 9.1 streaming replication problem: replica fails to use an index properly

Question 1

We run PostgreSQL 9.1.7 on Ubuntu Linux 12.04 on the master server and PostgreSQL 9.1.7 on FreeBSD 9.0-RELEASE on the replica server. The replica and the master return different results for the same SQL query. The query plan shows that an index (a B-tree one; we do not use hash indexes at all) is used to get the result, so it looks like the index is in an inconsistent or incomplete state on the replica server. The query on the master server:

db1=# select id from users where email='[email protected]';
 id 
---------
 1698116
(1 row)
db1=#

The query on the replica server:

db1=> select id from users where email='[email protected]';
 id 
----
(0 rows)
db1=> select created_at from users where id=1698116;
 created_at 
----------------------------
 2013-03-04 10:40:05.221214
(1 row)
db1=>

As you can see, the replica DB already contains the user with the expected ID, so the data is in place but, for some reason, cannot be found through the index. We double-checked that the replica was receiving and applying changes, so this was not a temporary outage. The user never got indexed. We also experienced similar problems with PostgreSQL 9.0 on CentOS 5.6, so we don't think this is specific to FreeBSD or PostgreSQL 9.1.

We use the replica server to run lots of heavy SQL queries; could this be the root of the problem? In any case, how can we efficiently detect and prevent situations like this in the future? The replica was not down today and there was not a single error line in the logs, so we detected this inconsistency only by chance.

Question 2

Which locale is used, and can it be trusted to compare strings exactly the same way on Linux and FreeBSD? If it can't, that may cause the sort of problem you're describing.

Question 3

Daniel, both the master and replica databases have the same encoding/collate/ctype settings. The FreeBSD (replica) host has LANG set to C and the master host has LANG set to en_GB.UTF-8. Most other rows in the users table can be found by their email fields on the replica DB without a problem (although we did not check every row, just a couple of them).

Question 4

What matters is the lc_collate of the databases, but I've expanded my idea in an answer where I suggest a test.

score 6 · Accepted Answer · 2013-03-04 16:43:33Z

Assuming the locale of the database is en_GB.UTF-8 both in Ubuntu (master) and FreeBSD (slave), I believe the differences in sort semantics alone may account for the fact that the index is unusable on the slave.

Here's an example of how they sort differently:

On Ubuntu 12.04:

$ export LANG=en_GB.UTF-8
$ cat >file
"0102"
0102
$ sort file
0102
"0102"

On FreeBSD 9.0-RELEASE:

$ export LANG=en_GB.UTF-8
$ cat >file
"0102"
0102
$ sort file
"0102"
0102

This shows that the same locale orders the two strings "0102" and 0102 differently on the two systems, even though neither contains any character outside the US-ASCII set.
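For comparison, here is a minimal sketch of the same two-line test under the C locale, which compares strings byte by byte and so should not depend on either system's en_GB.UTF-8 collation tables. The function name sort_bytewise is made up for this illustration.

```shell
# Hypothetical helper: sort stdin in plain byte order (C locale),
# independent of the operating system's en_GB.UTF-8 collation rules.
sort_bytewise() {
    LC_ALL=C sort
}

# In byte order the double quote (0x22) comes before the digit 0 (0x30),
# so the quoted string is listed first.
printf '%s\n' '0102' '"0102"' | sort_bytewise
```

This only illustrates why a byte-order comparison is portable; it says nothing about which collation your databases actually use.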

Here's a test that I suggest you try on your own dataset:

On the master:

$ psql -d dbname -Atc 'select email from users' | LC_COLLATE=en_GB.UTF-8 sort >email.master

On the slave:

$ psql -d dbname -Atc 'select email from users' | LC_COLLATE=en_GB.UTF-8 sort >email.slave

Now compare email.master and email.slave with diff or cmp. I suspect you'll find that they are not identical. If so, that demonstrates that the replicated index can't be used on the slave, since the ordering rules used to build it on the master differ from the rules used to scan it on the slave.
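The comparison step could be scripted roughly as follows, once both files have been copied to the same host. This is only a sketch: the function name compare_sorted_lists is hypothetical, and it assumes the two files are the ones produced by the commands above.

```shell
# Hypothetical helper: report whether two sorted e-mail lists are byte-identical.
# Usage: compare_sorted_lists email.master email.slave
compare_sorted_lists() {
    if cmp -s "$1" "$2"; then
        echo "identical"
    else
        echo "different"
        # Print the first few differing lines to show where the order diverges.
        diff "$1" "$2" | head -n 10
        return 1
    fi
}
```

A non-zero exit status means the two hosts ordered the same set of addresses differently (or that the row sets differ, for instance if the replica was lagging when the lists were taken).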

Yes, if you replicate to an OS with a different locale implementation, things will break badly.
