9

Using PostgreSQL 9.5. It seems like Postgres is being inconsistent with how it compares strings similar to لى. A unique constraint is considering some strings equal that GROUP BY is considering different.

I have a select query that is using GROUP BY on a TEXT column and an aggregate function on some other column to ensure uniqueness of the TEXT column in the output, and I'm inserting the results into a table with a primary key (and thus unique) constraint on the TEXT column.

Looks similar to this; I've just changed the table names for simplicity:

 INSERT INTO mytable ( % mytable has string TEXT PRIMARY KEY, score INT
 SELECT 
 sq1.string string, sq2.score / sq1.score
 FROM
 (
 SELECT n.string string, SUM(n.score) score
 FROM
 othertable1 n % has string TEXT (non-unique) and score INT
 GROUP BY string
 ) sq1,
 (
 SELECT n.string string, SUM(n.score) score
 FROM
 othertable2 n % has string TEXT (non-unique) and score INT
 GROUP BY string
 ) sq2
 WHERE sq1.string = sq2.string
 ORDER BY score DESC LIMIT 100000
 );

This should never fail, right? It fails: violation of unique constraint due on key لى.

And it has worked many times before with other data sets containing millions of rows in PG 9.3; I don't know whether لى was in the data back then. I know Arabic has decorations you can put on the letters, so I wonder if that's tripping it up.

Does anyone have an alternative explanation, or should I report this as a bug once I can reproduce it more easily?

UPDATE: Confirmed that the query runs successfully on a PostgreSQL 9.3 server with the same data. There are some moving parts here, so I'm trying to find exactly what the problematic strings are so I can make a simple list of queries anyone can run to expose a bug.

UPDATE 2: Argh, I can't get my database to give me a set of strings I can copy into a table and expose a bug. I've been trying to do it with COPY TO. Something along the way keeps stripping the Arabic text of the differences that are making it fail, I think. But I tried a simpler query, and it's also failing. It's more obvious that this should work:

INSERT INTO mytable ( % mytable has string TEXT PRIMARY KEY, score INT
 SELECT n.string string, SUM(n.score) score
 FROM
 othertable2 n % has string TEXT (non-unique) and score INT
 GROUP BY string
);

I'm still working towards getting something others can try because, of course, a bug report is useless if I say that it only works on my data.

UPDATE 3: I ran it again with different data and ran into the same problem with Cyrillic characters У and В. Making a table containing them didn't reveal anything. Same problem as with the Arabic text, I think. Something is getting stripped along the way.

UPDATE 4: This is definitely a bug. I'm still trying to figure out how to report this. I found a workaround:

WITH glitch(string, score) AS (
 SELECT n.string string, SUM(n.score) score
 FROM
 othertable2 n % has string TEXT (non-unique) and score INT
 GROUP BY string
)
INSERT INTO mytable ( % mytable has string TEXT PRIMARY KEY, score INT
 SELECT DISTINCT ON(string) * FROM glitch
);
asked Feb 21, 2016 at 6:49
3
  • All my TEXT columns are using default COLLATE settings, and my locale is en_US.UTF-8. Commented Feb 21, 2016 at 18:46
  • Is mytable empty before running this INSERT? Commented Feb 23, 2016 at 8:04
  • Yes. I made sure it was empty. Furthermore, if I run the SELECT without INSERTing the results in psql, it opens a vim window of the results, I search for لى, and I get multiple results that look the same but with different scores. Commented Feb 23, 2016 at 19:37

0

Know someone who can answer? Share a link to this question via email, Twitter, or Facebook.

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.