I'm importing CSV files into a relatively simple PostgreSQL database. (The CSVs are sometimes created manually from information in a book, sometimes created from spreadsheets with a little massaging via scripting.) I do it in small batches to make sure things are going ok, and I've noticed that when COPY fails, it doesn't roll back the sequence attached to the column in the table. (Note: I am importing via pgAdmin.)
For example, let's say this is my table:
TABLE
------------------
id | data1 | data2
------------------
1 | abc | def
2 | ghi | klm
Then let's say I try to import two more rows and it fails. I fix the error, and then the import succeeds. I'd expect the table to look like this:
TABLE
------------------
id | data1 | data2
------------------
1 | abc | def
2 | ghi | klm
3 | nop | qrs
4 | tuv | wxy
Instead, it looks like this:
TABLE
------------------
id | data1 | data2
------------------
1 | abc | def
2 | ghi | klm
5 | nop | qrs
6 | tuv | wxy
The tables all rely on each other (i.e. pretty much every table has a FK pointing to the ID of some other table), so if the IDs stay predictable, my data entry job gets a lot easier. If not, I have to keep double checking the IDs when I finish a section.
Is there any way to prevent this behavior?
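A plain-SQL sketch of the behavior, in case it helps (table name, file path, and CSV contents are invented for illustration; pgAdmin's import issues a COPY along these lines):

-- A serial column attaches a sequence (here mytable_id_seq) to id.
CREATE TABLE mytable (
    id    serial PRIMARY KEY,
    data1 text,
    data2 text
);

-- First attempt fails on a bad row; the rows read before the error
-- already consumed values from mytable_id_seq, and sequence increments
-- are never rolled back.
COPY mytable (data1, data2) FROM '/tmp/batch.csv' WITH (FORMAT csv);

-- After fixing the file, the retry succeeds, but ids continue from
-- wherever the failed attempt left the sequence, leaving a gap.
COPY mytable (data1, data2) FROM '/tmp/batch.csv' WITH (FORMAT csv);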
2 Answers
Sequences are designed that way; they are not guaranteed to produce contiguous numbers.
However, you can reset the sequence before (re)trying the COPY with:
SELECT setval('sequence_name', (SELECT max(id) FROM your_table));
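For example, if the question's table were called mytable and its id column were a serial (so its sequence got the default name mytable_id_seq), you could run something like this before retrying the import (the names are assumptions for illustration):

SELECT setval('mytable_id_seq', (SELECT max(id) FROM mytable));

-- Or look the sequence name up instead of hard-coding it:
SELECT setval(pg_get_serial_sequence('mytable', 'id'),
              (SELECT max(id) FROM mytable));

Two-argument setval marks that value as already used, so the next row COPY inserts gets max(id) + 1.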
- Yeah, I was thinking of writing a Python script that did just this when it caught an error. – Zelbinian, Jun 5, 2014 at 17:27
Don't use a sequence if you want contiguous numbers that roll back with an aborted transaction.
Instead, write a PL/pgSQL function, say nextval_gapless, that increments a counter in a table. That counter update gets rolled back if the transaction aborts.
Of course, you also get the downsides of this approach: concurrent inserts serialize on the counter row, and in some circumstances you can get deadlock aborts if you do attempt concurrency.
Simple example of an id generator:
-- Single-row counter table; it holds the last id handed out (start at 0).
CREATE TABLE mytable_gapless_seq(nextid integer);
INSERT INTO mytable_gapless_seq(nextid) VALUES (0);

CREATE OR REPLACE FUNCTION nextval_gapless(idtable regclass) RETURNS integer
LANGUAGE plpgsql
VOLATILE
AS $$
DECLARE
    newid integer;
BEGIN
    -- %s is used because a regclass already renders as a safely quoted,
    -- possibly schema-qualified name. The UPDATE row-locks the counter,
    -- so concurrent callers queue behind it until commit or rollback.
    EXECUTE format('UPDATE %s SET nextid = nextid + 1 RETURNING nextid', idtable)
        INTO STRICT newid;
    RETURN newid;
END;
$$;
Then use DEFAULT nextval_gapless('mytable_gapless_seq') as the column default instead of DEFAULT nextval('my_id_seq').
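A minimal sketch of how that behaves, assuming the counter table and function above and an invented table name mytable (the counter starts at 0, so the first id handed out is 1):

CREATE TABLE mytable (
    id    integer PRIMARY KEY DEFAULT nextval_gapless('mytable_gapless_seq'),
    data1 text,
    data2 text
);

BEGIN;
INSERT INTO mytable (data1, data2) VALUES ('abc', 'def');  -- gets id 1
ROLLBACK;  -- the counter UPDATE rolls back along with the row

INSERT INTO mytable (data1, data2) VALUES ('abc', 'def');  -- gets id 1 again, no gap

A failed COPY behaves the same way: any counter increments it made vanish with the aborted transaction.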
If you don't need a generic plpgsql function that can support multiple different sequence tables (the way nextval does), you could just write:
CREATE OR REPLACE FUNCTION nextval_gapless_mytable() RETURNS integer
LANGUAGE sql VOLATILE
AS $$
UPDATE mytable_gapless_seq SET nextid = nextid + 1 RETURNING nextid;
$$;
- For clarification, this is really only a thing I require during the initial data entry so I don't go insane having to look up ID numbers all the time. After the data's in, it'll be practically read-only. (Also, no idea how to do that pl/pgsql thing you're talking about.) – Zelbinian, Jun 5, 2014 at 17:31
- @Zelbinian See examples added. – Craig Ringer, Jun 6, 2014 at 6:31
- You don't need dynamic SQL to support multiple sequences with a generic function. You just need one table with multiple rows. – user1822, Nov 19, 2015 at 19:38
- @a_horse_with_no_name Er. Good point. *blush* – Craig Ringer, Nov 20, 2015 at 0:59