Bulk insert M:N relation in PostgreSQL

Question 1

I need to import data from an old database to a new one, with slightly different structure. For example, in the old database, there is a table recording employees and their supervisors:

CREATE TABLE employee (ident TEXT PRIMARY KEY, name TEXT, supervisor_name TEXT)

Now, the new database is as follows:

CREATE TABLE person (id BIGSERIAL PRIMARY KEY, name TEXT, old_ident TEXT);
CREATE TABLE team (id BIGSERIAL PRIMARY KEY);
CREATE TABLE teammember (person_id BIGINT, team_id BIGINT, role CHAR(1));

That is, instead of plain table of employees with their supervisors' names, the new (more generic) database enables to create teams of people. The employees are members with role 'e', supervisors with role 's'.

The question is how to easily migrate the data from employee to the new structure, one team per employee-supervisor pair. For example, employees

employee: ('abc01', 'John', 'Dave'), ('abc02', 'Kyle', 'Emily')

are to be migrated as

person: (1, 'John', 'abc01'), (2, 'Dave', NULL), (3, 'Kyle', 'abc02'), (4, 'Emily', NULL)
team: (1), (2)
teammember: (1, 1, 'e'), (2, 1, 's'), (3, 2, 'e'), (4, 2, 's')

I would consider using a data-modifying CTE, inserting the employees and supervisors first, then teams among them. However, CTE may only return data from the inserted table row. Thus, I am not able to match who was the supervisor of who.

The only solution I can see is using plpgsql, which would simply iterate over the data, hold the inserted team IDs in a temporary variable, and then insert the appropriate teammember rows. But I am curious whether there are simpler or more elegant solutions.

There will be roughly several hundreds to several thousands of employees. Although it is generally a good practice, in my case, I wouldn't like to generate the new IDs based on the old ones, as the old IDs are strings like *.GM2. I store them to the old_ident column for reference.

Question 2

I would suggest adding some temporary identifiers to the new tables. This way you can insert data into them while still having the old connections - then you can fetch the necessary rows from the old table and insert them into the next table and so on. For this, I would use separate SQL statements, no need for complicated CTEs or procedural functions.

Question 3

@dezso Thanks for the suggestion. Adding a temporary identifier to team which would hold the ID of person for which the team was created would solve the problem. I am still curious whether there is a more elegant (i.e., using no DDL) solution, though.

Question 4

@OndřejBouda it might be possible to build the tables as CTE queries, but it might get quite complicated pretty fast. The (temp) table solution gives you the luxury of testing the steps individually, by checking the row counts, for example.

Question 5

You have all the information you need to populate the new database from the old one with 4 insert statements:

create table team_ids (id serial, name TEXT)
insert into team_ids (name)
select distinct supervisor_name from employee
-- now supervisors have ids assigned by "serial" type
insert into person (id, name, old_ident)
select ident, name, ident from employee
union
select ident, supervisor_name, ident from employee
insert into team (id) -- meh
select id from team_ids
insert into teammember (person_id, team_id, role)
select e.ident, t.id, 'e')
from employee as e, join team_ids as t
on t.name = e.supervisor_name
union -- and, I guess
select t.id, t.id, 'm')
from team_ids as t

You may have to adjust to taste. I'm assuming employee.ident can be mapped onto person.id, and that your DBMS allows assigning values to columns with auto-generated values. Except for that, it's just basic SQL, nothing fancy and, of course, no loops.

Additional commentary:

The 'team' table might be (more conventionally) renamed to department.
A SERIAL (with its 2 billion possibilities) should be plenty, no need for a BIGSERIAL.
There appears to be no database mechanism to enforce 1:1 cardinality of manager to team. Doesn't every team need a leader, by definition? Is there not a CHECK or FOREIGN KEY constraint for teammember.role? Perhaps the question simplified these details away.
The "teammember" table name would more conventionally have a word boundary, for example TeamMember or team_member.

Question 6

This way you will have duplicate IDs in the person table.

Question 7

PL/PgSQL will do the job.

DO $$
DECLARE
 _e record;
 _personid bigint;
 _suppersonid bigint;
 _teamid bigint;
BEGIN
 FOR _e IN
 SELECT ident, name, supervisor_name FROM employee
 LOOP
 -- insert person record for employee
 INSERT INTO person (name, old_ident)
 SELECT _e.name, _e.ident
 RETURNING id INTO _personid;
 -- lookup or insert person record for supervisor
 SELECT id INTO _suppersonid FROM person
 WHERE p.name = _e.supervisor_name;
 IF _suppersonid IS NULL THEN
 INSERT INTO person (name) SELECT _e.supervisor_name
 RETURNING id INTO _suppersonid;
 END IF;
 -- lookup team by supervisor or insert new team
 SELECT team_id INTO _teamid FROM teammember tm
 WHERE tm.person_id = _suppersonid AND tm.role = 's';
 IF _teamid IS NULL THEN
 -- new supervisor: insert new team and supervisor
 INSERT INTO team (id) VALUES(DEFAULT) RETURNING id INTO _teamid;
 INSERT INTO teammember (person_id, team_id, role) SELECT _suppersonid, _teamid, 's';
 END IF;
 -- insert team member (non-supervisor) record
 INSERT INTO teammember (person_id, team_id, role) SELECT _personid, _teamid, 'e';
 END LOOP;
END; $$;

James K. Lowden James K. Lowden 6123 silver badges4 bronze badges · Answer 1 · 2016-02-18 22:53:20Z

You have all the information you need to populate the new database from the old one with 4 insert statements:

create table team_ids (id serial, name TEXT)
insert into team_ids (name)
select distinct supervisor_name from employee
-- now supervisors have ids assigned by "serial" type
insert into person (id, name, old_ident)
select ident, name, ident from employee
union
select ident, supervisor_name, ident from employee
insert into team (id) -- meh
select id from team_ids
insert into teammember (person_id, team_id, role)
select e.ident, t.id, 'e')
from employee as e, join team_ids as t
on t.name = e.supervisor_name
union -- and, I guess
select t.id, t.id, 'm')
from team_ids as t

You may have to adjust to taste. I'm assuming employee.ident can be mapped onto person.id, and that your DBMS allows assigning values to columns with auto-generated values. Except for that, it's just basic SQL, nothing fancy and, of course, no loops.

Additional commentary:

The 'team' table might be (more conventionally) renamed to department.
A SERIAL (with its 2 billion possibilities) should be plenty, no need for a BIGSERIAL.
There appears to be no database mechanism to enforce 1:1 cardinality of manager to team. Doesn't every team need a leader, by definition? Is there not a CHECK or FOREIGN KEY constraint for teammember.role? Perhaps the question simplified these details away.
The "teammember" table name would more conventionally have a word boundary, for example TeamMember or team_member.

1

This way you will have duplicate IDs in the person table.

András Váczi
– András Váczi

2016年02月19日 09:01:20 +00:00
Commented Feb 19, 2016 at 9:01

filiprem filiprem 6,7471 gold badge19 silver badges32 bronze badges · Answer 2 · 2015-07-30 15:32:01Z

PL/PgSQL will do the job.

DO $$
DECLARE
 _e record;
 _personid bigint;
 _suppersonid bigint;
 _teamid bigint;
BEGIN
 FOR _e IN
 SELECT ident, name, supervisor_name FROM employee
 LOOP
 -- insert person record for employee
 INSERT INTO person (name, old_ident)
 SELECT _e.name, _e.ident
 RETURNING id INTO _personid;
 -- lookup or insert person record for supervisor
 SELECT id INTO _suppersonid FROM person
 WHERE p.name = _e.supervisor_name;
 IF _suppersonid IS NULL THEN
 INSERT INTO person (name) SELECT _e.supervisor_name
 RETURNING id INTO _suppersonid;
 END IF;
 -- lookup team by supervisor or insert new team
 SELECT team_id INTO _teamid FROM teammember tm
 WHERE tm.person_id = _suppersonid AND tm.role = 's';
 IF _teamid IS NULL THEN
 -- new supervisor: insert new team and supervisor
 INSERT INTO team (id) VALUES(DEFAULT) RETURNING id INTO _teamid;
 INSERT INTO teammember (person_id, team_id, role) SELECT _suppersonid, _teamid, 's';
 END IF;
 -- insert team member (non-supervisor) record
 INSERT INTO teammember (person_id, team_id, role) SELECT _personid, _teamid, 'e';
 END LOOP;
END; $$;

Stack Exchange Network

Bulk insert M:N relation in PostgreSQL

2 Answers 2

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Bulk insert M:N relation in PostgreSQL

2 Answers 2

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions