I need to import data from an old database to a new one, with slightly different structure. For example, in the old database, there is a table recording employees and their supervisors:
CREATE TABLE employee (ident TEXT PRIMARY KEY, name TEXT, supervisor_name TEXT)
Now, the new database is as follows:
CREATE TABLE person (id BIGSERIAL PRIMARY KEY, name TEXT, old_ident TEXT);
CREATE TABLE team (id BIGSERIAL PRIMARY KEY);
CREATE TABLE teammember (person_id BIGINT, team_id BIGINT, role CHAR(1));
That is, instead of plain table of employees with their supervisors' names, the new (more generic) database enables to create teams of people. The employees are members with role 'e'
, supervisors with role 's'
.
The question is how to easily migrate the data from employee
to the new structure, one team per employee-supervisor pair. For example, employees
employee: ('abc01', 'John', 'Dave'), ('abc02', 'Kyle', 'Emily')
are to be migrated as
person: (1, 'John', 'abc01'), (2, 'Dave', NULL), (3, 'Kyle', 'abc02'), (4, 'Emily', NULL)
team: (1), (2)
teammember: (1, 1, 'e'), (2, 1, 's'), (3, 2, 'e'), (4, 2, 's')
I would consider using a data-modifying CTE, inserting the employees and supervisors first, then teams among them. However, CTE may only return data from the inserted table row. Thus, I am not able to match who was the supervisor of who.
The only solution I can see is using plpgsql
, which would simply iterate over the data, hold the inserted team IDs in a temporary variable, and then insert the appropriate teammember
rows. But I am curious whether there are simpler or more elegant solutions.
There will be roughly several hundreds to several thousands of employees. Although it is generally a good practice, in my case, I wouldn't like to generate the new IDs based on the old ones, as the old IDs are strings like *.GM2
. I store them to the old_ident
column for reference.
2 Answers 2
You have all the information you need to populate the new database from the old one with 4 insert statements:
create table team_ids (id serial, name TEXT)
insert into team_ids (name)
select distinct supervisor_name from employee
-- now supervisors have ids assigned by "serial" type
insert into person (id, name, old_ident)
select ident, name, ident from employee
union
select ident, supervisor_name, ident from employee
insert into team (id) -- meh
select id from team_ids
insert into teammember (person_id, team_id, role)
select e.ident, t.id, 'e')
from employee as e, join team_ids as t
on t.name = e.supervisor_name
union -- and, I guess
select t.id, t.id, 'm')
from team_ids as t
You may have to adjust to taste. I'm assuming employee.ident can be mapped onto person.id, and that your DBMS allows assigning values to columns with auto-generated values. Except for that, it's just basic SQL, nothing fancy and, of course, no loops.
Additional commentary:
- The 'team' table might be (more conventionally) renamed to department.
- A
SERIAL
(with its 2 billion possibilities) should be plenty, no need for aBIGSERIAL
. - There appears to be no database mechanism to enforce 1:1 cardinality of manager to team. Doesn't every team need a leader, by definition? Is there not a
CHECK
orFOREIGN KEY
constraint for teammember.role? Perhaps the question simplified these details away. - The "teammember" table name would more conventionally have a word boundary, for example TeamMember or team_member.
-
1This way you will have duplicate IDs in the
person
table.András Váczi– András Váczi2016年02月19日 09:01:20 +00:00Commented Feb 19, 2016 at 9:01
PL/PgSQL will do the job.
DO $$
DECLARE
_e record;
_personid bigint;
_suppersonid bigint;
_teamid bigint;
BEGIN
FOR _e IN
SELECT ident, name, supervisor_name FROM employee
LOOP
-- insert person record for employee
INSERT INTO person (name, old_ident)
SELECT _e.name, _e.ident
RETURNING id INTO _personid;
-- lookup or insert person record for supervisor
SELECT id INTO _suppersonid FROM person
WHERE p.name = _e.supervisor_name;
IF _suppersonid IS NULL THEN
INSERT INTO person (name) SELECT _e.supervisor_name
RETURNING id INTO _suppersonid;
END IF;
-- lookup team by supervisor or insert new team
SELECT team_id INTO _teamid FROM teammember tm
WHERE tm.person_id = _suppersonid AND tm.role = 's';
IF _teamid IS NULL THEN
-- new supervisor: insert new team and supervisor
INSERT INTO team (id) VALUES(DEFAULT) RETURNING id INTO _teamid;
INSERT INTO teammember (person_id, team_id, role) SELECT _suppersonid, _teamid, 's';
END IF;
-- insert team member (non-supervisor) record
INSERT INTO teammember (person_id, team_id, role) SELECT _personid, _teamid, 'e';
END LOOP;
END; $$;
team
which would hold the ID of person for which the team was created would solve the problem. I am still curious whether there is a more elegant (i.e., using no DDL) solution, though.