I'm trying to write a query to insert filenames to the following table table path
CREATE SEQUENCE path_id_seq;
CREATE TABLE path (
id INTEGER PRIMARY KEY DEFAULT NEXTVAL('path_id_seq'),
parent_id INTEGER,
name TEXT NOT NULL);
CREATE UNIQUE INDEX path_parent_id_name_ix
ON path (COALESCE(parent_id, 0), name);
ALTER TABLE path
ADD CONSTRAINT ios_path_parent_id_fk
FOREIGN KEY (parent_id) REFERENCES path (id)
ON DELETE CASCADE
DEFERRABLE INITIALLY DEFERRED;
For example, given filenames Assets/images/a.png
and Assets/images/b.png
I'd like to insert the following rows.
id | parent_id | name
----+-----------+-------------------
1 | ¤ | Assets
2 | 1 | images
3 | 2 | a.png
4 | 2 | b.png
Inserting the parent is easy;
INSERT INTO path (name)
VALUES ('Assets')
ON CONFLICT (COALESCE(parent_id, 0), name)
DO UPDATE SET name = EXCLUDED.name -- this strange update is so that we get rows back
RETURNING id, parent_id, name;
which yields;
id | parent_id | name
----+-----------+--------
1 | ¤ | Assets
But I can't seem to put together a recursive CTE to use the parent's id as the parent_id in the next INSERT statement.
It will probably look somethings like this invalid query below.
WITH RECURSIVE names AS (
SELECT name, lag(name, 1) OVER () as previous_name
FROM (SELECT unnest(string_to_array('Assets/images/232.png', '/')) as name) names),
paths(id, name) AS (
INSERT INTO path (name)
SELECT name
FROM names
WHERE previous_name IS NULL
ON CONFLICT (COALESCE(parent_id, 0), name)
DO UPDATE SET name = EXCLUDED.name
RETURNING id, name
UNION
INSERT INTO path (parent_id, name)
SELECT paths.id, names.name
FROM paths
JOIN names ON names.previous_name = paths.name
ON CONFLICT (COALESCE(parent_id, 0), name)
DO UPDATE SET name = EXCLUDED.name
RETURNING id, name)
SELECT *
FROM paths;
But I just can't seem to get it to work. If anyone sees a simpler way to do this it would be much appreciated.
2 Answers 2
You don't need a recursive CTE, you can use nextval()
in the result from the unnest()
directly and then use lag()
to get the value from the previous row. To deal with existing rows, I would do this using a join between the result of the unnest() and the path table, detecting new and existing rows.
For the new rows, a new ID will be generated and only those will be inserted:
insert into path (id, name, parent_id)
select id, name, parent_id
from (
select *,
case
when is_new
then lag(id) over (order by level)
else existing_parent
end as parent_id
from (
SELECT case
when p.id is null then nextval('path_id_seq')
else p.id
end as id,
x.name, x.level, p.parent_id as existing_parent,
p.id is null as is_new
FROM unnest(string_to_array('Assets/images/232.png', '/')) with ordinality as x(name, level)
LEFT JOIN path p on p.name = x.name
order by level
) t1
) t2
where is_new;
Live example: http://rextester.com/YGMT19565
The only "drawback" if you will is that the sequence values are not necessarily assigned in the "order" you'd expect. But as the only purpose of a generated primary key is to be unique (and nothing else) this shouldn't be a problem.
The lazy but effective way to solve this is with
CREATE OR REPLACE FUNCTION my_func(filename TEXT) RETURNS SETOF INTEGER AS
$BODY$
DECLARE
pid INTEGER := NULL;
word TEXT;
BEGIN
FOR word IN SELECT unnest(string_to_array(filename, '/')) LOOP
WITH parent AS (
INSERT INTO path (parent_id, name)
VALUES (pid, word)
ON CONFLICT (COALESCE(parent_id, 0), name)
DO UPDATE SET name = EXCLUDED.name
RETURNING id)
SELECT parent.id INTO pid FROM parent;
RETURN NEXT pid;
END LOOP;
RETURN;
END;
$BODY$ LANGUAGE plpgsql;
SELECT my_func('the/boots/are/on/the/ground');