I'm having a problem with deadlocks on a many-to-many insert and am pretty far out of my league at this point.
I have a tweet
table that receives thousands of records per second. One of the columns is a PostgreSQL array[]::text[]
type with zero-to-many urls in the array. It looks like {www.blah.com, www.blah2.com}
.
What I'm tryin to accomplish from a trigger on the tweet
table is to create an entry in a urls_starting
table and then adding the tweet/url_starting relationship in a tweet_x_url_starting
.
Side note: The url_starting
table is linked to a url_ending
table where the fully resolved url paths reside.
The problem I face is deadlocks and I don't know what else to try.
I went on an Erwin Brandstetter learning spree. (if you're out there man... THANK YOU! πͺ)
- How to implement a many-to-many relationship in PostgreSQL?
- Deadlock with multi-row INSERTs despite ON CONFLICT DO NOTHING
- Postgres UPDATE ... LIMIT 1
(skip locked help)
I tried adding ORDER BY's for deterministic, stable orders and FOR UPDATE SKIP LOCKED but am not sure I'm doing any of it correctly.
Here's the structure. Using PostgreSQL 10.5.
CREATE TABLE tweet(
id integer NOT NULL GENERATED BY DEFAULT AS IDENTITY,
twitter_id text NOT NULL,
created_at timestamp NOT NULL,
content text NOT NULL,
urls text[],
CONSTRAINT tweet_pk PRIMARY KEY (id)
);
CREATE TABLE url_starting(
id integer NOT NULL GENERATED BY DEFAULT AS IDENTITY,
url text NOT NULL,
CONSTRAINT url_starting_pk PRIMARY KEY (id),
CONSTRAINT url_starting_ak_1 UNIQUE (url)
);
CREATE TABLE tweet_x_url_starting(
id_tweet integer NOT NULL,
id_url_starting integer NOT NULL,
CONSTRAINT tweet_x_url_starting_pk PRIMARY KEY (id_tweet,id_url_starting)
ALTER TABLE tweet_x_url_starting ADD CONSTRAINT tweet_fk FOREIGN KEY (id_tweet)
REFERENCES tweet (id) MATCH FULL
ON DELETE CASCADE ON UPDATE CASCADE;
ALTER TABLE tweet_x_url_starting ADD CONSTRAINT url_starting_fk FOREIGN KEY (id_url_starting)
REFERENCES url_starting (id) MATCH FULL
ON DELETE CASCADE ON UPDATE CASCADE;
Here's the tweet
table trigger.
CREATE TRIGGER create_tweet_relationships
AFTER INSERT OR UPDATE
ON tweet
FOR EACH ROW
EXECUTE PROCEDURE create_tweet_relationships();
And finally, the function.
CREATE FUNCTION create_tweet_relationships ()
RETURNS trigger
LANGUAGE plpgsql
VOLATILE
CALLED ON NULL INPUT
SECURITY INVOKER
COST 1
AS $$
BEGIN
IF (NEW.urls IS NOT NULL) AND cardinality(NEW.urls::TEXT[]) > 0 THEN
WITH tmp_url AS (
INSERT INTO url_starting (url)
SELECT UNNEST(NEW.urls)
ORDER BY 1
ON CONFLICT (url) DO UPDATE
SET url = EXCLUDED.url
RETURNING id
)
INSERT INTO tweet_x_url_starting (id_tweet, id_url_starting)
SELECT NEW.id, id
FROM tmp_url
ORDER BY 1, 2
FOR UPDATE SKIP LOCKED
ON CONFLICT DO NOTHING;
END IF;
RETURN NULL;
END
$$;
I blindly threw the stuff I read about into the function without success.
The error looks like this.
deadlock detected
DETAIL: Process 11281 waits for ShareLock on transaction 1317; blocked by process 11278.
Process 11278 waits for ShareLock on transaction 1316; blocked by process 11281.
HINT: See server log for query details.
CONTEXT: while inserting index tuple (494,33) in relation "url_starting"
SQL statement "WITH tmp_url AS (
INSERT INTO url_starting (url)
SELECT UNNEST(NEW.urls)
ORDER BY 1
ON CONFLICT (url) DO UPDATE
SET url = EXCLUDED.url
RETURNING id
)
INSERT INTO tweet_x_url_starting (id_tweet, id_url_starting)
SELECT NEW.id, id
FROM tmp_url
ORDER BY 1, 2
FOR UPDATE SKIP LOCKED
ON CONFLICT DO NOTHING"
PL/pgSQL function create_tweet_relationships() line 12 at SQL statement
Error causing transaction rollback (deadlocks, serialization failures, etc).
How can I stop the deadlocks? Thanks! π
2 Answers 2
This ultimately boiled down to 2 things.
Concurrent writes will eventually deadlock if the inserted data isn't sorted before inserting. Within my trigger function all inserts were sorted but there was no way to sort all the urls being added concurrently . The only way to solve this issue was to back up one level and do the inserting/sorting with the entire batch of tweets, thus having access to all urls at once.
More here. π How to use RETURNING with ON CONFLICT in PostgreSQL?
Doing this made a big difference but didn't completely fix the issue. π
The
ON CONFLICT
clause can prevent duplicate key errors but can't prevent concurrent transactions trying to enter the same keys.More here. π Deadlock with multi-row INSERTs despite ON CONFLICT DO NOTHING
As shown in the error message in my question there were conflicts in the system tuple index
ctid
when doingON CONFLICT (column) DO UPDATE
. Luckily I didn't need to update any data so there was no need for theDO UPDATE
part of my queries.Fixing this 100% stopped the deadlocks! π
Here π is the final query sent using python with the execute_values()
function in psycopg2
).
WITH cte_data (twitter_id, created_at, contents, search_hits, urls) AS (
VALUES
(NULL::text, NULL::timestamp, NULL::text, NULL::text[], NULL::text[]),
%s
OFFSET 1
)
, inserted_tweets AS (
INSERT INTO tweet (twitter_id, created_at, contents, search_hits)
SELECT twitter_id, created_at, contents, search_hits
FROM cte_data
ORDER BY 1
ON CONFLICT DO NOTHING
RETURNING id, twitter_id
)
, inserted_tweets_with_urls AS (
SELECT id, urls
FROM inserted_tweets
JOIN cte_data USING (twitter_id)
)
, unique_urls AS (
SELECT DISTINCT UNNEST(urls) url
FROM cte_data
)
, new_urls AS (
SELECT url
FROM url_starting
RIGHT JOIN unique_urls USING (url)
WHERE id IS NULL
)
, inserted_urls AS (
INSERT INTO url_starting (url)
TABLE new_urls
ORDER BY 1
ON CONFLICT DO NOTHING
RETURNING id, url
)
INSERT INTO tweet_x_url_starting (id_tweet, id_url_starting)
SELECT it.id, iu.id
FROM inserted_tweets_with_urls it
JOIN inserted_urls iu
ON (iu.url = ANY (it.urls))
ORDER BY 1, 2
ON CONFLICT DO NOTHING;
This might require careful examination, knowledge of the complete situation and more time than I can spend now. Or maybe I am missing something obvious. Whatever else might go wrong here, some things stand out:
Remove the
FOR UPDATE SKIP LOCKED
completely. It makes no sense where you use it. While selecting from the CTE, which holds rows with exclusive lock already, that makes no sense. It also would not make sense to skip any rows at this stage of the query.COST 1
is misleading. The default isCOST 100
and your trigger function is more in the realm of COST 5000. Leave the default or set it higher. Probably unrelated to the deadlock.An
AFTER
trigger might be more susceptible to deadlocks than rewriting the whole workflow with a single query (with multiple data-modifying CTEs).My shot in the dark: the deadlock is caused by the FK constraint trying to take a
ShareLock
onurl_starting
while concurrent transactions try to modify the same row after having taken a similarShareLock
themselves and vice versa. A quick and dirty solution might be to drop the FK constrainttweet_x_url_starting
if you can afford that. You could at least try that to verify it's part of the problem.
If you want to continue your learning spree - here is one more that seems relevant:
-
1Hi Erwin. Thanks for your response and thanks for all the well organized and articulated answer across SE! π I spend the last couple days troubleshooting and implementing all your suggestions here and in the linked answers. It ultimately led me to simplifying the situation. But I still have a similar issue. Maybe it will be able to troubleshoot?GollyJer– GollyJer2018εΉ΄12ζ01ζ₯ 23:59:55 +00:00Commented Dec 1, 2018 at 23:59
-
1For those following along. This is upvoted but not marked as the answer because we continued the conversation in a simplified question leading to a more actionable answer. π Check it out here. πGollyJer– GollyJer2018εΉ΄12ζ03ζ₯ 09:48:53 +00:00Commented Dec 3, 2018 at 9:48
Explore related questions
See similar questions with these tags.