1

I have a Django website with a postgresql 9.3.10 backend. The website contains dynamic user-generated content (think of it as a form of 9gag). There are some tables from which I routinely delete aged rows. The database is called dbname and I log into it via the user postgres. The following are two examples of such routine deletion:

begin;
DELETE FROM links_groupseen WHERE which_reply_id IN (SELECT id FROM links_reply where "submitted_on" < now() - interval '7 days');
DELETE FROM links_reply WHERE "submitted_on" < now() - interval '7 days';
commit;
begin;
DELETE FROM links_vote WHERE link_id IN (SELECT id FROM links_link WHERE id NOT IN (select answer_to_id from links_publicreply) AND "submitted_on" < now() - interval '1 hour');
DELETE FROM links_photoobjectsubscription WHERE which_link_id IN (SELECT id FROM links_link WHERE id NOT IN (select answer_to_id from links_publicreply) AND "submitted_on" < now() - interval '1 hour');
DELETE FROM links_link WHERE id NOT IN (select answer_to_id from links_publicreply) AND "submitted_on" < now() - interval '1 hour';
commit;

There are ~10 other such operations I run. At the moment, I do everything manually - I want to automate these to run at a certain low-utlization time of day. How do I do that? Can someone give me an illustrative example?


As per my own rudimentary knowledge, I'm thinking I should first save the following shell script to /etc/cron.daily:

#!/bin/sh
dbname="dbname"
username="postgres"
psql $dbname $username << EOF
begin;
DELETE FROM links_groupseen WHERE which_reply_id IN (SELECT id FROM links_reply where "submitted_on" < now() - interval '7 days');
DELETE FROM links_reply WHERE "submitted_on" < now() - interval '7 days';
commit;
begin;
DELETE FROM links_vote WHERE link_id IN (SELECT id FROM links_link WHERE id NOT IN (select answer_to_id from links_publicreply) AND "submitted_on" < now() - interval '1 hour');
DELETE FROM links_photoobjectsubscription WHERE which_link_id IN (SELECT id FROM links_link WHERE id NOT IN (select answer_to_id from links_publicreply) AND "submitted_on" < now() - interval '1 hour');
DELETE FROM links_link WHERE id NOT IN (select answer_to_id from links_publicreply) AND "submitted_on" < now() - interval '1 hour';
commit;
EOF

And then add the said script to crontab like so:

30 4 * * * /etc/cron.daily/mypgscript

And then service crond restart

asked Jun 24, 2016 at 9:39

2 Answers 2

1

Back in 2013 a similar question was asked (How to run recurring tasks on a Postgresql database without a cron-like tool?). The crontab or pgAgent were the only way to go.

For version 9.3 there is already the Background Worker Processes. But even in the version 9.5.2 documentation there is a warning about using it.

I think that the cron solution is a good way to go but check the access to the database. Limit the access of the user to one server only because otherwise anybody can use this user from any place to do the deletion and they might not delete only the older ones.

answered Jun 24, 2016 at 10:07
2
  • Btw, any idea where I'd get to view the log file containing the output of the cron script? Commented Jun 24, 2016 at 10:50
  • Normally the output is mailed to the user that runs the cron. This was answered in unix.stackexchange.com/questions/207/…. Commented Jun 25, 2016 at 15:53
2

A method that I find useful is to implement the cleanup code in my main programming framework (so Django, I guess, in your case) and expose that function via a URL. Then in my crontab I use wget or curl to invoke the cleanup URL. That way the cron file is only responsible for the scheduling part, while the code that does the cleanup is kept together with the code that creates the data that needs to be cleaned up.

answered Jun 28, 2016 at 17:15

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.