
Hello people, I'm using a log daemon (https://github.com/paranormal/blooper) with Squid Proxy to put access logs into PostgreSQL, and I wrote a trigger function:

DECLARE
    -- Month and year of the access, used to name the partition
    newtime varchar := EXTRACT(MONTH FROM NEW."time")::varchar;
    newyear varchar := EXTRACT(YEAR FROM NEW."time")::varchar;
    -- Replace dots so the user name can be used as an unquoted schema name
    user_name varchar := REPLACE(NEW.user_name, '.', '_');
    partname varchar := newtime || '_' || newyear;
    tablename varchar := user_name || '.accesses_' || partname;
BEGIN
    IF NEW.user_name IS NOT NULL THEN
        -- One schema per user, one partition per user and month/year
        EXECUTE 'CREATE SCHEMA IF NOT EXISTS ' || user_name;
        EXECUTE 'CREATE TABLE IF NOT EXISTS '
            || tablename
            || '('
            || 'CHECK (user_name = ''' || NEW.user_name || ''' AND EXTRACT(MONTH FROM "time") = ' || newtime || ' AND EXTRACT(YEAR FROM "time") = ' || newyear || ')'
            || ') INHERITS (public.accesses)';
        EXECUTE 'CREATE INDEX IF NOT EXISTS access_index_' || partname || '_user_name ON ' || tablename || ' (user_name)';
        EXECUTE 'CREATE INDEX IF NOT EXISTS access_index_' || partname || '_time ON ' || tablename || ' ("time")';
        -- Route the row into its partition, then suppress the insert on the parent
        EXECUTE 'INSERT INTO ' || tablename || ' SELECT 1ドル.*' USING NEW;
    END IF;
    RETURN NULL;
END;

Its main job is to create a table partition per user_name and per month-year of the access, inheriting from a clean master table:

CREATE TABLE public.accesses
(
    id integer NOT NULL DEFAULT nextval('accesses_id_seq'::regclass),
    "time" timestamp with time zone NOT NULL,
    time_response integer,
    mac_source macaddr,
    ip_source inet NOT NULL,
    ip_destination inet,
    user_name character varying(40),
    http_status_code numeric(3,0) NOT NULL,
    http_reply_size bigint NOT NULL,
    http_request_method character varying(15) NOT NULL,
    http_request_url character varying(4166) NOT NULL,
    http_content_type character varying(100),
    squid_hier_code character varying(20),
    squid_request_status character varying(50),
    user_id integer,
    CONSTRAINT accesses_http_request_method_fkey FOREIGN KEY (http_request_method)
        REFERENCES public.http_requests (method) MATCH SIMPLE
        ON UPDATE NO ACTION ON DELETE NO ACTION,
    CONSTRAINT accesses_http_status_code_fkey FOREIGN KEY (http_status_code)
        REFERENCES public.http_statuses (code) MATCH SIMPLE
        ON UPDATE NO ACTION ON DELETE NO ACTION,
    CONSTRAINT accesses_user_id_fkey FOREIGN KEY (user_id)
        REFERENCES public.users (id) MATCH SIMPLE
        ON UPDATE NO ACTION ON DELETE NO ACTION
);
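
For reference, the trigger body above gets wrapped in a function and attached to this master table; a minimal sketch of the wiring, assuming a function name accesses_partition() (the name is a placeholder, not from the original setup):

-- Sketch: wrapping the body shown above and attaching it to the master table.
-- The function name accesses_partition() is an assumed placeholder.
CREATE OR REPLACE FUNCTION accesses_partition() RETURNS trigger AS $$
DECLARE
    -- ... body as shown above ...
END;
$$ LANGUAGE plpgsql;

-- BEFORE INSERT plus RETURN NULL means each row is redirected into its
-- partition and the insert into the parent table itself is suppressed.
CREATE TRIGGER accesses_partition_trigger
    BEFORE INSERT ON public.accesses
    FOR EACH ROW EXECUTE PROCEDURE accesses_partition();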

The main problem is getting the sum of http_reply_size grouped by user_name and time; my query is:

SELECT
    "time",
    user_name,
    sum(http_reply_size)
FROM
    accesses
WHERE
    extract(epoch from "time") BETWEEN 1516975122 AND 1516996722
GROUP BY
    "time",
    user_name;

But this query is very slow on the server (3,237,976 rows currently, in only 2 days). Does PostgreSQL have something to optimize a query with that need, or do I need to use another SQL or NoSQL system?

asked Jan 28, 2018 at 2:24

1 Answer


You are representing time in 3 different ways. The query uses epoch, while the index uses timestamptz, and the partition constraints use year and month as two separate fields. So the query can make use of neither the index nor the partition constraints.

You should probably change them all to use timestamptz. In the query, instead of converting the "time" column from timestamptz to epoch, convert the epoch constants to timestamptz for the BETWEEN. (Or just have the client send timestamptz rather than epoch in the first place.)
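
For example, the range filter can be rewritten so the comparison happens on the timestamptz column itself; a sketch of the same query, reusing the epoch constants from the question:

SELECT
    "time",
    user_name,
    sum(http_reply_size)
FROM
    accesses
WHERE
    -- to_timestamp() turns the epoch constants into timestamptz,
    -- so an index on "time" can actually be used
    "time" BETWEEN to_timestamp(1516975122) AND to_timestamp(1516996722)
GROUP BY
    "time",
    user_name;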

For the check constraint, you could use date_trunc('month', NEW."time") and date_trunc('month', NEW."time" + interval '1 month') to arrive at the endpoints to put into the check constraint. You would want to spell the check constraint out as "time" >= low_limit AND "time" < high_limit rather than using BETWEEN.
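
Inside the trigger function, that could look something like this (a fragment to splice into the function above; low_limit and high_limit are assumed variable names, and tablename is built as in the original trigger):

-- Sketch: revised partition bounds inside the trigger function.
DECLARE
    low_limit timestamptz := date_trunc('month', NEW."time");
    high_limit timestamptz := date_trunc('month', NEW."time" + interval '1 month');
BEGIN
    -- ...
    -- Half-open timestamptz range instead of the EXTRACT-based CHECK;
    -- quote_literal() renders the bounds as properly quoted literals
    EXECUTE 'CREATE TABLE IF NOT EXISTS ' || tablename
        || ' (CHECK (user_name = ' || quote_literal(NEW.user_name)
        || ' AND "time" >= ' || quote_literal(low_limit)
        || ' AND "time" < ' || quote_literal(high_limit)
        || ')) INHERITS (public.accesses)';
    -- ...
END;

With the constraints expressed as plain ranges over "time", the planner can compare them directly against a timestamptz range predicate like the one in the rewritten query.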

answered Jan 29, 2018 at 15:28
  • I created a new DB with your suggestion, and when I put some data in (126,788 rows) without ANALYZE, PostgreSQL used all the indexes; but after I ran ANALYZE it now does sequential scans only, ufff Commented Jan 30, 2018 at 16:28
  • You'd have to give us the "EXPLAIN ANALYZE" on the query both ways (before ANALYZE and after ANALYZE) for us to help with that. Probably best as a separate question. Commented Jan 30, 2018 at 18:14
  • I have them, but they are large; how can I send them to you? Commented Jan 30, 2018 at 19:25
  • I think the best way is to put them here: explain.depesz.com, then post a link to it. Commented Jan 31, 2018 at 1:57
  • Thanks, this is before ANALYZE: explain.depesz.com/s/Nvux Commented Jan 31, 2018 at 15:44
