I have legacy tables similar to the following:
employee
---------------------------
| employee_id | name      |
---------------------------
| 1           | David     |
| 2           | Mathew    |
---------------------------
payroll
---------------------------
| employee_id | salary    |
---------------------------
| 2           | 200000    |
| 3           | 90000     |
---------------------------
I want to get the following data (rows from an addresses table, not shown above), after joins and filters:
---------------------------------------------
| address_id | employee_id | address        |
---------------------------------------------
| 1          | 2           | street 1, NY   |
| 2          | 2           | street 2, DC   |
---------------------------------------------
I have the following query:
SELECT employee.employee_id, salary, address_arr
FROM employee
LEFT JOIN payroll ON payroll.employee_id = employee.employee_id
INNER JOIN
(
    SELECT employee_id, ARRAY_AGG(address) AS address_arr
    FROM addresses
    GROUP BY employee_id
) table_address ON table_address.employee_id = employee.employee_id
WHERE employee.employee_id < 1000000
LIMIT 100
OFFSET 0
The query above gives the desired output but is highly unoptimized: the GROUP BY runs over the complete addresses table before its result is joined in the outer query.
Kindly answer:
- How can we avoid running the GROUP BY over the complete addresses table, by making use of the LIMIT/OFFSET of the outer query?
- Will the condition WHERE employee.employee_id < 1000000 be applied to the subquery before or after the GROUP BY in the inner query? If the condition is applied after the GROUP BY, how can we avoid that?
Note: There are multiple JOINs and subqueries in the actual query being used.
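For reference, the behaviour I am hoping the planner can achieve is equivalent to duplicating the filter inside the subquery by hand (a sketch against the tables above):

```sql
-- Duplicating the outer filter inside the subquery guarantees the
-- aggregation only ever sees rows with employee_id < 1000000,
-- whether or not the planner would push the predicate down itself.
SELECT employee.employee_id, salary, address_arr
FROM employee
LEFT JOIN payroll ON payroll.employee_id = employee.employee_id
INNER JOIN
(
    SELECT employee_id, ARRAY_AGG(address) AS address_arr
    FROM addresses
    WHERE employee_id < 1000000   -- filter duplicated by hand
    GROUP BY employee_id
) table_address ON table_address.employee_id = employee.employee_id
WHERE employee.employee_id < 1000000
LIMIT 100
OFFSET 0;
```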
- PS: LIMIT without ORDER BY gives you 100 arbitrary records from the whole data set... do you really need that? (Akina, Feb 8, 2019)
- You should present us the simplest query you can which still has the issue. If you remove the LEFT JOIN on payroll, does the problem go away? (jjanes, Feb 9, 2019)
3 Answers
I am not sure if this is really more efficient, but you could try to join to a derived table that applies the limit.
select emp.employee_id, emp.salary, adr.address_arr
from (
    -- only employee and payroll columns exist here, so address_arr
    -- cannot be selected at this level; qualify employee_id to avoid
    -- an ambiguous column reference
    SELECT employee.employee_id, salary
    FROM employee
    LEFT JOIN payroll ON payroll.employee_id = employee.employee_id
    WHERE employee.employee_id < 1000000
    LIMIT 100
    OFFSET 0
) as emp
JOIN (
    SELECT a.employee_id, ARRAY_AGG(a.address) as address_arr
    FROM addresses a
    GROUP BY a.employee_id
) as adr ON adr.employee_id = emp.employee_id;
The first derived table only selects 100 rows, and the join/group by should then only be done for those 100 employees.
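Whether that actually happens can be verified from the plan; something along these lines (a sketch, to be run against the real tables):

```sql
EXPLAIN (ANALYZE, BUFFERS)
SELECT emp.employee_id, emp.salary, adr.address_arr
FROM (
    SELECT employee.employee_id, salary
    FROM employee
    LEFT JOIN payroll ON payroll.employee_id = employee.employee_id
    WHERE employee.employee_id < 1000000
    LIMIT 100 OFFSET 0
) AS emp
JOIN (
    SELECT a.employee_id, ARRAY_AGG(a.address) AS address_arr
    FROM addresses a
    GROUP BY a.employee_id
) AS adr ON adr.employee_id = emp.employee_id;
-- If the aggregate node over addresses still reports row counts in the
-- millions, the limit was not pushed down.
```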
If the optimizer doesn't push that down, you could try a lateral join instead to "force" a push down:
select emp.employee_id, emp.salary, adr.address_arr
from (
    SELECT employee.employee_id, salary
    FROM employee
    LEFT JOIN payroll ON payroll.employee_id = employee.employee_id
    WHERE employee.employee_id < 1000000
    LIMIT 100
    OFFSET 0
) as emp
JOIN LATERAL (
    -- the correlated WHERE clause restricts the aggregation to the
    -- current employee from emp
    SELECT a.employee_id, ARRAY_AGG(a.address) as address_arr
    FROM addresses a
    WHERE a.employee_id = emp.employee_id
    GROUP BY a.employee_id
) as adr ON adr.employee_id = emp.employee_id;
The join condition isn't really needed (the correlated WHERE clause inside the lateral subquery already restricts the rows), but it doesn't hurt either.
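Since the correlation is already done inside the lateral subquery, the same idea can be written with no join condition at all, as a CROSS JOIN LATERAL. Note that dropping the GROUP BY slightly changes the semantics, as the comment in this sketch explains:

```sql
SELECT emp.employee_id, emp.salary, adr.address_arr
FROM (
    SELECT employee.employee_id, salary
    FROM employee
    LEFT JOIN payroll ON payroll.employee_id = employee.employee_id
    WHERE employee.employee_id < 1000000
    LIMIT 100 OFFSET 0
) AS emp
CROSS JOIN LATERAL (
    -- an aggregate without GROUP BY always returns exactly one row,
    -- so employees without addresses get a NULL array instead of
    -- being filtered out (i.e. this behaves like a left join)
    SELECT ARRAY_AGG(a.address) AS address_arr
    FROM addresses a
    WHERE a.employee_id = emp.employee_id
) AS adr;
```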
Maybe
SELECT employee.employee_id, payroll.salary, ARRAY_AGG(addresses.address)
FROM employee
INNER JOIN addresses ON addresses.employee_id = employee.employee_id
LEFT JOIN payroll on payroll.employee_id = employee.employee_id
WHERE employee.employee_id < 1000000
GROUP BY employee.employee_id
LIMIT 100
OFFSET 0
?
And do you really need the records that have no matching rows in the payroll table (which leads to NULLs in payroll.salary)? Maybe an INNER JOIN is enough?
- The query is incorrect: payroll.salary needs to be included in the GROUP BY clause, which will unoptimize the query further. (nimeshkiranverma, Feb 8, 2019)
- @nimeshkiranverma "payroll.salary needs to be included in GROUP BY clause" - you may wrap it in any aggregate function that can be applied to this field's datatype. (Akina, Feb 8, 2019)
I am new to this and had some help; here's what I came up with:
with t as (
    SELECT employee.employee_id, salary
    FROM employee
    LEFT JOIN payroll ON payroll.employee_id = employee.employee_id
    WHERE employee.employee_id < 1000000
    LIMIT 100 OFFSET 0
)
select t.employee_id, max(t.salary), ARRAY_AGG(address) as address_arr
from address
left join t ON address.employee_id = t.employee_id
where address.employee_id = t.employee_id
group by t.employee_id;
explain analyze yields
HashAggregate (cost=443931.83..443933.08 rows=100 width=44) (actual time=3173.705..3173.705 rows=1 loops=1)
Group Key: t.employee_id
CTE t
-> Limit (cost=313.99..316.90 rows=100 width=12) (actual time=7.441..7.511 rows=100 loops=1)
-> Hash Right Join (cost=313.99..145616.13 rows=4983381 width=12) (actual time=7.437..7.490 rows=100 loops=1)
Hash Cond: (payroll.employee_id = employee.employee_id)
-> Seq Scan on payroll (cost=0.00..76778.79 rows=4983879 width=12) (actual time=0.036..0.047 rows=100 loops=1)
-> Hash (cost=189.00..189.00 rows=9999 width=4) (actual time=7.370..7.370 rows=10000 loops=1)
Buckets: 16384 Batches: 1 Memory Usage: 480kB
-> Seq Scan on employee (cost=0.00..189.00 rows=9999 width=4) (actual time=0.016..3.783 rows=10000 loops=1)
Filter: (employee_id < 1000000)
-> Hash Join (cost=3.25..441981.29 rows=217817 width=37) (actual time=7.699..3119.096 rows=200000 loops=1)
Hash Cond: (address.employee_id = t.employee_id)
-> Seq Scan on address (cost=0.00..364792.00 rows=20002100 width=29) (actual time=0.067..1568.256 rows=20000000 loops=1)
-> Hash (cost=2.00..2.00 rows=100 width=12) (actual time=7.609..7.609 rows=100 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 13kB
-> CTE Scan on t (cost=0.00..2.00 rows=100 width=12) (actual time=7.449..7.580 rows=100 loops=1)
Planning time: 0.456 ms
Execution time: 3174.499 ms
(19 rows)
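The plan above still sequentially scans all 20 million rows of address to build the hash join. Assuming no index on the join column exists yet (the index name below is made up), adding one would typically let the planner fetch only the addresses of the 100 employees coming out of the CTE:

```sql
-- Hypothetical index on the join column of the 20M-row address table.
CREATE INDEX IF NOT EXISTS address_employee_id_idx ON address (employee_id);
ANALYZE address;
-- With the index in place, the planner can replace the Seq Scan on
-- address with index scans driven by the 100 employee_ids from t.
```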
- Could you give us the explain output in plain text, please? (Arkhena, Feb 22, 2019)