I'm writing a Python script that queries a database for the ids of employees who appear in one table but not in another. I'm using the psycopg2 module for Python 2 with PostgreSQL. After querying one table with a condition, I run a second query against another table for each row of the first result to find the difference between the tables. The problem is that the whole procedure takes a long time. Is there another method or technique that would make it faster? Below is the code I use:
def find_difference_assignment_pls_count(self):
    counter = 0
    emp_ids = []
    self.test_cursor.execute("""Select id,emp_id from test_test where flag=true and emp_id is not null and ver_id in(select id from test_phases where state='test')""")
    matching_records = self.test_cursor.fetchall()
    for match_record in matching_records:
        self.test_cursor.execute("""Select id from test_table where test_emp_id=%s and state='test_state'""", (match_record['emp_id'],))
        result = self.test_cursor.fetchall()
        if result:
            continue
        else:
            emp_ids.append(match_record['emp_id'])
            counter += 1
    print "Employees of test_test not in test_table: ", counter
    return emp_ids
I run these queries on two tables that each have more than 500,000 records, so the performance is really slow.
1 Answer
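Use a LEFT JOIN from test_test to test_table, selecting only rows where test_table.test_emp_id is null. Rows of test_test with no match get nulls for every test_table column, so this filter keeps exactly the employees missing from test_table, in one query instead of one query per row.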
select emp_id
from test_test
    left join test_table on (
        test_table.test_emp_id = test_test.emp_id and
        test_table.state = 'test_state'
    )
where (
    test_test.flag = true and
    test_test.emp_id is not null and
    test_test.ver_id in (
        select id
        from test_phases
        where state = 'test'
    ) and
    test_table.test_emp_id is null
)
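From Python, the loop in the question could then collapse into a single round trip. This is only a minimal sketch, assuming a DB-API cursor such as the one psycopg2 provides. The names `find_missing_emp_ids` and `ANTI_JOIN_SQL` are hypothetical, and rows are read by position rather than by column name, which assumes a plain cursor or a `DictCursor`.

```python
# Anti-join: keep test_test rows that have no matching test_table row.
# ANTI_JOIN_SQL and find_missing_emp_ids are illustrative names only.
ANTI_JOIN_SQL = """
    select test_test.emp_id
    from test_test
        left join test_table on (
            test_table.test_emp_id = test_test.emp_id and
            test_table.state = 'test_state'
        )
    where (
        test_test.flag = true and
        test_test.emp_id is not null and
        test_test.ver_id in (
            select id
            from test_phases
            where state = 'test'
        ) and
        test_table.test_emp_id is null
    )
"""


def find_missing_emp_ids(cursor):
    """Return emp_ids from test_test with no 'test_state' row in test_table."""
    cursor.execute(ANTI_JOIN_SQL)
    # One fetch for the whole result instead of one query per employee.
    return [row[0] for row in cursor.fetchall()]
```

Inside the original method this would be called as `find_missing_emp_ids(self.test_cursor)`, and `len()` of the returned list replaces the manual counter.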
You may also want to consider:
- Using an inner join instead of a subquery to select only rows with test_phases.state = 'test'.
- Selecting distinct emp_ids if test_test.emp_id does not have a unique constraint.
select distinct emp_id
from test_test
    inner join test_phases on test_phases.id = test_test.ver_id
    left join test_table on (
        test_table.test_emp_id = test_test.emp_id and
        test_table.state = 'test_state'
    )
where (
    test_test.flag = true and
    test_test.emp_id is not null and
    test_phases.state = 'test' and
    test_table.test_emp_id is null
)
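To see what `distinct` changes, here is a small self-contained check. It uses an in-memory SQLite database from Python's standard library as a stand-in for PostgreSQL, an assumption made only so the sketch runs without a server. The sample rows and the helper name `run` are made up for illustration.

```python
import sqlite3

# In-memory SQLite stands in for PostgreSQL here; the sample data is made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table test_phases (id integer primary key, state text);
    create table test_test (id integer primary key, emp_id integer,
                            flag boolean, ver_id integer);
    create table test_table (id integer primary key, test_emp_id integer,
                             state text);
    insert into test_phases values (1, 'test');
    -- emp_id 7 appears twice in test_test and never in test_table
    insert into test_test values (1, 7, 1, 1), (2, 7, 1, 1), (3, 8, 1, 1);
    insert into test_table values (1, 8, 'test_state');
""")

QUERY = """
    select {columns}
    from test_test
        inner join test_phases on test_phases.id = test_test.ver_id
        left join test_table on (
            test_table.test_emp_id = test_test.emp_id and
            test_table.state = 'test_state'
        )
    where (
        test_test.flag = true and
        test_test.emp_id is not null and
        test_phases.state = 'test' and
        test_table.test_emp_id is null
    )
"""


def run(columns):
    """Run the anti-join with the given select list and return the emp_ids."""
    return [row[0] for row in conn.execute(QUERY.format(columns=columns))]


without_distinct = run("emp_id")           # one entry per test_test row
with_distinct = run("distinct emp_id")     # one entry per employee
```

With this sample data, `without_distinct` holds employee 7 twice and `with_distinct` holds it once. The loop in the question also appends an id once per matching test_test row, so the query without `distinct` is the closer match to its output.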