I have a query that's behaving a bit oddly. In my database I have a table called "records". It tells me a bunch of information about what applications a user ran on my company's machines. I'm trying to aggregate some statistics, but am having some odd issues with a query.
This query runs in about 6.5 minutes (~30 million entries in "records"). I would expect it to take longer when divisionName isn't specified, but it seems to be taking an unreasonable amount of time to finish (overnight and still chugging).
select divisionName, programName, count(usageID)
from records R
right join Programs P
on P.programID=R.usageProgramID
right join locate L
on L.computerID=R.usageComputerID
where divisionName="umbrella"
group by programName
order by programName asc
INTO OUTFILE '/tmp/lab_prog_umbrella.csv'
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n';
Is there an alternate structure to speed up the query? I have an index on (computerID,divisionName) in locate and (programID,programName) in Programs as well as a multitude of indexes in records.
Note: Programs contains 4 fields and locate contains 2. I don't think the joins are exceptionally large.
Edit:
Explain:
+----+-------------+-------+------+-----------------+-----------+---------+----------------------+------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+-----------------+-----------+---------+----------------------+------+----------------------------------------------+
| 1 | SIMPLE | L | ref | loc | loc | 27 | const | 1195 | Using where; Using temporary; Using filesort |
| 1 | SIMPLE | R | ref | uprog,computers | computers | 34 | scf.L.computerID | 1627 | |
| 1 | SIMPLE | P | ref | pid_name | pid_name | 43 | scf.R.usageProgramID | 1 | Using index |
+----+-------------+-------+------+-----------------+-----------+---------+----------------------+------+----------------------------------------------+
Records Description:
+-----------------+-------------+------+-----+---------------------+-------+
| Field | Type | Null | Key | Default | Extra |
+-----------------+-------------+------+-----+---------------------+-------+
| usageID | varchar(24) | NO | PRI | NULL | |
| usageWhen | datetime | NO | PRI | 0000-00-00 00:00:00 | |
| usageEnum | int(11) | YES | | NULL | |
| usageServerID | int(11) | YES | | NULL | |
| usageServerType | int(11) | YES | | NULL | |
| usageProgramID | varchar(40) | NO | PRI | | |
| usageLicenseID | varchar(18) | YES | | NULL | |
| usageComputerID | varchar(31) | YES | MUL | NULL | |
| usageExpansion | varchar(0) | YES | | NULL | |
| usageUser | varchar(31) | YES | MUL | NULL | |
| usageAddress | varchar(28) | YES | | NULL | |
| usageGroup | varchar(16) | YES | | NULL | |
| usageEvent | int(11) | YES | | NULL | |
| usageReason | int(11) | YES | | NULL | |
| usageTime | int(11) | YES | | NULL | |
| usageOtherTime | varchar(25) | YES | | NULL | |
| usageGMTOffset | int(11) | YES | | NULL | |
| usageCount | int(11) | YES | | NULL | |
+-----------------+-------------+------+-----+---------------------+-------+
Locate Description:
+--------------+-------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+--------------+-------------+------+-----+---------+-------+
| computerID | varchar(31) | YES | MUL | NULL | |
| divisionName | varchar(24) | YES | MUL | NULL | |
+--------------+-------------+------+-----+---------+-------+
Programs Description:
+----------------+-------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+----------------+-------------+------+-----+---------+-------+
| programID | varchar(40) | YES | MUL | NULL | |
| programName | varchar(63) | YES | MUL | NULL | |
| programVersion | varchar(31) | YES | | NULL | |
| category | varchar(30) | YES | | NULL | |
+----------------+-------------+------+-----+---------+-------+
1 Answer 1
- Create foreign keys from
RECORDS
toPROGRAMS
ansLOCATE
( you don't mention if they exist ). - Use
LEFT JOIN
instead ofRIGHT JOIN
. After allRECORDS
is the "strong" table in this query. - Group by
R.usageProgramID
instead ofby ProgramName
.
select divisionName, programName, count(usageID)
from records R
left join Programs P
on P.programID=R.usageProgramID
left join locate L
on L.computerID=R.usageComputerID
where divisionName="umbrella"
group by R.usageProgramID
order by programName asc
Another alternative is to try this:
select
t.divisionName, P.programName, count(*) as total
from (
select L.divisionName, R.usageComputerID
from records R
left join locate L
on L.computerID=R.usageComputerID
where L.divisionName="umbrella"
) t
left join Programs P
on P.programID=t.usageProgramID
group by
group by P.programName
order by P.programName asc
Since the absence of FK maybe not helping.
-
Not possible. The tables aren't setup with support for foreign keys so that's out. And each program has dozens of id's. For some reason it was setup in a way that gives each version of an application a unique ID. Chrome for example has a few hundred...Jacobm001– Jacobm0012013年08月14日 16:26:32 +00:00Commented Aug 14, 2013 at 16:26
-
@Jacobm001 Will not different versions of an application have unique program names anyway ? Also, are R.usageProgramID and R.usageComputerID indexed ?Tulains Córdova– Tulains Córdova2013年08月14日 16:30:12 +00:00Commented Aug 14, 2013 at 16:30
-
No, the programNames are condensed and are not unique. There is programVersion in the Program table that correlates what version it's at. Yes the two fields are indexed.Jacobm001– Jacobm0012013年08月14日 16:32:47 +00:00Commented Aug 14, 2013 at 16:32
-
@Jacobm001 Try the left join. The right join is doing things the other way around.Tulains Córdova– Tulains Córdova2013年08月14日 16:35:10 +00:00Commented Aug 14, 2013 at 16:35
-
Will do. I'll let you know how it goes.Jacobm001– Jacobm0012013年08月14日 16:36:47 +00:00Commented Aug 14, 2013 at 16:36
Explore related questions
See similar questions with these tags.
computerID
. This is discussed on SO: stackoverflow.com/questions/179085/…. What doesEXPLAIN PLAN
show you about this query?divisionName
is part oflocate
andprogramName
is a field ofPrograms
is needless friction.locate
is a temp table? Is populating it part of the long run time, or just the query you're showing? Diagnosing a problem script is much like any other technical problem: strip out elements until you've isolated the specific problem. With that in mind, what doesEXPLAIN PLAN
say? Does it run OK if you just includePrograms
andlocate
? If so, you might want to aggregaterecords
and then join to the remaining two tables.