The following query lists all domains from multiple companies in which a value occurs in an XML set. This is filtered to the most recent occurrence within a provided date range.
On large databases the query has performance problems. Is there a more efficient method to retrieve the most recent query results within the provided date range?
SELECT Company.Name,
Domain.Name,
Collection.CollectedOn
FROM Collection
INNER JOIN Domain ON Domain.DomainID = Collection.DomainID
INNER JOIN Company ON Domain.CompanyID = Company.CompanyID
WHERE CollectedOn = (SELECT MAX(CollectedOn)
FROM Collection
INNER JOIN QResult ON QResult.CollectionID = Collection.CollectionID
INNER JOIN QVersion ON QVersion.QVersionID = QResult.QVersionID
INNER JOIN QLibrary ON QLibrary.QueryID = QVersion.QueryID
WHERE Collection.DomainID = Domain.DomainID
AND CollectedOn >= :CollectedAfter
AND CollectedOn <= :CollectedBefore
AND QLibrary.Name LIKE 'Some_Table_Name_That_Only_Occurs_Once_or_Twice'
AND QResult.QResults.exist('/DataSet/some_table/name/text()[contains(., ''Error_value_here'')]') = 1)
If having a visual representation of the tables would be helpful I can post requested tables.
Explanation and information:
- A company can have multiple domains.
- A domain can have multiple collections. For the purpose of this query, collections are essentially a grouping of previously run queries executed on external (other companies) machines.
- There are around 100 companies. Each company generally has 1-10 Domains. There are hundreds of collections for each domain.
QLibrary
is a small table. The name I'm searching for with theLIKE
param only occurs a few times. Some of the tables I'm retrieving have additional characters concatenated to the searched for name. Thus usingLIKE
instead of strict equality comparison.- The
QResult
is a large set of XML. I'm not sure how expensive this is to search in an SQL statement. If this could significantly contribute to a performance problem, I can search through this in a different language after the query is returned. I've tried to restrict this based onQLibrary.Name
. Most of the tables searched in this query should be around 5MB.
Tables (Edit)
Company Table:
╔═══════════╦═══════╗
║ CompanyID ║ Name ║
╠═══════════╬═══════╣
║ 1 ║ name1 ║
║ 2 ║ name2 ║
╚═══════════╩═══════╝
Domain Table:
╔══════════╦═══════════╦═══════╗
║ DomainID ║ CompanyID ║ Name ║
╠══════════╬═══════════╬═══════╣
║ 3 ║ 1 ║ name1 ║
║ 4 ║ 1 ║ name2 ║
║ 6 ║ 4 ║ name3 ║
╚══════════╩═══════════╩═══════╝
Collection Table:
╔══════════════╦══════════╦═════════════╗
║ CollectionID ║ DomainID ║ CollectedOn ║
╠══════════════╬══════════╬═════════════╣
║ 1 ║ 1 ║ 2016年01月03日 ║
║ 2 ║ 1 ║ 2016年05月11日 ║
║ 3 ║ 2 ║ 2015年09月04日 ║
╚══════════════╩══════════╩═════════════╝
QResult Table:
╔════════════╦══════════════╦══════════════════╗
║ QVersionID ║ CollectionID ║ QResult ║
╠════════════╬══════════════╬══════════════════╣
║ 1 ║ 1 ║ <bunch><of><xml> ║
║ 2 ║ 1 ║ <other><xml> ║
╚════════════╩══════════════╩══════════════════╝
QVersion Table:
╔════════════╦═════════╗
║ QVersionID ║ QueryID ║
╠════════════╬═════════╣
║ 1 ║ 1 ║
║ 2 ║ 2 ║
║ 3 ║ 2 ║
╚════════════╩═════════╝
QLibrary Table:
╔═════════╦═════════════╗
║ QueryID ║ QueryName ║
╠═════════╬═════════════╣
║ 1 ║ some_name_1 ║
║ 2 ║ some_name_2 ║
╚═════════╩═════════════╝
-
\$\begingroup\$ Please do post more details. It's hard to review SQL without knowing the relevant table schemas, indexes, uniqueness constraints. A few rows of sample data and expected output would also make the situation easier to understand. \$\endgroup\$200_success– 200_success2016年11月21日 01:14:43 +00:00Commented Nov 21, 2016 at 1:14
-
\$\begingroup\$ @200_success I added the tables. I'm primarily looking for feedback on whether the there is a better way (in terms of performance) or an alternative to executing the subquery. I've changed the title back to better reflect my question. \$\endgroup\$Speerian– Speerian2016年11月21日 01:54:41 +00:00Commented Nov 21, 2016 at 1:54
1 Answer 1
2 suggestions:
- do you really need a LIKE, can't you use an equality?
- can't you try to first filter the XML and then (but still in the same query) going up with the needed joins to retrieve what you need ?
Also your performances obviously depend on the size of various tables and on the available indexes. Your RDBM should be able to let you know which indexes it will use for your above query and based on that you may find you are missing some. Filtering on dates, xml, and LIKE patterns has a cost.