1
\$\begingroup\$

The following query lists all domains from multiple companies in which a value occurs in an XML set. This is filtered to the most recent occurrence within a provided date range.

On large databases the query has performance problems. Is there a more efficient method to retrieve the most recent query results within the provided date range?

SELECT Company.Name,
 Domain.Name,
 Collection.CollectedOn
FROM Collection
INNER JOIN Domain ON Domain.DomainID = Collection.DomainID
INNER JOIN Company ON Domain.CompanyID = Company.CompanyID
WHERE CollectedOn = (SELECT MAX(CollectedOn) 
 FROM Collection 
 INNER JOIN QResult ON QResult.CollectionID = Collection.CollectionID
 INNER JOIN QVersion ON QVersion.QVersionID = QResult.QVersionID
 INNER JOIN QLibrary ON QLibrary.QueryID = QVersion.QueryID
 WHERE Collection.DomainID = Domain.DomainID
 AND CollectedOn >= :CollectedAfter
 AND CollectedOn <= :CollectedBefore
 AND QLibrary.Name LIKE 'Some_Table_Name_That_Only_Occurs_Once_or_Twice'
 AND QResult.QResults.exist('/DataSet/some_table/name/text()[contains(., ''Error_value_here'')]') = 1)

If having a visual representation of the tables would be helpful I can post requested tables.

Explanation and information:

  • A company can have multiple domains.
  • A domain can have multiple collections. For the purpose of this query, collections are essentially a grouping of previously run queries executed on external (other companies) machines.
  • There are around 100 companies. Each company generally has 1-10 Domains. There are hundreds of collections for each domain.
  • QLibrary is a small table. The name I'm searching for with the LIKE param only occurs a few times. Some of the tables I'm retrieving have additional characters concatenated to the searched for name. Thus using LIKE instead of strict equality comparison.
  • The QResult is a large set of XML. I'm not sure how expensive this is to search in an SQL statement. If this could significantly contribute to a performance problem, I can search through this in a different language after the query is returned. I've tried to restrict this based on QLibrary.Name. Most of the tables searched in this query should be around 5MB.

Tables (Edit)

Company Table:
╔═══════════╦═══════╗
║ CompanyID ║ Name ║
╠═══════════╬═══════╣
║ 1 ║ name1 ║
║ 2 ║ name2 ║
╚═══════════╩═══════╝
Domain Table:
╔══════════╦═══════════╦═══════╗
║ DomainID ║ CompanyID ║ Name ║
╠══════════╬═══════════╬═══════╣
║ 3 ║ 1 ║ name1 ║
║ 4 ║ 1 ║ name2 ║
║ 6 ║ 4 ║ name3 ║
╚══════════╩═══════════╩═══════╝
Collection Table:
╔══════════════╦══════════╦═════════════╗
║ CollectionID ║ DomainID ║ CollectedOn ║
╠══════════════╬══════════╬═════════════╣
║ 1 ║ 1 ║ 2016年01月03日 ║
║ 2 ║ 1 ║ 2016年05月11日 ║
║ 3 ║ 2 ║ 2015年09月04日 ║
╚══════════════╩══════════╩═════════════╝
QResult Table:
╔════════════╦══════════════╦══════════════════╗
║ QVersionID ║ CollectionID ║ QResult ║
╠════════════╬══════════════╬══════════════════╣
║ 1 ║ 1 ║ <bunch><of><xml> ║
║ 2 ║ 1 ║ <other><xml> ║
╚════════════╩══════════════╩══════════════════╝
QVersion Table:
╔════════════╦═════════╗
║ QVersionID ║ QueryID ║
╠════════════╬═════════╣
║ 1 ║ 1 ║
║ 2 ║ 2 ║
║ 3 ║ 2 ║
╚════════════╩═════════╝
QLibrary Table:
╔═════════╦═════════════╗
║ QueryID ║ QueryName ║
╠═════════╬═════════════╣
║ 1 ║ some_name_1 ║
║ 2 ║ some_name_2 ║
╚═════════╩═════════════╝
asked Nov 20, 2016 at 19:36
\$\endgroup\$
2
  • \$\begingroup\$ Please do post more details. It's hard to review SQL without knowing the relevant table schemas, indexes, uniqueness constraints. A few rows of sample data and expected output would also make the situation easier to understand. \$\endgroup\$ Commented Nov 21, 2016 at 1:14
  • \$\begingroup\$ @200_success I added the tables. I'm primarily looking for feedback on whether the there is a better way (in terms of performance) or an alternative to executing the subquery. I've changed the title back to better reflect my question. \$\endgroup\$ Commented Nov 21, 2016 at 1:54

1 Answer 1

1
\$\begingroup\$

2 suggestions:

  • do you really need a LIKE, can't you use an equality?
  • can't you try to first filter the XML and then (but still in the same query) going up with the needed joins to retrieve what you need ?

Also your performances obviously depend on the size of various tables and on the available indexes. Your RDBM should be able to let you know which indexes it will use for your above query and based on that you may find you are missing some. Filtering on dates, xml, and LIKE patterns has a cost.

answered Jan 29, 2017 at 18:17
\$\endgroup\$

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.