Optimize subquery that searches xml

Question 1

The following query lists all domains from multiple companies in which a value occurs in an XML set. This is filtered to the most recent occurrence within a provided date range.

On large databases the query has performance problems. Is there a more efficient method to retrieve the most recent query results within the provided date range?

SELECT Company.Name,
 Domain.Name,
 Collection.CollectedOn
FROM Collection
INNER JOIN Domain ON Domain.DomainID = Collection.DomainID
INNER JOIN Company ON Domain.CompanyID = Company.CompanyID
WHERE CollectedOn = (SELECT MAX(CollectedOn) 
 FROM Collection 
 INNER JOIN QResult ON QResult.CollectionID = Collection.CollectionID
 INNER JOIN QVersion ON QVersion.QVersionID = QResult.QVersionID
 INNER JOIN QLibrary ON QLibrary.QueryID = QVersion.QueryID
 WHERE Collection.DomainID = Domain.DomainID
 AND CollectedOn >= :CollectedAfter
 AND CollectedOn <= :CollectedBefore
 AND QLibrary.Name LIKE 'Some_Table_Name_That_Only_Occurs_Once_or_Twice'
 AND QResult.QResults.exist('/DataSet/some_table/name/text()[contains(., ''Error_value_here'')]') = 1)

If having a visual representation of the tables would be helpful I can post requested tables.

Explanation and information:

A company can have multiple domains.
A domain can have multiple collections. For the purpose of this query, collections are essentially a grouping of previously run queries executed on external (other companies) machines.
There are around 100 companies. Each company generally has 1-10 Domains. There are hundreds of collections for each domain.
QLibrary is a small table. The name I'm searching for with the LIKE param only occurs a few times. Some of the tables I'm retrieving have additional characters concatenated to the searched for name. Thus using LIKE instead of strict equality comparison.
The QResult is a large set of XML. I'm not sure how expensive this is to search in an SQL statement. If this could significantly contribute to a performance problem, I can search through this in a different language after the query is returned. I've tried to restrict this based on QLibrary.Name. Most of the tables searched in this query should be around 5MB.

Tables (Edit)

Company Table:
╔═══════════╦═══════╗
║ CompanyID ║ Name ║
╠═══════════╬═══════╣
║ 1 ║ name1 ║
║ 2 ║ name2 ║
╚═══════════╩═══════╝
Domain Table:
╔══════════╦═══════════╦═══════╗
║ DomainID ║ CompanyID ║ Name ║
╠══════════╬═══════════╬═══════╣
║ 3 ║ 1 ║ name1 ║
║ 4 ║ 1 ║ name2 ║
║ 6 ║ 4 ║ name3 ║
╚══════════╩═══════════╩═══════╝
Collection Table:
╔══════════════╦══════════╦═════════════╗
║ CollectionID ║ DomainID ║ CollectedOn ║
╠══════════════╬══════════╬═════════════╣
║ 1 ║ 1 ║ 2016年01月03日 ║
║ 2 ║ 1 ║ 2016年05月11日 ║
║ 3 ║ 2 ║ 2015年09月04日 ║
╚══════════════╩══════════╩═════════════╝
QResult Table:
╔════════════╦══════════════╦══════════════════╗
║ QVersionID ║ CollectionID ║ QResult ║
╠════════════╬══════════════╬══════════════════╣
║ 1 ║ 1 ║ <bunch><of><xml> ║
║ 2 ║ 1 ║ <other><xml> ║
╚════════════╩══════════════╩══════════════════╝
QVersion Table:
╔════════════╦═════════╗
║ QVersionID ║ QueryID ║
╠════════════╬═════════╣
║ 1 ║ 1 ║
║ 2 ║ 2 ║
║ 3 ║ 2 ║
╚════════════╩═════════╝
QLibrary Table:
╔═════════╦═════════════╗
║ QueryID ║ QueryName ║
╠═════════╬═════════════╣
║ 1 ║ some_name_1 ║
║ 2 ║ some_name_2 ║
╚═════════╩═════════════╝

Question 2

Please do post more details. It's hard to review SQL without knowing the relevant table schemas, indexes, uniqueness constraints. A few rows of sample data and expected output would also make the situation easier to understand.

Question 3

@200_success I added the tables. I'm primarily looking for feedback on whether the there is a better way (in terms of performance) or an alternative to executing the subquery. I've changed the title back to better reflect my question.

Question 4

2 suggestions:

do you really need a LIKE, can't you use an equality?
can't you try to first filter the XML and then (but still in the same query) going up with the needed joins to retrieve what you need ?

Also your performances obviously depend on the size of various tables and on the available indexes. Your RDBM should be able to let you know which indexes it will use for your above query and based on that you may find you are missing some. Filtering on dates, xml, and LIKE patterns has a cost.

Patrick Mevzek Patrick Mevzek 2221 gold badge4 silver badges13 bronze badges · Answer 1 · 2017-01-29 18:17:07Z

2 suggestions:

do you really need a LIKE, can't you use an equality?
can't you try to first filter the XML and then (but still in the same query) going up with the needed joins to retrieve what you need ?

Also your performances obviously depend on the size of various tables and on the available indexes. Your RDBM should be able to let you know which indexes it will use for your above query and based on that you may find you are missing some. Filtering on dates, xml, and LIKE patterns has a cost.

Stack Exchange Network

Optimize subquery that searches xml

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Optimize subquery that searches xml

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions