I'm trying to find every occurrence of the strings in view B within a string column in table A. However, the performance is horrendous (it takes about 17 minutes to run), and I was wondering if anyone had any suggestions to improve it.
View B has 3570 records and a single column of type varchar named MISubCode.
Table A has 160081 records and has the schema in the attached image.
There's also a third object, view C, which is used to filter the results of table A, since I don't need to find occurrences in any of the records it contains.
My rationale for solving this was as follows:
1. Count the number of characters in the string in the desired search column in table A, producing one row for each character position.
2. Filter out the unneeded rows using view C.
3. For each row in the result set from step 2, use the Number column as the start position and check whether any string from view B occurs in the table A column at that position, repeating until the end of the string.
Code:
SELECT *
FROM
(
    SELECT Number, A.StatusInputPoint_DESC, A.StatusInputPoint_CODE, A.StatusInputPoint_CORE_ID
    FROM
    (
        SELECT Number, A.StatusInputPoint_DESC, A.StatusInputPoint_CODE, A.StatusInputPoint_CORE_ID
        FROM
            tblStatusInputPoints_CORE A
        CROSS JOIN
        (
            SELECT ROW_NUMBER() OVER (ORDER BY [object_id])
            FROM sys.all_objects
        ) AS n(Number)
        WHERE Number >= 1 AND Number <= CAST(LEN(A.StatusInputPoint_DESC) AS INT) AND System_ID = 1
    ) AS A
    LEFT JOIN
        vwAlarmLibrary_MI_StatusPoint_Mask_Match_NoWildCards C
    ON
        A.StatusInputPoint_CORE_ID = C.StatusInputPoint_CORE_ID
    WHERE C.StatusInputPoint_CORE_ID IS NULL
) AS RemainingStatusPoints
LEFT JOIN
    dbo.vwAlarmLibrary_MI_Substation_Abbreviations B
ON
    SUBSTRING(RemainingStatusPoints.StatusInputPoint_DESC, Number, LEN(B.MISubCode)) = B.MISubCode
WHERE B.MISubCode IS NOT NULL
I'm at a loss as to how to make this faster. I've tried several different methods and join combinations, and I can't seem to get it to perform well. I'm not an expert in SQL Server, so I don't know all the features.
I'm currently using SQL Server 2016.
EXAMPLE:
For clarity, let's say I have these strings in view B:
ABC
EFGH
IJ
And I have these strings in table A:
123 ABC 45 IJ
IJ
IJ EFGH 22
The result should be this:
Pos | String        | Occurrence
5   | 123 ABC 45 IJ | ABC
12  | 123 ABC 45 IJ | IJ
1   | IJ            | IJ
1   | IJ EFGH 22    | IJ
4   | IJ EFGH 22    | EFGH
- So you want to take the results from a column in TableA and find all instances of these values in ViewB, or am I misunderstanding your question? – S3S, Apr 6, 2017 at 21:24
- No, it's the other way around. For clarity, let's say I have these strings in view B: ABC, EFGH, IJ. And I have these strings in table A: "123 ABC 45 IJ", "IJ", "IJ EFGH 22". The result should be this: 5 | 123 ABC 45 IJ | ABC; 12 | 123 ABC 45 IJ | IJ; 1 | IJ | IJ; 1 | IJ EFGH 22 | IJ; 4 | IJ EFGH 22 | EFGH. – Jake, Apr 6, 2017 at 21:25
- OK, so find all instances where any column in TableA contains values from a column in ViewB. – S3S, Apr 6, 2017 at 21:26
- No. I'm trying to figure out how to edit my question; I'll explain in better detail there, since comments don't preserve spacing. – Jake, Apr 6, 2017 at 21:32
2 Answers
I think you could simplify this with CHARINDEX and CROSS APPLY. See how this method stacks up in your environment, and if it happens to be slower, paste your execution plans if you can.
To give a concrete example, create these tables:
declare @ViewB table (myStrings varchar(16))
insert into @ViewB
values
('ABC'),
('EFGH'),
('IJ')
declare @TableA table (colToSearch varchar(256))
insert into @TableA
values
('123 ABC 45 IJ'),
('IJ'),
('IJ EFGH 22')
Then this CROSS APPLY query should be more efficient:
select
charindex(b.myStrings,a.colToSearch) as Pos
,a.colToSearch as String
,b.myStrings as Occurance
from
@ViewB b
cross apply (select * from @TableA) a
where
charindex(b.myStrings,a.colToSearch) > 0
order by
String, charindex(b.myStrings,a.colToSearch)
Alternatively, you could use CROSS JOIN:
select
charindex(b.myStrings,a.colToSearch) as Pos
,a.colToSearch as String
,b.myStrings as Occurance
from
@ViewB b
cross join @TableA a
where
charindex(b.myStrings,a.colToSearch) > 0
order by
String, charindex(b.myStrings,a.colToSearch)
RESULTS
+-----+---------------+-----------+
| Pos | String | Occurance |
+-----+---------------+-----------+
| 5 | 123 ABC 45 IJ | ABC |
| 12 | 123 ABC 45 IJ | IJ |
| 1 | IJ | IJ |
| 1 | IJ EFGH 22 | IJ |
| 4 | IJ EFGH 22 | EFGH |
+-----+---------------+-----------+
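One caveat, offered as a sketch rather than as part of the answer as posted: CHARINDEX returns only the position of the first match, so a code that appears twice in the same string produces a single row here, whereas the query in the question reports every position. If all positions are needed, CHARINDEX's optional third argument (the start location) can resume the search after each hit. The recursive CTE below is one way to do that. The extra sample row 'IJ 7 IJ' is made up to show a repeated match, and none of this has been measured against the 160081-row table.

```sql
-- Sample data, same shape as the tables above, plus one made-up row
-- ('IJ 7 IJ') that contains the same code twice.
declare @ViewB table (myStrings varchar(16))
insert into @ViewB values ('ABC'), ('EFGH'), ('IJ')

declare @TableA table (colToSearch varchar(256))
insert into @TableA values ('123 ABC 45 IJ'), ('IJ'), ('IJ EFGH 22'), ('IJ 7 IJ')

-- Anchor member: first match of each code in each string.
-- Recursive member: restart the search one character after the previous
-- match, so later (and overlapping) matches are also returned.
;with Matches as
(
    select a.colToSearch
          ,b.myStrings
          ,charindex(b.myStrings, a.colToSearch) as Pos
    from @ViewB b
    cross join @TableA a
    where charindex(b.myStrings, a.colToSearch) > 0

    union all

    select m.colToSearch
          ,m.myStrings
          ,charindex(m.myStrings, m.colToSearch, m.Pos + 1)
    from Matches m
    where charindex(m.myStrings, m.colToSearch, m.Pos + 1) > 0
)
select Pos, colToSearch as String, myStrings as Occurrence
from Matches
order by String, Pos
option (maxrecursion 0) -- lift the default limit of 100 recursion levels
```

For the made-up row 'IJ 7 IJ', this should return IJ at positions 1 and 6. Recursive CTEs can be slow on large inputs, so it is worth comparing the plan and timings with the first-match query before adopting this.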
- Why cross apply? It's a simple cross join. – dnoeth, Apr 7, 2017 at 8:38
- There's no real difference here for that part, @dnoeth. Instead of enumerating a row per character position, we just make a Cartesian product to enable the use of CHARINDEX. – S3S, Apr 7, 2017 at 10:49
- scsimon, you're a genius. This made it significantly faster using CROSS APPLY. It brought the run time down from 17 minutes to 50 seconds after adding the other filtering necessities from the original query to yours. – Jake, Apr 7, 2017 at 13:21
- No worries, @Jake. I'm glad it sped it up! – S3S, Apr 7, 2017 at 14:13
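The comments on the first answer mention adding the original query's filtering back in. A minimal sketch of what that could look like against the objects named in the question follows. It assumes System_ID is a column of tblStatusInputPoints_CORE (the original query references it unqualified), and it writes the view C exclusion as NOT EXISTS instead of the LEFT JOIN ... IS NULL form. This is a guess at the shape, not the query that was actually run.

```sql
-- Sketch only: object and column names are taken from the question;
-- System_ID belonging to tblStatusInputPoints_CORE is an assumption.
select
    charindex(B.MISubCode, A.StatusInputPoint_DESC) as Pos
   ,A.StatusInputPoint_DESC
   ,A.StatusInputPoint_CODE
   ,A.StatusInputPoint_CORE_ID
   ,B.MISubCode
from tblStatusInputPoints_CORE A
cross join dbo.vwAlarmLibrary_MI_Substation_Abbreviations B
where A.System_ID = 1
  -- keep only pairs where the code occurs somewhere in the description
  and charindex(B.MISubCode, A.StatusInputPoint_DESC) > 0
  -- drop status points that view C already accounts for
  and not exists
      (
          select 1
          from vwAlarmLibrary_MI_StatusPoint_Mask_Match_NoWildCards C
          where C.StatusInputPoint_CORE_ID = A.StatusInputPoint_CORE_ID
      )
```

Like the first answer's query, this reports only the first position of each code within a description.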
This might give you better performance:
select B.myStrings, A.colToSearch
, charindex(b.myStrings,a.colToSearch) as Pos
from @ViewB as B
join @TableA as A
on A.colToSearch like '%'+B.myStrings+'%'
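A caveat on this form, assuming an index exists on the searched column: a pattern with a leading wildcard cannot be used for an index seek, whereas a prefix pattern can. The sketch below uses a hypothetical temp table and index (#Demo and IX_Demo_colToSearch are made-up names) so the two execution plans can be compared side by side.

```sql
-- Hypothetical table and index, for illustration only.
create table #Demo (colToSearch varchar(256) not null)
create index IX_Demo_colToSearch on #Demo (colToSearch)

insert into #Demo values ('123 ABC 45 IJ'), ('IJ'), ('IJ EFGH 22')

-- Prefix pattern: the optimizer can turn this into a range seek on the index.
select colToSearch from #Demo where colToSearch like 'IJ%'

-- Leading wildcard: every row has to be examined (index or table scan).
select colToSearch from #Demo where colToSearch like '%IJ%'

drop table #Demo
```

With only three rows the cost difference is negligible; the point is the seek versus scan operator shown in the actual execution plan.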
- I've tried this. It doesn't perform any better, because it has to perform a table scan that takes up most of the execution plan due to the non-sargability of %x%. While CHARINDEX is also not sargable, the execution plan apparently doesn't put all the work on the table scans; instead, 70% of the cost is on the joins. – Jake, Apr 17, 2017 at 13:09
- @Jake If you are searching for data inside a varchar, it is going to be non-sargable. The fix is a proper data design to address the need. – paparazzo, Apr 17, 2017 at 14:20
- That's not necessarily true (see "Sargability: Why %string% Is Slow"). If you use a WHERE clause with LIKE 'nut%', for example, the query is considered sargable because it performs a seek instead of a scan. – Jake, Apr 17, 2017 at 18:59
- @Jake Whatever, you seem to have the situation under control. – paparazzo, Apr 17, 2017 at 19:00