I have the following table:
CREATE TABLE [dbo].[Journals](
[Id] [nvarchar](450) NOT NULL,
[CorrelationId] [nvarchar](450) NOT NULL,
[Version] [int] NOT NULL,
[Type] [nvarchar](max) NOT NULL,
[Data] [nvarchar](max) NOT NULL,
[Created] [datetime2](7) NOT NULL,
[Updated] [datetime2](7) NOT NULL,
CONSTRAINT [PK_Journals] PRIMARY KEY CLUSTERED
(
[Id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]
GO
ALTER TABLE [dbo].[Journals] ADD DEFAULT (getutcdate()) FOR [Created]
GO
ALTER TABLE [dbo].[Journals] ADD DEFAULT (getutcdate()) FOR [Updated]
GO
One row could look like this:
Id: 1833eac9-3ea7-4613-be26-af3f7589d6f0
CorrelationId: 0HLNKM700ILBN:00000001
Version: 1
Type: Notification
Data: [{"type": "Event_Received","timeStamp": "2019-06-24T12:47:54.3960942Z","data": {"version": 1,"notificationName": "My Notification","receiverName": "John Doe","receiverUserIdentifier": "123456789","receiverEmail": "[email protected]","senderName": "Some Sender","senderEmail": "[email protected]","contentSubject": "This is my subject","contentBody": "And this is my body","attachmentNames": []},"sensitive": true}, {"type": "Event_ValidationSucceded","timeStamp": "2019-06-24T12:47:54.4167034Z","data": "{}","sensitive": false}, {"type": "Event_PublishStart","timeStamp": "2019-06-24T12:47:54.4691455Z","data": "{}","sensitive": false}, {"type": "Event_PublishSucceeded","timeStamp": "2019-06-24T12:47:54.4740542Z","data": "{}","sensitive": true}]
Created: 2019年06月19日 12:29:20.8100000
Updated: 2019年06月19日 12:30:51.1123311
I want to fetch all the Ids from the rows that contains array entries where Sensitive = true
So I have tried something like this:
SELECT Id
FROM Journals journals
CROSS APPLY OPENJSON(journals.Data)
WITH (JournalType nvarchar(255) '$.type', Sensitive bit '$.sensitive') as jsonValues
WHERE Sensitive = 1
This works but I get duplicate ids back (since the array contains more than one item where Sensitive = true).
How can I limit the query so that I don't receive duplicate Ids when the array contains more than one item where Sensitive = true?
Wanted output:
1833eac9-3ea7-4613-be26-af3f7589d6f0
Actual output:
1833eac9-3ea7-4613-be26-af3f7589d6f0
1833eac9-3ea7-4613-be26-af3f7589d6f0
I guess that I could add a DISTINCT to SELECT DISTINCT Id but that feels like cheating, or? :D
1 Answer 1
You could opt to use TOP(1)
in the CROSS APPLY
SELECT Id
FROM Journals journals
CROSS APPLY
(
SELECT TOP(1) Sensitive
FROM OPENJSON(journals.Data)
WITH (JournalType nvarchar(255) '$.type', Sensitive bit '$.sensitive')
WHERE Sensitive = 1
) as jsonValues;
Or, as you mentioned DISTINCT
SELECT DISTINCT Id
FROM Journals journals
CROSS APPLY OPENJSON(journals.Data)
WITH (JournalType nvarchar(255) '$.type', Sensitive bit '$.sensitive') as jsonValues
WHERE Sensitive = 1;
Main difference between the two
When using the TOP(1)
the values will be filtered and the top operator will be applied earlier in the execution plan
When using the DISTINCT
solution, the filtering and grouping is only done after the JSON function is (cross) applied
Which is why I would opt for the TOP(1)
+ CROSS APPLY
solution.
As a side note, storing unique identifiers as nvarchar()
is not ideal, especially when using a huge data type like nvarchar(450)
. Consider using the minimal or close to the minimal amount of space needed in your column datatypes & using the uniqueidentifier
datatype.
-
@JOSEFtw No problem, happy to help :)!Randi Vertongen– Randi Vertongen2019年06月25日 11:13:41 +00:00Commented Jun 25, 2019 at 11:13
UNIQUEIDENTIFIER
type instead of[N][VAR]CHAR
will be more efficient. You'll use 72 bytes/value with 'NVARCHAR' instead of 16 withUNIQUEIDENTIFIER
and as it is your clustering key you'll spend that extra 56 bytes per row again in every non-clustered index defined on that table.