Return all rows containing specific value in json array

Question 1

I have the following table:

CREATE TABLE [dbo].[Journals](
 [Id] [nvarchar](450) NOT NULL,
 [CorrelationId] [nvarchar](450) NOT NULL,
 [Version] [int] NOT NULL,
 [Type] [nvarchar](max) NOT NULL,
 [Data] [nvarchar](max) NOT NULL,
 [Created] [datetime2](7) NOT NULL,
 [Updated] [datetime2](7) NOT NULL,
 CONSTRAINT [PK_Journals] PRIMARY KEY CLUSTERED 
(
 [Id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]
GO
ALTER TABLE [dbo].[Journals] ADD DEFAULT (getutcdate()) FOR [Created]
GO
ALTER TABLE [dbo].[Journals] ADD DEFAULT (getutcdate()) FOR [Updated]
GO

One row could look like this:

Id: 1833eac9-3ea7-4613-be26-af3f7589d6f0
CorrelationId: 0HLNKM700ILBN:00000001
Version: 1
Type: Notification
Data: [{"type": "Event_Received","timeStamp": "2019-06-24T12:47:54.3960942Z","data": {"version": 1,"notificationName": "My Notification","receiverName": "John Doe","receiverUserIdentifier": "123456789","receiverEmail": "[email protected]","senderName": "Some Sender","senderEmail": "[email protected]","contentSubject": "This is my subject","contentBody": "And this is my body","attachmentNames": []},"sensitive": true}, {"type": "Event_ValidationSucceded","timeStamp": "2019-06-24T12:47:54.4167034Z","data": "{}","sensitive": false}, {"type": "Event_PublishStart","timeStamp": "2019-06-24T12:47:54.4691455Z","data": "{}","sensitive": false}, {"type": "Event_PublishSucceeded","timeStamp": "2019-06-24T12:47:54.4740542Z","data": "{}","sensitive": true}]
Created: 2019年06月19日 12:29:20.8100000
Updated: 2019年06月19日 12:30:51.1123311

I want to fetch all the Ids from the rows that contains array entries where Sensitive = true

So I have tried something like this:

SELECT Id
FROM Journals journals
CROSS APPLY OPENJSON(journals.Data)
WITH (JournalType nvarchar(255) '$.type', Sensitive bit '$.sensitive') as jsonValues
WHERE Sensitive = 1

This works but I get duplicate ids back (since the array contains more than one item where Sensitive = true).

How can I limit the query so that I don't receive duplicate Ids when the array contains more than one item where Sensitive = true?

Wanted output:

1833eac9-3ea7-4613-be26-af3f7589d6f0

Actual output:

1833eac9-3ea7-4613-be26-af3f7589d6f0
1833eac9-3ea7-4613-be26-af3f7589d6f0

I guess that I could add a DISTINCT to SELECT DISTINCT Id but that feels like cheating, or? :D

Question 2

As a side note: if you keys are always UUIDs then using the UNIQUEIDENTIFIER type instead of [N][VAR]CHAR will be more efficient. You'll use 72 bytes/value with 'NVARCHAR' instead of 16 with UNIQUEIDENTIFIER and as it is your clustering key you'll spend that extra 56 bytes per row again in every non-clustered index defined on that table.

Question 3

You could opt to use TOP(1) in the CROSS APPLY

SELECT Id
FROM Journals journals
CROSS APPLY 
(
SELECT TOP(1) Sensitive
FROM OPENJSON(journals.Data)
WITH (JournalType nvarchar(255) '$.type', Sensitive bit '$.sensitive')
WHERE Sensitive = 1
) as jsonValues;

Or, as you mentioned DISTINCT

SELECT DISTINCT Id
FROM Journals journals
CROSS APPLY OPENJSON(journals.Data)
WITH (JournalType nvarchar(255) '$.type', Sensitive bit '$.sensitive') as jsonValues
WHERE Sensitive = 1;

Main difference between the two

When using the TOP(1) the values will be filtered and the top operator will be applied earlier in the execution plan

enter image description here

When using the DISTINCT solution, the filtering and grouping is only done after the JSON function is (cross) applied

enter image description here

Which is why I would opt for the TOP(1) + CROSS APPLY solution.

As a side note, storing unique identifiers as nvarchar() is not ideal, especially when using a huge data type like nvarchar(450). Consider using the minimal or close to the minimal amount of space needed in your column datatypes & using the uniqueidentifier datatype.

Question 4

@JOSEFtw No problem, happy to help :)!

score 3 · Accepted Answer · 2019-06-25 09:07:02Z

You could opt to use TOP(1) in the CROSS APPLY

SELECT Id
FROM Journals journals
CROSS APPLY 
(
SELECT TOP(1) Sensitive
FROM OPENJSON(journals.Data)
WITH (JournalType nvarchar(255) '$.type', Sensitive bit '$.sensitive')
WHERE Sensitive = 1
) as jsonValues;

Or, as you mentioned DISTINCT

SELECT DISTINCT Id
FROM Journals journals
CROSS APPLY OPENJSON(journals.Data)
WITH (JournalType nvarchar(255) '$.type', Sensitive bit '$.sensitive') as jsonValues
WHERE Sensitive = 1;

Main difference between the two

When using the TOP(1) the values will be filtered and the top operator will be applied earlier in the execution plan

enter image description here

When using the DISTINCT solution, the filtering and grouping is only done after the JSON function is (cross) applied

enter image description here

Which is why I would opt for the TOP(1) + CROSS APPLY solution.

As a side note, storing unique identifiers as nvarchar() is not ideal, especially when using a huge data type like nvarchar(450). Consider using the minimal or close to the minimal amount of space needed in your column datatypes & using the uniqueidentifier datatype.

@JOSEFtw No problem, happy to help :)!

Randi Vertongen
– Randi Vertongen

2019年06月25日 11:13:41 +00:00
Commented Jun 25, 2019 at 11:13

Stack Exchange Network

Return all rows containing specific value in json array

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Return all rows containing specific value in json array

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions