There is a ProductTT table, as you can see below:
[dbo].[ProductTT] (ID int , Product Varchar(50) , Time Int)
...which contains the following rows:
1 XX 0030
2 UY 0354
3 YY 0517
4 ZZ 0712
5 WW 0415
6 GG 1112
7 MM 1030
8 HH 0913
Note: The data in the Time column is in hhmm format, so 0030 is 00:30.
I want to write a query to categorize the rows based on their time
value. I need to have 4 categories like this:
category1 00 to 03
category2 03 to 06
category3 06 to 09
category4 09 to 12
I need to see how many rows pertain to each category.
My attempt so far
What I've written so far is like this:
With CTE
AS (select ID,
product,
[time],
Case
When left(time,2)>=00 and left(time,2)< 03 then 'group1'
when left(time,2)>=03 and left(time,2)< 06 then 'group2'
when left(time,2)>=06 and left(time,2)< 09 then 'group3'
when left(time,2)>=09 and left(time,2)<=12 then 'group4' End AS groupID
from [dbo].[ProductTT]
)
select groupid,count(*) as recordcount
from cte
group by groupid
My question
That query works fine but I just want to know whether there are better ways to write this query and avoid using a CASE expression.
3 Answers
You stored Time as an int but then displayed it as a string (with leading zeros). Leading zeros don't get stored in an int, so in order to perform calculations that depend on them, you need to convert to a string first (your current query doesn't do this, so either your query doesn't work, or that table structure is not accurate). Since this is a linear calculation (groups of 3), you can simplify away the CASE expression by dividing the first two digits of the time by 3 (thanks to SQL Server's integer division, the remainder gets discarded, and we add 1 to go from 0-3 to 1-4). Of course, there is an exception, because you want 12 PM to be in group 4, not group 5. With a CASE expression this could just be left to the ELSE clause, but if you eliminate CASE, you will have to deal with that exception explicitly; that's all the COALESCE/NULLIF stuff at the end.
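;WITH x AS
(
  -- Pad the int back out to a 4-character hhmm string
  SELECT ID, Product, [Time] = RIGHT('000'+CONVERT(varchar(4),[Time]),4)
  FROM dbo.ProductTT
), y AS
(
  -- h holds the two hour digits
  SELECT ID, Product, [Time], h = CONVERT(char(2),[Time])
  FROM x
)
SELECT ID, Product, [Time],
  -- Subtract 1 only when h = 12, so that hour 12 lands in group 4.
  -- (Testing h%11 = 1 would also match hour 01 and yield 'group0'.)
  [GroupID] = 'group' + CONVERT(char(1),h/3+1-COALESCE(NULLIF(h-12,0)-(h-12),1))
FROM y;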
Results:
ID Product Time GroupID
-- ------- ---- -------
1 XX 0030 group1
2 UY 0354 group2
3 YY 0517 group2
4 ZZ 0712 group3
5 WW 0415 group2
6 GG 1112 group4
7 MM 1030 group4
8 HH 0913 group4
I strongly recommend you use the actual time data type, as that is what it was designed for. Then you can use DATEPART(HOUR, ...) in your calculations instead of messy string manipulation, the query above becomes less complex and, as a bonus, you get built-in validation to avoid invalid times like 1369 and 9997. Or, if the leading zeros are important but you don't care about validation, use char(4) instead of int.
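As a rough illustration of that recommendation, here is a minimal sketch. It assumes a hypothetical table variable, @ProductTime, whose [Time] column is declared as time(0) (so SQL Server 2008 or later); the variable name and the sample rows are made up for the example and are not part of the original schema.

```sql
-- Hypothetical stand-in for dbo.ProductTT with a real time column (assumption).
DECLARE @ProductTime TABLE (ID int, Product varchar(50), [Time] time(0));

INSERT INTO @ProductTime (ID, Product, [Time])
VALUES (1, 'XX', '00:30'),
       (2, 'UY', '03:54'),
       (3, 'YY', '05:17'),
       (4, 'ZZ', '07:12');

-- DATEPART(HOUR, ...) returns the hour as an int; integer division by 3
-- yields a zero-based 3-hour bucket, which is then counted per group.
SELECT GroupID     = 'group' + CONVERT(varchar(2), b.Bucket + 1),
       RecordCount = COUNT(*)
FROM @ProductTime AS p
CROSS APPLY (SELECT Bucket = DATEPART(HOUR, p.[Time]) / 3) AS b
GROUP BY b.Bucket
ORDER BY b.Bucket;
```

Note that this sketch makes no special case for hour 12, so a 12:xx row would land in group5 unless you handle it separately.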
I also think you need to handle the case where an event happens in the afternoon.
And FWIW I am not sure why you don't want to use a CASE expression here. It's a few more characters, sure, but it's a lot clearer what the query is actually doing. Code that is self-documenting is much more valuable than code that is slightly shorter. This is simpler IMHO, and would be even simpler if you used the right data types:
;WITH x AS
(
SELECT ID, Product, [Time] = RIGHT('000'+CONVERT(varchar(4),[Time]),4)
FROM dbo.ProductTT
)
SELECT ID, Product, [Time],
GroupID = 'group' + CASE CONVERT(char(2),[Time])/3
WHEN 0 THEN '1'
WHEN 1 THEN '2'
WHEN 2 THEN '3'
ELSE '4' END
FROM x;
I'd use a numbers table to categorize the values in dbo.ProductTT.
I've created a simple MCVE 1 to show how this works. FYI, in future, it would be great if you'd provide code like this when asking a question. It helps everyone.
USE tempdb;
IF OBJECT_ID(N'dbo.ProductTT', N'U') IS NOT NULL
BEGIN
DROP TABLE dbo.ProductTT;
END
CREATE TABLE dbo.ProductTT
(
ID int NOT NULL PRIMARY KEY CLUSTERED
, Product varchar(50) NOT NULL
, CreateTime int NOT NULL
, FormattedCreateTime AS RIGHT('0000' + CONVERT(varchar(4), CreateTime), 4)
);
INSERT INTO dbo.ProductTT
VALUES (1, 'XX', 0030)
, (2, 'UY', 0354)
, (3, 'YY', 0517)
, (4, 'ZZ', 0712)
, (5, 'WW', 0415)
, (6, 'GG', 1112)
, (7, 'MM', 1030)
, (8, 'HH', 0913)
, (9, 'H1', 1230)
, (10, 'H2', 1359)
, (11, 'H3', 2359);
IF OBJECT_ID(N'dbo.TimeGroups', N'U') IS NOT NULL
BEGIN
DROP TABLE dbo.TimeGroups;
END
CREATE TABLE dbo.TimeGroups
(
TimeGroupStart int NOT NULL
, TimeGroupEnd int NOT NULL
, TimeGroupName varchar(9) NOT NULL
, PRIMARY KEY CLUSTERED (TimeGroupStart, TimeGroupEnd)
);
INSERT INTO dbo.TimeGroups (TimeGroupStart, TimeGroupEnd, TimeGroupName)
VALUES (0, 3, '00 to 03')
, (3, 6, '03 to 06')
, (6, 9, '06 to 09')
, (9, 12, '09 to 12')
, (12, 15, '12 to 15')
, (15, 18, '15 to 18')
, (18, 21, '18 to 21')
, (21, 24, '21 to 24');
The "numbers table" in the code above is called "TimeGroups".
To get the desired output, you simply join the two tables together, as in:
SELECT tg.TimeGroupName
, TimeGroupCount = COUNT(1)
FROM dbo.ProductTT tt
INNER JOIN dbo.TimeGroups tg ON (tt.CreateTime / 100) >= tg.TimeGroupStart
AND (tt.CreateTime / 100) < tg.TimeGroupEnd
GROUP BY tg.TimeGroupName
ORDER BY tg.TimeGroupName;
The output looks like:
╔═══════════════╦════════════════╗
║ TimeGroupName ║ TimeGroupCount ║
╠═══════════════╬════════════════╣
║ 00 to 03      ║ 1              ║
║ 03 to 06      ║ 3              ║
║ 06 to 09      ║ 1              ║
║ 09 to 12      ║ 3              ║
║ 12 to 15      ║ 2              ║
║ 21 to 24      ║ 1              ║
╚═══════════════╩════════════════╝
Note that the JOIN clause in the above query specifies the range as greater-than-or-equal-to the start of the category, and less-than the end of the category. If you used less-than-or-equal-to for the end of the range, you'd have ProductTT rows showing up in multiple categories, which is clearly incorrect.
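To make the boundary behaviour concrete, here is a small sketch using a hypothetical table variable, @Groups, that holds just the first two ranges, and a hypothetical variable, @HourValue, set to hour 3 (both names are illustrative only, and the inline DECLARE initializer assumes SQL Server 2008 or later).

```sql
-- Hypothetical two-row copy of the range table (assumption, for illustration).
DECLARE @Groups TABLE (GroupStart int, GroupEnd int, GroupName varchar(9));
INSERT INTO @Groups (GroupStart, GroupEnd, GroupName)
VALUES (0, 3, '00 to 03'),
       (3, 6, '03 to 06');

-- 354 / 100 is integer division, so this is hour 3, exactly on a boundary.
DECLARE @HourValue int = 354 / 100;

-- Half-open range [start, end): matches only '03 to 06'.
SELECT GroupName
FROM @Groups
WHERE @HourValue >= GroupStart AND @HourValue < GroupEnd;

-- Closed range [start, end]: matches both '00 to 03' and '03 to 06'.
SELECT GroupName
FROM @Groups
WHERE @HourValue >= GroupStart AND @HourValue <= GroupEnd;
```

The second query returning two rows is the double counting described above.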
You can see how the join works with this simple query:
SELECT tt.*
, Category = tt.CreateTime / 100
FROM dbo.ProductTT tt
The output looks like:
╔════╦═════════╦════════════╦═════════════════════╦══════════╗
║ ID ║ Product ║ CreateTime ║ FormattedCreateTime ║ Category ║
╠════╬═════════╬════════════╬═════════════════════╬══════════╣
║ 1  ║ XX      ║ 30         ║ 0030                ║ 0        ║
║ 2  ║ UY      ║ 354        ║ 0354                ║ 3        ║
║ 3  ║ YY      ║ 517        ║ 0517                ║ 5        ║
║ 4  ║ ZZ      ║ 712        ║ 0712                ║ 7        ║
║ 5  ║ WW      ║ 415        ║ 0415                ║ 4        ║
║ 6  ║ GG      ║ 1112       ║ 1112                ║ 11       ║
║ 7  ║ MM      ║ 1030       ║ 1030                ║ 10       ║
║ 8  ║ HH      ║ 913        ║ 0913                ║ 9        ║
║ 9  ║ H1      ║ 1230       ║ 1230                ║ 12       ║
║ 10 ║ H2      ║ 1359       ║ 1359                ║ 13       ║
║ 11 ║ H3      ║ 2359       ║ 2359                ║ 23       ║
╚════╩═════════╩════════════╩═════════════════════╩══════════╝
1 - I own the website pointed to in that link
You could convert the hour part of the time string to an integer and divide it by three. The integer part of this division, plus 1, is equal to your group number.
(00/3) + 1 = 1
(01/3) + 1 = 1
(02/3) + 1 = 1
(03/3) + 1 = 2
(04/3) + 1 = 2
...
In that way you will no longer need the case.
To explain a little further: you use the CASE because you want to know whether the time belongs to group1, group2 and so on. One way to avoid the CASE is to work out which group the time belongs to using the formulas I gave above. You can calculate the groupId field using this formula: "group"&((to_int(left(time,2))/3)+1). I do not know the function that converts a string to an int in your database, so I used to_int in the example.
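In SQL Server, one possible translation of that formula is sketched below, with CONVERT(int, ...) standing in for to_int and + for string concatenation. The inline VALUES list and the column name TimeText are assumptions made for the example; they presume the time is available as a 4-character hhmm string.

```sql
-- Hypothetical inline sample of hhmm strings (assumption, for illustration).
SELECT v.TimeText,
       -- First two characters -> int hour; integer division by 3, plus 1.
       GroupID = 'group'
               + CONVERT(varchar(2), CONVERT(int, LEFT(v.TimeText, 2)) / 3 + 1)
FROM (VALUES ('0030'), ('0354'), ('0517'), ('0712'), ('1112')) AS v (TimeText);
```

As with the formula itself, an hour of 12 comes out as group5 here, so it needs separate handling if it should count as group4.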