I'm trying to refresh my very rusty SQL skills.
I have a table with device logs, collected every 5 minutes. I'm interested in counting the number of times a device is either 'connected' or 'disconnected' per device, per day.
deviceId, status, tag, timestamp
A, Connected, connectivity, 2022年10月27日T09:25:02Z
B, Connected, connectivity, 2022年10月27日T09:25:02Z
C, Disconnected, connectivity, 2022年10月27日T09:25:02Z
D, Disconnected, connectivity, 2022年10月27日T09:25:02Z
E, Disconnected, connectivity, 2022年10月27日T09:25:02Z
A, Connected, connectivity, 2022年10月27日T09:30:02Z
B, Connected, connectivity, 2022年10月27日T09:30:02Z
C, Disconnected, connectivity, 2022年10月27日T09:30:02Z
D, Disconnected, connectivity, 2022年10月27日T09:30:02Z
E, Disconnected, connectivity, 2022年10月27日T09:30:02Z
A, Connected, connectivity, 2022年10月27日T09:35:02Z
B, Connected, connectivity, 2022年10月27日T09:35:02Z
C, Disconnected, connectivity, 2022年10月27日T09:35:02Z
D, Disconnected, connectivity, 2022年10月27日T09:35:02Z
E, Disconnected, connectivity, 2022年10月27日T09:35:02Z
A, Connected, connectivity, 2022年10月27日T09:48:10Z
I came up with the following SQL, since it's been a while, I was wondering if anyone has some feedback or alternative ideas.
SELECT
COUNT(CASE WHEN status = 'Connected' THEN 1 END) connectedCount,
COUNT(CASE WHEN status = 'Disconnected' THEN 1 END) disconnectedCount, deviceId,
FORMAT(CAST(timestamp AS DATE), 'yyyy-MM-dd') as date
FROM Reporting
WHERE tag = 'connectivity'
GROUP BY deviceId, FORMAT(CAST(timestamp AS DATE), 'yyyy-MM-dd')
ORDER BY date, deviceId
-
\$\begingroup\$ The current question title, which states your concerns about the code, applies to too many questions on this site to be useful. The site standard is for the title to simply state the task accomplished by the code. Please see How do I ask a good question?. \$\endgroup\$BCdotWEB– BCdotWEB2022年11月02日 22:01:21 +00:00Commented Nov 2, 2022 at 22:01
-
\$\begingroup\$ @BCdotWEB Thanks. Do you have a suggestion for a better title? \$\endgroup\$picklepick– picklepick2022年11月03日 07:55:37 +00:00Commented Nov 3, 2022 at 7:55
-
1\$\begingroup\$ What's the DBMS you're currently using? \$\endgroup\$lemon– lemon2022年12月10日 12:00:24 +00:00Commented Dec 10, 2022 at 12:00
1 Answer 1
Date format
yyyy-MM-dd
contains all date fields. So, formatting doesn't change the uniqueness of dates. If two rows have different dates — their formatted values are also different.For that reason, I'd remove
FORMAT
fromGROUP BY
:SELECT COUNT(CASE WHEN status = 'Connected' THEN 1 END) connectedCount, COUNT(CASE WHEN status = 'Disconnected' THEN 1 END) disconnectedCount, deviceId, FORMAT(CAST(timestamp AS DATE), 'yyyy-MM-dd') as date FROM Reporting WHERE tag = 'connectivity' GROUP BY deviceId, CAST(timestamp AS DATE) ORDER BY date, deviceId
CAST AS DATE
is not the first choice. Check out whether your DBMS hasTRUNC
,DATE_TRUNC
or similar functions.Carefully check that you handle time zones correctly. Your data has time zone
UTC
. But in the output you cast it toDATE
, and there is no code explicitly specifying in which time zone this is done. So, it's possible that a server's or current session's time zone is used, and it could be not the one that you want.If there are no more statuses except
Connected
andDisconnected
, it's better to store them asbit
/boolean
data type. If it can be changed.For this specific task, data type is not significant. But questions like "Whether any device was connected at timestamp T" could be cleanly answered with
BOOL_OR
.