I have a table in a MySQL database that has information about passenger flights. I want to do some analysis of flight delays. I would like to add a calculated attribute to each row that indicates how many flights arriving at the current row's departure airport, by the same carrier, were delayed in the past 24 hours.
I haven't had any luck writing this query. I think that I can express my intent with:
-- Find a count of all rows
count(*) where
-- Of the same carrier
a.FlightCarrier = thisRow.FlightCarrier
-- that arrived at our departure airport
a.DestinationAirport = thisRow.OriginAirport
-- scheduled to arrive before ScheduledDepartureTimeUTC
a.ScheduledArrivalTimeUTC < thisRow.ScheduledDepartureTimeUTC
-- scheduled to arrive after (ScheduledDepartureTimeUTC - 24h)
a.ScheduledArrivalTimeUTC > (thisRow.ScheduledDepartureTimeUTC - 24h)
-- had a delayed arrival
a.DelayedArrival = 1
I attempted to write this as a SQL statement:
alter table flight add column DelayedCarrierArrivals24H int as
(select count(*) from flight a where
a.FlightCarrier = FlightCarrier
and a.DestinationAirport = OriginAirport
and a.ScheduledArrivalTimeUTC < ScheduledDepartureTimeUTC
and a.ScheduledArrivalTimeUTC > (ScheduledDepartureTimeUTC - (60 * 60 * 24))
and a.DelayedArrival = 1) STORED;
However, MySQL balks at this with "You have an error in your SQL syntax... near 'select count(*) from flight a where a.FlightCarrier = FlightCarrier"
What's the 'right' way to create this column in MySQL? I had originally thought to use windowing functions but couldn't divine the right way to express my intent with those features, either.
1 Answer 1
The problem
https://dev.mysql.com/doc/refman/5.7/en/create-table-generated-columns.html says:
Generated column expressions must adhere to the following rules. An error occurs if an expression contains disallowed constructs.
...
Subqueries, parameters, variables, stored functions, and user-defined functions are not permitted.
...
Your expression includes a subquery. Note that the error message pointed directly to the subquery: "... near 'select ..."
Generated columns are rather limited (for good reasons).
Suggestion 1
May I suggest that you do the computation when you are fetching data from the table. Consider writing a VIEW
or Stored routine to hide it from the caller.
Suggestion 2
Perhaps this comes closer to what you are looking for?
SELECT b.*,
(
SELECT count(*)
from flight a
where a.FlightCarrier = b.FlightCarrier
and a.DestinationAirport = b.OriginAirport
and a.ScheduledArrivalTimeUTC < b.ScheduledDepartureTimeUTC
and a.ScheduledArrivalTimeUTC >= (b.ScheduledDepartureTimeUTC - (60 * 60 * 24))
and a.DelayedArrival = 1
)
FROM flight AS b;
It won't be efficient due to the correlated subquery, but we can work on that. First, verify that it does what you need.
Suggestion 3
Better would be to have a column for that info, then do the computation separately as you are storing the b
information.
INSERT INTO flight ( ..., DelayedCarrierArrivals24H)
SELECT ..., -- the values from your source
( SELECT ... ) AS DelayedCarrierArrivals24H;