Modifying `sakila` database

Question 1

This is not "real life" code. I'm trying to expand upon the well-known Sakila sample database for MySQL to make it more complex. The final step (calling sp_RealAmtPayment) is running surprisingly slowly.

PS: Note these are all separate queries executed against the same database in the order specified.

Add columns to sakila.customer table:

USE sakila;
ALTER TABLE customer
 ADD COLUMN multiplier DECIMAL(3,2) AFTER active;
ALTER TABLE customer
 ADD COLUMN cust_ranking VARCHAR(10) AFTER multiplier;

Duration: 0.289 sec

Create a proc to randomly distribute multiplier:

DROP PROCEDURE IF EXISTS sp_randCustMult;
DELIMITER //
CREATE PROCEDURE sp_randCustMult()
BEGIN
 -- declare a counter
 SET @start = (SELECT MIN(customer_id) FROM customer);
 SET @stop = (SELECT MAX(customer_id) FROM customer);
 -- start while loop
 WHILE @start <= @stop 
 DO
 -- select a random float variable
 SET @RAND = RAND();
 -- update NULL field
 UPDATE customer
 SET multiplier = (SELECT 
 (CASE
 WHEN @RAND <= 0.65 THEN 1.00
 WHEN @RAND <= 0.90 THEN 0.85
 WHEN @RAND <= 1.00 THEN 1.05
 END))
 WHERE customer_id = @start;
 -- tick counter one up
 SET @start = @start + 1;
 END WHILE;
END//
DELIMITER ;

Duration: 0.001 sec

Call the proc to populate the rows:
```
CALL sp_randCustMult;
```
Duration: 0.761 sec

With Safe Update Mode OFF, add human-friendly values to cust_ranking based on multiplier.

UPDATE customer
 SET cust_ranking = 'Standard'
 WHERE multiplier = 1.00;
UPDATE customer
 SET cust_ranking = 'Premium'
 WHERE multiplier = 0.85;
UPDATE customer
 SET cust_ranking = 'Uplift'
 WHERE multiplier = 1.05;

Duration: 0.057 sec, 0.037 sec, 0.035 sec
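
The three updates above could probably be folded into a single statement. A minimal sketch, assuming the only multiplier values present are the three assigned by `sp_randCustMult` (any other value would get a NULL ranking) and that Safe Update Mode is still off:

```sql
-- One pass over customer instead of three.
UPDATE customer
SET cust_ranking = CASE multiplier
        WHEN 1.00 THEN 'Standard'
        WHEN 0.85 THEN 'Premium'
        WHEN 1.05 THEN 'Uplift'
    END                          -- no ELSE: unmatched multipliers yield NULL
WHERE multiplier IS NOT NULL;    -- leave rows without a multiplier untouched
```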

Add a real_amount column to table payment:

ALTER TABLE payment
ADD COLUMN real_amount DECIMAL(5,2)
AFTER amount;

Duration: 0.749 sec

Create another proc to populate payment with real amounts:

DROP PROCEDURE IF EXISTS sp_RealAmtPayment;
DELIMITER //
CREATE PROCEDURE sp_RealAmtPayment()
BEGIN
-- declare a counter
SET @start = (SELECT MIN(payment_id) FROM payment);
SET @stop = (SELECT MAX(payment_id) FROM payment);
WHILE @start <= @stop
DO
 UPDATE payment AS p
 INNER JOIN customer AS c
 ON p.customer_id = c.customer_id
 SET real_payment = (p.amount * c.multiplier)
 WHERE p.payment_id = @start;
 SET @start = @start + 1;
END WHILE;
END//
DELIMITER ;

Duration: 0.001 sec

CALL sp_RealAmtPayment;

Duration: 17.082 sec

SELECT * FROM payment;

1000 row(s) returned
0.002 sec / 0.002 sec

The final step (CALL sp_RealAmtPayment, 17 seconds) seems extremely slow considering the very small number of records. What am I missing? All comments/critiques welcome!

Question 2

I've never played with MySQL, so this may be completely wrong, but if I understand it correctly the WHILE loop is the equivalent of a T-SQL CURSOR, which is inherently slow.

You're essentially looping on payment_id, incrementing at each iteration. This assumes the IDs are contiguous, which isn't a safe assumption to make with data: if records were deleted, you run more iterations than there are records:

SET @start = (SELECT MIN(payment_id) FROM payment);
SET @stop = (SELECT MAX(payment_id) FROM payment);
WHILE @start <= @stop
DO
 UPDATE payment AS p
 INNER JOIN customer AS c
 ON p.customer_id = c.customer_id
 SET real_payment = (p.amount * c.multiplier)
 WHERE p.payment_id = @start;
 SET @start = @start + 1;
END WHILE;
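
Whether the IDs really are contiguous can be checked directly. A small sketch against the same `payment` table; the column aliases are only illustrative:

```sql
-- Compare the number of loop iterations with the number of rows that exist.
SELECT MAX(payment_id) - MIN(payment_id) + 1 AS loop_iterations,
       COUNT(*)                              AS rows_present
FROM payment;
```

If `loop_iterations` is larger than `rows_present`, some iterations of the loop run an UPDATE that matches no row.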

In pseudo-code, the loop reads as follows:

For each payment_id in payment...
... update the real_payment column to p.amount*c.multiplier

I don't see why you need a loop to do this. I think this would be equivalent... and faster:

UPDATE payment AS p
 INNER JOIN customer AS c
 ON p.customer_id = c.customer_id
SET real_payment = (p.amount * c.multiplier);
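
After running the single statement, the result can be sanity-checked with a query along these lines. This is a sketch, not part of the original answer: it uses the `real_payment` column name from the statement above (the `ALTER TABLE` in the question names the column `real_amount`, so adjust to whichever exists), and it assumes the stored value is the product rounded to two decimals:

```sql
-- Count payments whose stored value differs from amount * multiplier.
-- <=> is MySQL's NULL-safe equality, so a row left NULL while the
-- product is not NULL counts as a mismatch.
SELECT COUNT(*) AS mismatched_rows
FROM payment AS p
INNER JOIN customer AS c
    ON p.customer_id = c.customer_id
WHERE NOT (p.real_payment <=> ROUND(p.amount * c.multiplier, 2));
```

A count of 0 would suggest every payment row was updated as intended.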

Question 3

Similar to Mat's answer: in stage 2, you are calculating a random value for each customer and processing the customers one at a time.

It would be faster to process them all together, but RAND() makes that harder because it returns a new value each time you call it, so you need to adjust the 'obvious' odds as you go.
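
The adjustment to the 'obvious' odds can be worked through as plain arithmetic. A sketch, assuming the target split from the original procedure (65% / 25% / 10%) and that each WHEN of a CASE expression calls RAND() separately, giving two independent draws:

```sql
-- The second draw only matters when the first one fails (probability 1 - 0.65),
-- so its threshold must be the conditional probability, not 0.90.
SELECT 0.65                                        AS p_first,          -- multiplier 1.00
       (0.90 - 0.65) / (1.0 - 0.65)                AS second_threshold, -- roughly 0.714
       (1.0 - 0.65) * (0.90 - 0.65) / (1.0 - 0.65) AS p_second;         -- works out to 0.25, multiplier 0.85
```

The remaining 10% falls through to the last branch and gets the 1.05 multiplier.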

Your procedure (with a test SQLFiddle) can be reduced to:

DROP PROCEDURE IF EXISTS sp_randCustMult;
DELIMITER //
CREATE PROCEDURE sp_randCustMult()
BEGIN
 SET @first = 0.65;
 SET @second = (0.90 - @first)/(1.0 - @first);
 -- update NULL field
 UPDATE customer
 SET multiplier = (SELECT (CASE 
 WHEN RAND() < @first then 1.0
 WHEN RAND() < @second then 0.85
 ELSE 1.05
 END));
END//
DELIMITER ;
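
To check that the single UPDATE produces roughly the intended split, the assigned multipliers can be tallied afterwards. A sketch; the aliases are illustrative, and the exact shares will vary from run to run because the values are random:

```sql
CALL sp_randCustMult();

-- Tally how many customers received each multiplier and their share of the total.
SELECT multiplier,
       COUNT(*) AS customers,
       ROUND(COUNT(*) / (SELECT COUNT(*) FROM customer), 2) AS share
FROM customer
GROUP BY multiplier
ORDER BY multiplier;
```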

Question 4

Cool, I'll delete the data tonight, test those two scripts, and post the results. One thing I love about SQL is that the most elegant scripts are the simplest ones.

Question 5

Duration: 0.036 sec. Everything seems spiffy and in proportion.

score 7 · Accepted Answer · 2014-05-27 19:42:23Z
