7
\$\begingroup\$

This is not "real life" code. I'm trying to expand upon the well known Sakila sample database for MySQL to make it more complex. Step 7 (or 6) is running surprisingly slow.

PS: Note these are all separate queries executed against the same database in the order specified.

  1. Add columns to sakila.customer table:

    USE sakila;
    ALTER TABLE customer
     ADD COLUMN multiplier DECIMAL(3,2) AFTER active;
    ALTER TABLE customer
     ADD COLUMN cust_ranking VARCHAR(10) AFTER multiplier;
    

    Duration: 0.289 sec

  2. Create a proc to randomly distribute multiplier:

    DROP PROCEDURE IF EXISTS sp_randCustMult;
    DELIMITER //
    CREATE PROCEDURE sp_randCustMult()
    BEGIN
     -- declare a counter
     SET @start = (SELECT MIN(customer_id) FROM customer);
     SET @stop = (SELECT MAX(customer_id) FROM customer);
     -- start while loop
     WHILE @start <= @stop 
     DO
     -- select a random float variable
     SET @RAND = RAND();
     -- update NULL field
     UPDATE customer
     SET multiplier = (SELECT 
     (CASE
     WHEN @RAND <= 0.65 THEN 1.00
     WHEN @RAND <= 0.90 THEN 0.85
     WHEN @RAND <= 1.00 THEN 1.05
     END))
     WHERE customer_id = @start;
     -- tick counter one up
     SET @start = @start + 1;
     END WHILE;
    END//
    DELIMITER ;
    

    Duration: 0.001 sec

  3. Call the proc to populate the rows:

    CALL sp_randCustMult;
    

    Duration: 0.761 sec

  4. With Safe Update Mode OFF, add human-friendly values to cust_ranking based on multiplier.

    UPDATE customer
     SET cust_ranking = 'Standard'
     WHERE multiplier = 1.00;
    UPDATE customer
     SET cust_ranking = 'Premium'
     WHERE multiplier = 0.85;
    UPDATE customer
     SET cust_ranking = 'Uplift'
     WHERE multiplier = 1.05;
    

    Duration:
    0.057 sec 0.037 sec 0.035 sec

  5. Add a real_amount column to table payment:

    ALTER TABLE payment
    ADD COLUMN real_amount DECIMAL(5,2)
    AFTER amount;
    

    Duration: 0.749 sec

  6. Create another proc to populate payment with real amounts:

    DROP PROCEDURE IF EXISTS sp_RealAmtPayment;
    DELIMITER //
    CREATE PROCEDURE sp_RealAmtPayment()
    BEGIN
    -- declare a counter
    SET @start = (SELECT MIN(payment_id) FROM payment);
    SET @stop = (SELECT MAX(payment_id) FROM payment);
    WHILE @start <= @stop
    DO
     UPDATE payment AS p
     INNER JOIN customer AS c
     ON p.customer_id = c.customer_id
     SET real_payment = (p.amount * c.multiplier)
     WHERE p.payment_id = @start;
     SET @start = @start + 1;
    END WHILE;
    END//
    DELIMITER ;
    

    Duration: 0.001 sec

  7. CALL sp_RealAmtPayment;

    Duration: 17.082 sec

  8. SELECT * FROM payment;

    1000 row(s) returned
    0.002 sec / 0.002 sec

Step 7 seems extremely long considering the very small number of records. What am I missing? All comments/critiques welcome!

Jamal
35.2k13 gold badges134 silver badges238 bronze badges
asked May 24, 2014 at 8:13
\$\endgroup\$

2 Answers 2

7
\$\begingroup\$

I've never played with , so this may be completely wrong, but if I get it right the WHILE loop would be the equivalent of a T-SQL CURSOR, which is inherently slow.

You're essentially looping on payment_id, incrementing at each iteration - this assumes the ID's are contiguous, which isn't a safe assumption to make with data: if records were deleted, you have more iterations than records:

SET @start = (SELECT MIN(payment_id) FROM payment);
SET @stop = (SELECT MAX(payment_id) FROM payment);
WHILE @start <= @stop
DO
 UPDATE payment AS p
 INNER JOIN customer AS c
 ON p.customer_id = c.customer_id
 SET real_payment = (p.amount * c.multiplier)
 WHERE p.payment_id = @start;
 SET @start = @start + 1;
END WHILE;

In pseudo-code, this can read as follows:

  • For each payment_id in payment...
  • ... update the real_payment column to p.amount*c.multiplier

I don't see why you need a loop to do this, I think this would be equivalent... and faster:

UPDATE payment AS p
 INNER JOIN customer AS c
 ON p.customer_id = c.customer_id
SET real_payment = (p.amount * c.multiplier)
answered May 27, 2014 at 19:42
\$\endgroup\$
0
7
\$\begingroup\$

Similar to Mat's answer, in stage 2, you are calculating a random value for each customer, and processing the customers one-at-a-time.

It would be faster to process them all together, but, the rand() becomes hard to do because it changes value each time you call it, and you need to change the 'obvious' odds of things as you go.

Your procedure (with a test SQLFiddle) can be reduced to:

DROP PROCEDURE IF EXISTS sp_randCustMult;
DELIMITER //
CREATE PROCEDURE sp_randCustMult()
BEGIN
 SET @first = 0.65;
 SET @second = (0.90 - @first)/(1.0 - @first);
 -- update NULL field
 UPDATE customer
 SET multiplier = (SELECT (CASE 
 WHEN RAND() < @first then 1.0
 WHEN RAND() < @second then 0.85
 ELSE 1.05
 END));
END//
DELIMITER ;
answered May 27, 2014 at 21:00
\$\endgroup\$
2
  • \$\begingroup\$ Cool I'll delete the data tonight and test those 2 scripts and post results. One thing I love about SQL is that the most elegant scripts are the simplest ones. \$\endgroup\$ Commented May 27, 2014 at 21:07
  • 1
    \$\begingroup\$ Duration: 0.036 sec. Everything seems spiffy and in proportion. \$\endgroup\$ Commented May 28, 2014 at 4:32

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.