Optimizing Queries HAVING COUNT and INSERT...ON DUPLICATE KEY UPDATE

Question 1

There are two main queries I'm tying to improve, one question is much more conceptual and learning about databases and another is much more straightforward.

Both cases deal with the same table It has 2 relevant columns as far as the question is concerned, they make up the primary key of the table. Each of the elements of the primary key is hypothetically the key to another table n their own right. The first row is an int (this is an item) an the second row is a VARCHAR (this is a user) limited to 200 characters.

The first issue comes in trying to find an intersection of users. I am looking to find an intersection of users that share a number of items, 5 in the example

SELECT userid 
FROM (SELECT b.userid, 
 b.showid 
 FROM ratings AS A 
 JOIN ratings AS B 
 ON A.userid LIKE :user 
 AND B.userid NOT LIKE :user 
 AND A.showid LIKE B.showid) AS C 
GROUP BY userid 
HAVING Count(*) > 4

On a fast run this takes up about 2/3 of the runtime of my program (appearing twice, one nested in another query taking little longer than this one). I've tried looking at it but I don't see any obvious way to restructure the query to save time, but queries operating over many more rows are coming up with runtimes of less than 1 second while this takes 2 seconds.

The second issue is more conceptual. As mentioned before I have a composite primary key. I've found elsewhere that with InnoDB tables this can make inserts take up a lot more time. This is exactly whats happening with an INSERT ... UPDATE ON DUPLICATE KEY query that I'm running. What is the best way to go about reducing this slowdown.

This is what the insert looks like, it only runs if the show has been rated

INSERT INTO shows 
 ( 
 showid, 
 showname, 
 showimage 
 ) 
 VALUES 
 ( 
 :id, 
 :name, 
 :image 
 ) 
on duplicate KEY 
UPDATE showimage=:image, 
 showname= :name;
INSERT INTO ratings 
 ( 
 userid, 
 showid, 
 rating 
 ) 
 VALUES 
 ( 
 :user, 
 :show, 
 :rating 
 ) 
on duplicate KEY 
UPDATE rating=:rating

Shows

CREATE TABLE "shows" (
 "ShowId" int(11) NOT NULL,
 "ShowName" varchar(400) NOT NULL,
 "ShowImage" varchar(400) NOT NULL,
 "ShowAgreement" int(11) NOT NULL DEFAULT "0",
 PRIMARY KEY ("ShowId")
) ENGINE=InnoDB DEFAULT CHARSET=latin1

Ratings

CREATE TABLE "ratings" (
 "UserId" varchar(200) NOT NULL,
 "ShowId" int(11) NOT NULL,
 "Rating" int(11) NOT NULL,
 PRIMARY KEY ("UserId","ShowId"),
 KEY "ShowId" ("ShowId"),
 CONSTRAINT "ratings_ibfk_1" FOREIGN KEY ("ShowId") REFERENCES "shows" ("ShowId") ON DELETE CASCADE ON UPDATE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=latin1

For a single test case the INSERT takes an additional 8 seconds over the UPDATE for 160 entries.

Should I simply run the conditional part of the query codeside and removing the primary key.
Should I drop the primary key and replace it with a unique constraint?
Should I create a new table for the UserId that creates a VARCHAR -> INT identity for userId's and give all existing tables the INT version of the id rather than the CHAR?
Should I do something else entirely that hadn't occurred to me?

Question 2

What is the cardinality of ratings? The join condition looks fairly dangerous, is something missing there (itemid for example)?

Question 3

Also, the closing parenthesis for the sub-select is missing

Question 4

My service is telling me cardinality for UserId is 37 and 6958 for ItemId

Question 5

Thanks for pointing out the join condition, I just realized I forgot to make sure that this is only happening for items both users have rated, this is giving me a slight speedup

Question 6

So on average each user have 188 items. I don't understand what the query is supposed to do, assume A.userid LIKE :user matches one of those users, that's 188 rows. B.userid NOT LIKE :user then matches the remaining users, so the cardinality of the join would be somewhere in the ballpark of a miljon rows. Are you sure nothing is missing in the join predicate?

Question 7

I'm guessing a bit on what your query is supposed to do. I assume you are looking for all users that share 5 or more items with a certain user:

SELECT a.userid 
FROM ratings AS a
WHERE EXISTS (
 SELECT 1
 FROM ratings AS b
 WHERE b.userid = :user
 AND a.userid <> b.userid
 AND a.showid = b.showid
)
GROUP BY a.userid
HAVING COUNT(1) > 4

Is that about right?

For the second part of the question, as I understand it the tables are defined as:

create table shows
( showid int not null primary key
, showname varchar(10) not null
, showimage varchar(10)
, x int default 0 not null );
create table ratings
( userid varchar(200) not null
, showid int not null
 references shows(showid)
, rating int not null
, primary key (showid, userid) );

If that's the case I see no particular reason why the insert in ratings is slow, btw what is slow?

Some ideas for trying to pinpoint the problem, make a copy of the ratings table, say ratings2, drop the foreign key and see how that affects the insert performance

Question 8

Thanks that dropped the run-time to about 1/10 of the run-time of my query while getting the same results

Question 9

For the second part of the question, what does the insert statement look like?

Question 10

Updated the question with the query

Question 11

Is the insert's slow all the time, or do they get slower and slower the more inserts you do?

Question 12

The inserts are slow the updates are fast. If i'm putting a new user in the added time seems to scale linearly with the number of shows they have watched

score 0 · Accepted Answer · 2015-07-07 22:10:28Z

I'm guessing a bit on what your query is supposed to do. I assume you are looking for all users that share 5 or more items with a certain user:

SELECT a.userid 
FROM ratings AS a
WHERE EXISTS (
 SELECT 1
 FROM ratings AS b
 WHERE b.userid = :user
 AND a.userid <> b.userid
 AND a.showid = b.showid
)
GROUP BY a.userid
HAVING COUNT(1) > 4

Is that about right?

For the second part of the question, as I understand it the tables are defined as:

create table shows
( showid int not null primary key
, showname varchar(10) not null
, showimage varchar(10)
, x int default 0 not null );
create table ratings
( userid varchar(200) not null
, showid int not null
 references shows(showid)
, rating int not null
, primary key (showid, userid) );

If that's the case I see no particular reason why the insert in ratings is slow, btw what is slow?

Some ideas for trying to pinpoint the problem, make a copy of the ratings table, say ratings2, drop the foreign key and see how that affects the insert performance

Thanks that dropped the run-time to about 1/10 of the run-time of my query while getting the same results
For the second part of the question, what does the insert statement look like?
Is the insert's slow all the time, or do they get slower and slower the more inserts you do?
The inserts are slow the updates are fast. If i'm putting a new user in the added time seems to scale linearly with the number of shows they have watched

Stack Exchange Network

Optimizing Queries HAVING COUNT and INSERT...ON DUPLICATE KEY UPDATE

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Optimizing Queries HAVING COUNT and INSERT...ON DUPLICATE KEY UPDATE

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions