There are two main queries I'm tying to improve, one question is much more conceptual and learning about databases and another is much more straightforward.
Both cases deal with the same table
It has 2 relevant columns as far as the question is concerned, they make up the primary key of the table. Each of the elements of the primary key is hypothetically the key to another table n their own right. The first row is an int (this is an item) an the second row is a VARCHAR
(this is a user) limited to 200 characters.
The first issue comes in trying to find an intersection of users. I am looking to find an intersection of users that share a number of items, 5 in the example
SELECT userid
FROM (SELECT b.userid,
b.showid
FROM ratings AS A
JOIN ratings AS B
ON A.userid LIKE :user
AND B.userid NOT LIKE :user
AND A.showid LIKE B.showid) AS C
GROUP BY userid
HAVING Count(*) > 4
On a fast run this takes up about 2/3 of the runtime of my program (appearing twice, one nested in another query taking little longer than this one). I've tried looking at it but I don't see any obvious way to restructure the query to save time, but queries operating over many more rows are coming up with runtimes of less than 1 second while this takes 2 seconds.
The second issue is more conceptual. As mentioned before I have a composite primary key. I've found elsewhere that with InnoDB
tables this can make inserts take up a lot more time. This is exactly whats happening with an INSERT ... UPDATE ON DUPLICATE KEY
query that I'm running. What is the best way to go about reducing this slowdown.
This is what the insert looks like, it only runs if the show has been rated
INSERT INTO shows
(
showid,
showname,
showimage
)
VALUES
(
:id,
:name,
:image
)
on duplicate KEY
UPDATE showimage=:image,
showname= :name;
INSERT INTO ratings
(
userid,
showid,
rating
)
VALUES
(
:user,
:show,
:rating
)
on duplicate KEY
UPDATE rating=:rating
Shows
CREATE TABLE "shows" (
"ShowId" int(11) NOT NULL,
"ShowName" varchar(400) NOT NULL,
"ShowImage" varchar(400) NOT NULL,
"ShowAgreement" int(11) NOT NULL DEFAULT "0",
PRIMARY KEY ("ShowId")
) ENGINE=InnoDB DEFAULT CHARSET=latin1
Ratings
CREATE TABLE "ratings" (
"UserId" varchar(200) NOT NULL,
"ShowId" int(11) NOT NULL,
"Rating" int(11) NOT NULL,
PRIMARY KEY ("UserId","ShowId"),
KEY "ShowId" ("ShowId"),
CONSTRAINT "ratings_ibfk_1" FOREIGN KEY ("ShowId") REFERENCES "shows" ("ShowId") ON DELETE CASCADE ON UPDATE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=latin1
For a single test case the INSERT takes an additional 8 seconds over the UPDATE for 160 entries.
- Should I simply run the conditional part of the query codeside and removing the primary key.
- Should I drop the primary key and replace it with a unique constraint?
- Should I create a new table for the UserId that creates a
VARCHAR
->INT
identity for userId's and give all existing tables theINT
version of the id rather than theCHAR
? - Should I do something else entirely that hadn't occurred to me?
1 Answer 1
I'm guessing a bit on what your query is supposed to do. I assume you are looking for all users that share 5 or more items with a certain user:
SELECT a.userid
FROM ratings AS a
WHERE EXISTS (
SELECT 1
FROM ratings AS b
WHERE b.userid = :user
AND a.userid <> b.userid
AND a.showid = b.showid
)
GROUP BY a.userid
HAVING COUNT(1) > 4
Is that about right?
For the second part of the question, as I understand it the tables are defined as:
create table shows
( showid int not null primary key
, showname varchar(10) not null
, showimage varchar(10)
, x int default 0 not null );
create table ratings
( userid varchar(200) not null
, showid int not null
references shows(showid)
, rating int not null
, primary key (showid, userid) );
If that's the case I see no particular reason why the insert in ratings is slow, btw what is slow?
Some ideas for trying to pinpoint the problem, make a copy of the ratings table, say ratings2, drop the foreign key and see how that affects the insert performance
-
Thanks that dropped the run-time to about 1/10 of the run-time of my query while getting the same resultspafk– pafk2015年07月07日 22:39:30 +00:00Commented Jul 7, 2015 at 22:39
-
For the second part of the question, what does the insert statement look like?Lennart - Slava Ukraini– Lennart - Slava Ukraini2015年07月07日 23:34:16 +00:00Commented Jul 7, 2015 at 23:34
-
Updated the question with the querypafk– pafk2015年07月07日 23:48:37 +00:00Commented Jul 7, 2015 at 23:48
-
Is the insert's slow all the time, or do they get slower and slower the more inserts you do?Lennart - Slava Ukraini– Lennart - Slava Ukraini2015年07月07日 23:58:10 +00:00Commented Jul 7, 2015 at 23:58
-
The inserts are slow the updates are fast. If i'm putting a new user in the added time seems to scale linearly with the number of shows they have watchedpafk– pafk2015年07月08日 00:02:57 +00:00Commented Jul 8, 2015 at 0:02
Explore related questions
See similar questions with these tags.
A.userid LIKE :user
matches one of those users, that's 188 rows.B.userid NOT LIKE :user
then matches the remaining users, so the cardinality of the join would be somewhere in the ballpark of a miljon rows. Are you sure nothing is missing in the join predicate?