I am creating a table which will contain user-provided URLs. I want those to be unique, so when the user gives me a URL I will first check whether it already exists and, if so, return the ID of that entry. If not, I will create a new row with this URL.
Obviously I want this to be fast. What is the best option?
- Make the actual URL a VARCHAR that is UNIQUE and look it up by this URL?
- Make a hash of the URL and use it as a primary key of sorts?
- Other ideas?
- Hope you don't mind. I removed the PS section. We'll let you know if it's not a good fit by closing and/or downvoting! – Derek Downey, Jan 19, 2012 at 17:19
2 Answers
I would definitely go with a hash of the URL and make the hash a unique index. A hash has a fixed length, so you can use CHAR to specify the length of the column, which grants a slight performance boost over VARCHAR or TEXT.
But might I suggest using INSERT IGNORE instead of making two calls to the database? Something like:
INSERT IGNORE INTO urlTable VALUES ('urlHash');
This has the benefit of ignoring any duplicate errors that might arise from attempting to insert a duplicate hash, without first having to do a SELECT COUNT(*) query.
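As a rough sketch of the table this answer implies: the hash is assumed here to be an MD5 hex digest (32 characters), and the column names id and url, the index name, and the column sizes are illustrative assumptions. Only urlTable and urlHash appear in the answer itself.

```sql
-- Hypothetical schema: column names, index name, and sizes are assumptions.
CREATE TABLE urlTable (
    id      INT UNSIGNED NOT NULL AUTO_INCREMENT,
    url     TEXT NOT NULL,         -- the full user-provided URL
    urlHash CHAR(32) NOT NULL,     -- hex MD5 digest of url, fixed length
    PRIMARY KEY (id),
    UNIQUE KEY urlHash_unique (urlHash)
);

-- A row whose hash already exists is skipped with a warning instead of an error.
INSERT IGNORE INTO urlTable (url, urlHash)
VALUES ('http://example.com/', MD5('http://example.com/'));
```

Two different URLs can in principle produce the same hash, so comparing the stored url as well when looking a row up is a cheap safeguard.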
- Your approach is more concise. +1 !!! – RolandoMySQLDBA, Jan 19, 2012 at 17:09
- I need the ID of the row. Can I get it when doing INSERT IGNORE? – nute, Jan 22, 2012 at 9:20
- Actually, do I still need a separate primary ID? Or should the hash be my primary key? Should I hash in MySQL, or in PHP? – nute, Jan 22, 2012 at 9:24
- I just tested that SELECT LAST_INSERT_ID() will not return the ID of the row that was ignored (in the case of a duplicate). So you will either need to do a SELECT id FROM url WHERE urlHash='X', or drop the ambiguous primary key. It depends on your use case. If this table actually has columns other than the URL that you're indexing on, I'd recommend the first option and keeping that auto-incrementing ID. – Derek Downey, Jan 23, 2012 at 15:47
- Is there a way to make MySQL fail silently if you try to write a duplicate, or do you always have to query the table to check for the hash? Otherwise, I'm getting SQLSTATE[23000]: Integrity constraint violation: 1062 Duplicate entry '211c2f38f92d7ad4380031dc533d376a' for key 'guid_hash' – codecowboy, Mar 8, 2012 at 10:45
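One possible way to avoid both the duplicate-key error and the extra lookup is to use INSERT ... ON DUPLICATE KEY UPDATE and pass the existing ID through LAST_INSERT_ID(expr). This is only a sketch: it assumes an AUTO_INCREMENT column id and a unique urlHash column. The table name url comes from the query in the comment above. The url column and the MD5 hash are assumptions.

```sql
-- Inserts a new row, or on a duplicate urlHash leaves the row unchanged
-- while recording its existing id as the session's last insert ID.
INSERT INTO url (url, urlHash)
VALUES ('http://example.com/', MD5('http://example.com/'))
ON DUPLICATE KEY UPDATE id = LAST_INSERT_ID(id);

-- Returns the id of the new row or of the already existing one.
SELECT LAST_INSERT_ID();
```

If the ID is not needed, plain INSERT IGNORE already turns the 1062 duplicate-entry error into a warning.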
Unless I'm missing something, you should just create a UNIQUE index of the type HASH. I don't see what adding your own hash and triggers would gain you. Also make the field itself NOT NULL.
CREATE TABLE `test`.`bla` (
`id` INT NOT NULL AUTO_INCREMENT,
`text` VARCHAR(45) NOT NULL,
PRIMARY KEY (`id`),
UNIQUE INDEX `text_UNIQUE` USING HASH (`text`)
);
- Interesting idea... though according to the docs, HASH isn't available for InnoDB: dev.mysql.com/doc/refman/5.5/en/create-index.html. Oddly, it doesn't throw a warning when creating it like that. But the docs indicate that it will silently use BTREE (for InnoDB), even though the definition says HASH. – Derek Downey, Jan 20, 2012 at 14:56
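To see which index type was actually created, one option is to inspect the table from the answer directly. The statement below assumes the test.bla table defined above exists:

```sql
-- Check the Index_type column for the text_UNIQUE row in the result;
-- it reports the index type the storage engine is really using.
SHOW INDEX FROM `test`.`bla`;
```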
- Ah, good point! Sounds like something that should be mentioned in whatever answer is approved, or maybe in the question itself. In any case, this question sheds some more light on this: dba.stackexchange.com/questions/2817/… – Jannes, Jan 20, 2012 at 18:29