I'm building an app aimed to store lots of records (millions) and trying to find a best way to store records identifiers that is publicly accessible from the web (record URL).
I'd like to get:
- Random, probably unique, unpredictable IDs that is reasonably short and possible to use in an URL, meaning UUID will not suit as it is too long.
- Best performance from database and reduce storage space.
- Avoid creating auto increment id with the only purpose to use it as a PRIMARY key.
- ...Avoid future replication issues?
My first thought was to generate a random string like 4r5psPxuRw
and use it as a primary key (CHAR(10)). But in this case index will be 30 bytes per record and I assume that with the large amount of records INSERT performance will suffer.
Now I came to a random numeric string, for example 16 digits from 1 to 9, and store it as BIGINT(16).
Is this a good way to achieve security, speed and reduce number of surrogate keys? Am I missing something?
2 Answers 2
Let me summarize your problem and answer it as I understand it.
You want prevent external party going through every page of you site and saving information about your users.
- This is exactly what search engines do. Therefore your users should have an option to set their info as public/friends only/private.
- To control what a user (whoever it is) does you can use a concept of
Session
. It allows you to monitor overall behavior and see automated users. - When you have sessions, a logged in user cannot access someone else's info even if has right ID/URL.
What you need to do:
- Define "user" roles for your site, including automated users like search engines, APIs, advertising agents, etc.
- describe for yourself appropriate use patterns, what kind of info user need, how often, methods as authentication, authorization, access to various content.
- Find appropriate solutions to implement the above.
-
1My question was about random stings or integers as primary keys and mysql performance related to that. Thanks.Constantine– Constantine2015年12月13日 11:31:38 +00:00Commented Dec 13, 2015 at 11:31
You need to pass an ID from one web page to the 'next'? But HTTP is stateless? And you don't want to expose that ID in a URL?
Then use a cookie for passing it. That's what all the heavy hitters do. Think about how shopping sites work.
The ID (in the cookie) is probably a lookup key into a database table to find all the other info. And some of that info is used to validate that the ID is for the "current session", not yesterday's session. That is, it is more than "user_id".
UUID as a key (PRIMARY or otherwise) suck for performance. Avoid them if at all possible. That goes for your 'short' uuid.
"records identifiers that is publicly accessible" -- What do you mean? A URL is visible, period.
-
1My question was about random stings or integers as primary keys and mysql performance related to that.Constantine– Constantine2015年12月13日 11:31:34 +00:00Commented Dec 13, 2015 at 11:31
-
Random strings suck as keys. And they provide little, if any, security. Also, think about what will happen if the user bookmarks (for use next month) or shares his URL.Rick James– Rick James2015年12月13日 19:27:13 +00:00Commented Dec 13, 2015 at 19:27
-
@Kanstantsin is talking about using random strings as persistent identifiers for webpages. If you bookmark a page that is indexed on a random string, that random string will always retrieve the same page.ndm13– ndm132016年10月05日 16:47:41 +00:00Commented Oct 5, 2016 at 16:47
Explore related questions
See similar questions with these tags.
unpredictable
? Is it (1) hard to guess next record for given user OR (2) hard to index all keys in your system by brute force enumeration?CHAR(10)
can be just ASCII, no need for utf, so 10 bytes actually.. Whats your reason for not using auto-increment or other surrogate key? Using it internally and having uniquehandle
with the random identifier will give you good primary key with appending and good unique access for single row lookups (or do you expect to fetch hundreds of rows per user request by the public handle?)SELECT * FROM t WHERE comment_id IN (1,2,3)
useSELECT * FROM t WHERE comment_uid IN (1425889898813533, 9892597754837953, 3721461664138416)
. Why do I need auto incrementingcomment_id
then?