I need to design an authentication system, where different type of authentication schemes may exist. At the end, I should assign a passphrase (access token) to the user.
I have several authentication schemes right now, and design should be extensible with new authentication schemes. What I've currently is:
- Insecure (direct)
- One time code over SMS
- Time-based code via external authenticators (i.e. Google Authenticator)
To clarify how does those schemes work, I will give some examples:
Insecure authentication is performed by only user credentials. Those credentials are username and password. All users have those two credentials.
One time code over SMS is another authentication scheme, having taken users credentials, a one time code is generated by the server, and sent to the user's phone via SMS. Afterwards, user supplies that one time code to the server in order to complete the authentication. In practice, time-based code is no different than this authentication scheme.
Here are the DDTs of my current design (please note that they're canonical):
CREATE TABLE users(
id serial PRIMARY KEY, -- surrogate key
username character varying (25) NOT NULL,
inserted_at timestamp without time zone DEFAULT now() NOT NULL
);
-- Those tables are middleware artifacts
CREATE TABLE one_time_codes(
id serial PRIMARY KEY, -- surrogate key
code integer NOT NULL,
inserted_at timestamp without time zone DEFAULT now() NOT NULL,
user_id integer REFERENCES users(id) NOT NULL
);
CREATE TABLE insecure_authentication_handles(
id serial PRIMARY KEY, -- surrogate key
passkey character varying (255) NOT NULL,
inserted_at timestamp without time zone DEFAULT now() NOT NULL,
user_id integer REFERENCES users(id) NOT NULL
);
-- Final product of successful authentication
CREATE TABLE passphrases(
id serial PRIMARY KEY, -- surrogate key
passkey character (5) UNIQUE NOT NULL,
inserted_at timestamp without time zone DEFAULT now() NOT NULL,
user_id integer REFERENCES users(id) NOT NULL
);
As you may understand, I used Martin Fowler's "Concrete Table Inheritance" in order to represent the concrete OneTimeCode
and InsecureAuthenticationHandle
subclasses of AuthenticationScheme
abstract class in the database. Here, each passphrase
could be matched with either one_time_code
or insecure_authentication_handle
depending on users preference of authentication. one_time_codes
and insecure_authentication_handles
are the middleware artifacts of the authentication, hence I created a view to list all of those artifacts.
CREATE VIEW concrete_authentication_middleware_artifacts AS
SELECT user_id, NULL insecure_authentication_handle_passkey, code one_time_code, inserted_at
FROM one_time_codes
UNION
SELECT user_id, passkey insecure_authentication_handle_passkey, NULL one_time_code, inserted_at
FROM insecure_authentication_handles;
Hence, if I want to pair each middleware artifact with successful authentication artifact (passphrase), I would execute the query (assume all artifacts have a valid duration of 3 minutes):
SELECT cama.*, p.passkey passphrase
FROM concrete_authentication_middleware_artifacts cama
INNER JOIN passphrases p ON cama.user_id = p.user_id AND (GREATEST(p.inserted_at, cama.inserted_at) - LEAST(p.inserted_at, cama.inserted_at) < '3 minutes'::interval);
Everything works great, however:
- I could use foreign keys instead of comparing timestamps in order to join
passphrases
table and the view, however that would lead to creating n foreign keys inpassphrases
table, whereas only one may have non-NULL
value. I'm also unaware of creating external predicates in order to ensure database integrity on such conditions. - The design does not favor the latter query with huge amount of data. This may not be a problem since it's executed rarely.
What kind of suggestions do you have? I'm not sure I reached to the most elegant design.
1 Answer 1
First, get the idea of inheritance out of your head and try to think relationally. Keep integrity as a first priority, and ease of use as a second. Saint So-and-so isn't that hot. And, I doubt anyone has heard of the buzz term Concrete table inheritance. PostgreSQL has native inheritance built it and the side effects of it are simply not worth the gain.. Even though PostgreSQL does have a concept of Multiple Inheritance.
You provide three tables...
-- Those tables are middleware artifacts
CREATE TABLE one_time_codes(
id serial PRIMARY KEY, -- surrogate key
code integer NOT NULL,
inserted_at timestamp without time zone DEFAULT now() NOT NULL,
user_id integer REFERENCES users(id) NOT NULL
);
CREATE TABLE insecure_authentication_handles(
id serial PRIMARY KEY, -- surrogate key
passkey character varying (255) NOT NULL,
inserted_at timestamp without time zone DEFAULT now() NOT NULL,
user_id integer REFERENCES users(id) NOT NULL
);
-- Final product of successful authentication
CREATE TABLE passphrases(
id serial PRIMARY KEY, -- surrogate key
passkey character (5) UNIQUE NOT NULL,
inserted_at timestamp without time zone DEFAULT now() NOT NULL,
user_id integer REFERENCES users(id) NOT NULL
);
All of this raises the question of how data relates to other data. In this case, the users
table, that we agree must exist.
- Do you have 1-to-1 relationship?
- Do you have 1-to-many relationship?
- Do you have many-to-many relationship?
That should be the principal factor in determining your data schema and not some external theory applied by an organic buzzword generator. That's how databases work.
Aside from how you related to your data, consider the other downsides.
- Ease of use. You've got a
UNION
now and aVIEW
other complexities you wouldn't otherwise need. - The alternative is faster.
JOIN
s ain't free they cost index scans and merges -- even the ones on ints. PostgreSQL has a large-row size. The fewer rows you have in memory and have to fetch the better. Same forUNION
, withoutUNION ALL
you have to sort and filter distinct. With regularUNION
you may not know how many duplicates you're getting back. - You're not using the tools available to you: PostgreSQL provides
hstore
, (sql-standard)array
,jsonb
types. You can use these for schema-less design on your user. I suggest it if you're not needing to select on that criteria.
So you could just as easily,
ALTER TABLE users ADD COLUMN (auth jsonb);
And, then put one_time_codes
, passkeys
, and passphrases
in that.
As a side note,
- you should consider reviewing
pgcrypto
for authentication needs. - Don't specify character varying unless you truly need a character limit by specification. Otherwise, use
text
. It's faster. And, that's the PostgreSQL convention. They're both implemented in the same way, butvarchar
has a length-check on it.
Testing
200 passcodes, for 2000 users using JSON and PostgreSQL array.
JSON Schema
CREATE TABLE users AS
SELECT t1.userid, md5(t1.userid::text), x AS passcodes
FROM generate_series(1,2000) AS t1(userid)
CROSS JOIN LATERAL (
SELECT t1.userid, jsonb_agg(floor(random()*1000)::int)
FROM generate_series(1,200) AS t(x)
GROUP BY true
) AS t2(userid,x);
CREATE UNIQUE INDEX ON users (userid);
ANALYZE users;
JSON Query
This is 35ms. You should never have to do this.
EXPLAIN ANALYZE SELECT * FROM users
WHERE passcodes @>'5' ;
This is 0.060 ms
EXPLAIN ANALYZE SELECT * FROM users
WHERE passcodes @>'5'
AND userid = 42;
That's likely faster than the index lookup and merge for the other table.
SQL Array Testing
CREATE TABLE users AS
SELECT t1.userid, md5(t1.userid::text), x AS passcodes
FROM generate_series(1,2000) AS t1(userid)
CROSS JOIN LATERAL (
SELECT t1.userid, array_agg(floor(random()*1000)::int)
FROM generate_series(1,200) AS t(x)
GROUP BY true
) AS t2(userid,x);
CREATE UNIQUE INDEX ON users (userid);
ANALYZE users;
SQL Array Query
This is 7ms. You should never have to do this.
EXPLAIN ANALYZE SELECT * FROM users
WHERE passcodes @> ARRAY[5];
This is 0.076 ms
EXPLAIN ANALYZE SELECT * FROM users
WHERE passcodes @> ARRAY[5]
AND userid = 42;
That's likely faster than the index lookup and merge for the other table.
-
Please define how could you constitute integrity with NoSQL solutions found in a relational database. I also believe the relationships are pretty visible.Buğra Ekuklu– Buğra Ekuklu2017年01月18日 06:50:26 +00:00Commented Jan 18, 2017 at 6:50
-
There is no integrity on a voluntary field in either situation. It's a field a user may have regardless of if you implement it in a 1-to-1 or in a jsonb attribute. If by integrity you mean a
CHECK
constraint, JSONB provides for that (though I don't think it's necessary). You can check something does exist or does not exist using the JSONB operators postgresql.org/docs/current/static/functions-json.html If something must exist as an attribute of the user, I would store it on the user directly.Evan Carroll– Evan Carroll2017年01月18日 06:56:12 +00:00Commented Jan 18, 2017 at 6:56 -
So you tell me, if user has 150 one time codes and 200 passphrases, all of them should be contained in a single field of a row?Buğra Ekuklu– Buğra Ekuklu2017年01月18日 06:58:36 +00:00Commented Jan 18, 2017 at 6:58
-
Really 200? You just told me " In practice, no [there is only one]." But, Yes. So long as you're not searching on that stuff for multiple users. A sequential scan on 200 items is very likely as fast as an index scan on 4,000,000. Which is what you'd have if you had 20,000 users. I have no idea what passphrases mean in your app (200 sounds crazy for any definition of passphrases though).Evan Carroll– Evan Carroll2017年01月18日 07:03:46 +00:00Commented Jan 18, 2017 at 7:03
-
Boy, you are acting like a chill. There is nothing to exaggerate. A user may have multiple passphrases. Those passphrases could be generated upon a one time code, or an insecure authentication handle, but not both. What I said to you was a passphrase is generated upon only one of those, but not both. Turns out, my design does not bring anything to the integrity from that aspect, simply, a passphrase could be generated upon two of them (though it is not possible in practice).Buğra Ekuklu– Buğra Ekuklu2017年01月18日 07:08:53 +00:00Commented Jan 18, 2017 at 7:08
Explore related questions
See similar questions with these tags.
bars
in your example at StackExchange have the user_id?