Designing an extensible authentication system

Question 1

I need to design an authentication system, where different type of authentication schemes may exist. At the end, I should assign a passphrase (access token) to the user.

I have several authentication schemes right now, and design should be extensible with new authentication schemes. What I've currently is:

Insecure (direct)
One time code over SMS
Time-based code via external authenticators (i.e. Google Authenticator)

To clarify how does those schemes work, I will give some examples:

Insecure authentication is performed by only user credentials. Those credentials are username and password. All users have those two credentials.

One time code over SMS is another authentication scheme, having taken users credentials, a one time code is generated by the server, and sent to the user's phone via SMS. Afterwards, user supplies that one time code to the server in order to complete the authentication. In practice, time-based code is no different than this authentication scheme.

Here are the DDTs of my current design (please note that they're canonical):

CREATE TABLE users(
 id serial PRIMARY KEY, -- surrogate key
 username character varying (25) NOT NULL,
 inserted_at timestamp without time zone DEFAULT now() NOT NULL
);
-- Those tables are middleware artifacts
CREATE TABLE one_time_codes(
 id serial PRIMARY KEY, -- surrogate key
 code integer NOT NULL,
 inserted_at timestamp without time zone DEFAULT now() NOT NULL,
 user_id integer REFERENCES users(id) NOT NULL
);
CREATE TABLE insecure_authentication_handles(
 id serial PRIMARY KEY, -- surrogate key
 passkey character varying (255) NOT NULL,
 inserted_at timestamp without time zone DEFAULT now() NOT NULL,
 user_id integer REFERENCES users(id) NOT NULL
);
-- Final product of successful authentication
CREATE TABLE passphrases(
 id serial PRIMARY KEY, -- surrogate key
 passkey character (5) UNIQUE NOT NULL,
 inserted_at timestamp without time zone DEFAULT now() NOT NULL,
 user_id integer REFERENCES users(id) NOT NULL
);

As you may understand, I used Martin Fowler's "Concrete Table Inheritance" in order to represent the concrete OneTimeCode and InsecureAuthenticationHandle subclasses of AuthenticationScheme abstract class in the database. Here, each passphrase could be matched with either one_time_code or insecure_authentication_handle depending on users preference of authentication. one_time_codes and insecure_authentication_handles are the middleware artifacts of the authentication, hence I created a view to list all of those artifacts.

CREATE VIEW concrete_authentication_middleware_artifacts AS
 SELECT user_id, NULL insecure_authentication_handle_passkey, code one_time_code, inserted_at
 FROM one_time_codes
 UNION
 SELECT user_id, passkey insecure_authentication_handle_passkey, NULL one_time_code, inserted_at
 FROM insecure_authentication_handles;

Hence, if I want to pair each middleware artifact with successful authentication artifact (passphrase), I would execute the query (assume all artifacts have a valid duration of 3 minutes):

SELECT cama.*, p.passkey passphrase
FROM concrete_authentication_middleware_artifacts cama
INNER JOIN passphrases p ON cama.user_id = p.user_id AND (GREATEST(p.inserted_at, cama.inserted_at) - LEAST(p.inserted_at, cama.inserted_at) < '3 minutes'::interval);

Everything works great, however:

I could use foreign keys instead of comparing timestamps in order to join passphrases table and the view, however that would lead to creating n foreign keys in passphrases table, whereas only one may have non-NULL value. I'm also unaware of creating external predicates in order to ensure database integrity on such conditions.
The design does not favor the latter query with huge amount of data. This may not be a problem since it's executed rarely.

What kind of suggestions do you have? I'm not sure I reached to the most elegant design.

Question 2

Can a user meaningfully have two of any of those at the same time?

Question 3

I'm confused if your view is created as such, then why doesn't bars in your example at StackExchange have the user_id?

Question 4

@EvanCarroll In practice, no. However, my design does not come up with an external predicate for that, so theoretically, its doable.

Question 5

@EvanCarroll It was more simplified version of that. Good catch :)

Question 6

That's a major mistake in the description of the problem's domain @Leviathlon. This is still not a good schema. But, when I commented that is was god-awful horrible (or the like) that was the prime reason. At least you can live with this.

Question 7

First, get the idea of inheritance out of your head and try to think relationally. Keep integrity as a first priority, and ease of use as a second. Saint So-and-so isn't that hot. And, I doubt anyone has heard of the buzz term Concrete table inheritance. PostgreSQL has native inheritance built it and the side effects of it are simply not worth the gain.. Even though PostgreSQL does have a concept of Multiple Inheritance.

You provide three tables...

-- Those tables are middleware artifacts
CREATE TABLE one_time_codes(
 id serial PRIMARY KEY, -- surrogate key
 code integer NOT NULL,
 inserted_at timestamp without time zone DEFAULT now() NOT NULL,
 user_id integer REFERENCES users(id) NOT NULL
);
CREATE TABLE insecure_authentication_handles(
 id serial PRIMARY KEY, -- surrogate key
 passkey character varying (255) NOT NULL,
 inserted_at timestamp without time zone DEFAULT now() NOT NULL,
 user_id integer REFERENCES users(id) NOT NULL
);
-- Final product of successful authentication
CREATE TABLE passphrases(
 id serial PRIMARY KEY, -- surrogate key
 passkey character (5) UNIQUE NOT NULL,
 inserted_at timestamp without time zone DEFAULT now() NOT NULL,
 user_id integer REFERENCES users(id) NOT NULL
);

All of this raises the question of how data relates to other data. In this case, the users table, that we agree must exist.

Do you have 1-to-1 relationship?
Do you have 1-to-many relationship?
Do you have many-to-many relationship?

That should be the principal factor in determining your data schema and not some external theory applied by an organic buzzword generator. That's how databases work.

Aside from how you related to your data, consider the other downsides.

Ease of use. You've got a UNION now and a VIEW other complexities you wouldn't otherwise need.
The alternative is faster. JOINs ain't free they cost index scans and merges -- even the ones on ints. PostgreSQL has a large-row size. The fewer rows you have in memory and have to fetch the better. Same for UNION, without UNION ALL you have to sort and filter distinct. With regular UNION you may not know how many duplicates you're getting back.
You're not using the tools available to you: PostgreSQL provides hstore, (sql-standard) array, jsonb types. You can use these for schema-less design on your user. I suggest it if you're not needing to select on that criteria.

So you could just as easily,

ALTER TABLE users ADD COLUMN (auth jsonb);

And, then put one_time_codes, passkeys, and passphrases in that.

As a side note,

you should consider reviewing pgcrypto for authentication needs.
Don't specify character varying unless you truly need a character limit by specification. Otherwise, use text. It's faster. And, that's the PostgreSQL convention. They're both implemented in the same way, but varchar has a length-check on it.

Testing

200 passcodes, for 2000 users using JSON and PostgreSQL array.

JSON Schema

CREATE TABLE users AS
SELECT t1.userid, md5(t1.userid::text), x AS passcodes
FROM generate_series(1,2000) AS t1(userid)
CROSS JOIN LATERAL (
 SELECT t1.userid, jsonb_agg(floor(random()*1000)::int)
 FROM generate_series(1,200) AS t(x)
 GROUP BY true
) AS t2(userid,x);
CREATE UNIQUE INDEX ON users (userid);
ANALYZE users;

JSON Query

This is 35ms. You should never have to do this.

EXPLAIN ANALYZE SELECT * FROM users
WHERE passcodes @>'5' ;

This is 0.060 ms

EXPLAIN ANALYZE SELECT * FROM users
WHERE passcodes @>'5'
AND userid = 42;

That's likely faster than the index lookup and merge for the other table.

SQL Array Testing

CREATE TABLE users AS
SELECT t1.userid, md5(t1.userid::text), x AS passcodes
FROM generate_series(1,2000) AS t1(userid)
CROSS JOIN LATERAL (
 SELECT t1.userid, array_agg(floor(random()*1000)::int)
 FROM generate_series(1,200) AS t(x)
 GROUP BY true
) AS t2(userid,x);
CREATE UNIQUE INDEX ON users (userid);
ANALYZE users;

SQL Array Query

This is 7ms. You should never have to do this.

EXPLAIN ANALYZE SELECT * FROM users
WHERE passcodes @> ARRAY[5];

This is 0.076 ms

EXPLAIN ANALYZE SELECT * FROM users
WHERE passcodes @> ARRAY[5]
AND userid = 42;

That's likely faster than the index lookup and merge for the other table.

Question 8

Please define how could you constitute integrity with NoSQL solutions found in a relational database. I also believe the relationships are pretty visible.

Question 9

There is no integrity on a voluntary field in either situation. It's a field a user may have regardless of if you implement it in a 1-to-1 or in a jsonb attribute. If by integrity you mean a CHECK constraint, JSONB provides for that (though I don't think it's necessary). You can check something does exist or does not exist using the JSONB operators postgresql.org/docs/current/static/functions-json.html If something must exist as an attribute of the user, I would store it on the user directly.

Question 10

So you tell me, if user has 150 one time codes and 200 passphrases, all of them should be contained in a single field of a row?

Question 11

Really 200? You just told me " In practice, no [there is only one]." But, Yes. So long as you're not searching on that stuff for multiple users. A sequential scan on 200 items is very likely as fast as an index scan on 4,000,000. Which is what you'd have if you had 20,000 users. I have no idea what passphrases mean in your app (200 sounds crazy for any definition of passphrases though).

Question 12

Boy, you are acting like a chill. There is nothing to exaggerate. A user may have multiple passphrases. Those passphrases could be generated upon a one time code, or an insecure authentication handle, but not both. What I said to you was a passphrase is generated upon only one of those, but not both. Turns out, my design does not bring anything to the integrity from that aspect, simply, a passphrase could be generated upon two of them (though it is not possible in practice).

Evan Carroll Evan Carroll 65.7k50 gold badges259 silver badges510 bronze badges · Answer 1 · 2017-01-18 06:38:48Z

First, get the idea of inheritance out of your head and try to think relationally. Keep integrity as a first priority, and ease of use as a second. Saint So-and-so isn't that hot. And, I doubt anyone has heard of the buzz term Concrete table inheritance. PostgreSQL has native inheritance built it and the side effects of it are simply not worth the gain.. Even though PostgreSQL does have a concept of Multiple Inheritance.

You provide three tables...

-- Those tables are middleware artifacts
CREATE TABLE one_time_codes(
 id serial PRIMARY KEY, -- surrogate key
 code integer NOT NULL,
 inserted_at timestamp without time zone DEFAULT now() NOT NULL,
 user_id integer REFERENCES users(id) NOT NULL
);
CREATE TABLE insecure_authentication_handles(
 id serial PRIMARY KEY, -- surrogate key
 passkey character varying (255) NOT NULL,
 inserted_at timestamp without time zone DEFAULT now() NOT NULL,
 user_id integer REFERENCES users(id) NOT NULL
);
-- Final product of successful authentication
CREATE TABLE passphrases(
 id serial PRIMARY KEY, -- surrogate key
 passkey character (5) UNIQUE NOT NULL,
 inserted_at timestamp without time zone DEFAULT now() NOT NULL,
 user_id integer REFERENCES users(id) NOT NULL
);

All of this raises the question of how data relates to other data. In this case, the users table, that we agree must exist.

Do you have 1-to-1 relationship?
Do you have 1-to-many relationship?
Do you have many-to-many relationship?

That should be the principal factor in determining your data schema and not some external theory applied by an organic buzzword generator. That's how databases work.

Aside from how you related to your data, consider the other downsides.

Ease of use. You've got a UNION now and a VIEW other complexities you wouldn't otherwise need.
The alternative is faster. JOINs ain't free they cost index scans and merges -- even the ones on ints. PostgreSQL has a large-row size. The fewer rows you have in memory and have to fetch the better. Same for UNION, without UNION ALL you have to sort and filter distinct. With regular UNION you may not know how many duplicates you're getting back.
You're not using the tools available to you: PostgreSQL provides hstore, (sql-standard) array, jsonb types. You can use these for schema-less design on your user. I suggest it if you're not needing to select on that criteria.

So you could just as easily,

ALTER TABLE users ADD COLUMN (auth jsonb);

And, then put one_time_codes, passkeys, and passphrases in that.

As a side note,

you should consider reviewing pgcrypto for authentication needs.
Don't specify character varying unless you truly need a character limit by specification. Otherwise, use text. It's faster. And, that's the PostgreSQL convention. They're both implemented in the same way, but varchar has a length-check on it.

Testing

200 passcodes, for 2000 users using JSON and PostgreSQL array.

JSON Schema

CREATE TABLE users AS
SELECT t1.userid, md5(t1.userid::text), x AS passcodes
FROM generate_series(1,2000) AS t1(userid)
CROSS JOIN LATERAL (
 SELECT t1.userid, jsonb_agg(floor(random()*1000)::int)
 FROM generate_series(1,200) AS t(x)
 GROUP BY true
) AS t2(userid,x);
CREATE UNIQUE INDEX ON users (userid);
ANALYZE users;

JSON Query

This is 35ms. You should never have to do this.

EXPLAIN ANALYZE SELECT * FROM users
WHERE passcodes @>'5' ;

This is 0.060 ms

EXPLAIN ANALYZE SELECT * FROM users
WHERE passcodes @>'5'
AND userid = 42;

That's likely faster than the index lookup and merge for the other table.

SQL Array Testing

CREATE TABLE users AS
SELECT t1.userid, md5(t1.userid::text), x AS passcodes
FROM generate_series(1,2000) AS t1(userid)
CROSS JOIN LATERAL (
 SELECT t1.userid, array_agg(floor(random()*1000)::int)
 FROM generate_series(1,200) AS t(x)
 GROUP BY true
) AS t2(userid,x);
CREATE UNIQUE INDEX ON users (userid);
ANALYZE users;

SQL Array Query

This is 7ms. You should never have to do this.

EXPLAIN ANALYZE SELECT * FROM users
WHERE passcodes @> ARRAY[5];

This is 0.076 ms

EXPLAIN ANALYZE SELECT * FROM users
WHERE passcodes @> ARRAY[5]
AND userid = 42;

That's likely faster than the index lookup and merge for the other table.

Please define how could you constitute integrity with NoSQL solutions found in a relational database. I also believe the relationships are pretty visible.
There is no integrity on a voluntary field in either situation. It's a field a user may have regardless of if you implement it in a 1-to-1 or in a jsonb attribute. If by integrity you mean a CHECK constraint, JSONB provides for that (though I don't think it's necessary). You can check something does exist or does not exist using the JSONB operators postgresql.org/docs/current/static/functions-json.html If something must exist as an attribute of the user, I would store it on the user directly.
So you tell me, if user has 150 one time codes and 200 passphrases, all of them should be contained in a single field of a row?
Really 200? You just told me " In practice, no [there is only one]." But, Yes. So long as you're not searching on that stuff for multiple users. A sequential scan on 200 items is very likely as fast as an index scan on 4,000,000. Which is what you'd have if you had 20,000 users. I have no idea what passphrases mean in your app (200 sounds crazy for any definition of passphrases though).
Boy, you are acting like a chill. There is nothing to exaggerate. A user may have multiple passphrases. Those passphrases could be generated upon a one time code, or an insecure authentication handle, but not both. What I said to you was a passphrase is generated upon only one of those, but not both. Turns out, my design does not bring anything to the integrity from that aspect, simply, a passphrase could be generated upon two of them (though it is not possible in practice).

Stack Exchange Network

Designing an extensible authentication system

1 Answer 1

Testing

JSON Schema

JSON Query

SQL Array Testing

SQL Array Query

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Hot Network Questions

Designing an extensible authentication system

1 Answer 1

Testing

JSON Schema

JSON Query

SQL Array Testing

SQL Array Query

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Related

Hot Network Questions