Multiple rows or column array

Question

I am working on a personal project where I need to store and retrieve game statistics for a bunch of players and support very fast lookups by player id. My current design (with unnecessary details omitted) looks something like this:

games: player1's score > player2's score > player3's score > player4's score
+--------------------------+--------------------+--------------------+--------------------+--------------------+
| id BIGSERIAL PRIMARY KEY | player1 VARCHAR(8) | player2 VARCHAR(8) | player3 VARCHAR(8) | player4 VARCHAR(8) |
+--------------------------+--------------------+--------------------+--------------------+--------------------+
| 1                        | playerA            | playerB            | playerC            | playerD            |
+--------------------------+--------------------+--------------------+--------------------+--------------------+
| 2                        | playerE            | playerF            | playerE            | playerA            |
+--------------------------+--------------------+--------------------+--------------------+--------------------+
| 3                        | playerF            | playerB            | playerC            | playerE            |
+--------------------------+--------------------+--------------------+--------------------+--------------------+
player_games:
+-----------------------+-------------------+------------------+
| id SERIAL PRIMARY KEY | player VARCHAR(8) | gameid BIGINT    |
+-----------------------+-------------------+------------------+
| 1                     | asdf              | 1                |
+-----------------------+-------------------+------------------+
| 2                     | asdf              | 2                |
+-----------------------+-------------------+------------------+
| 3                     | fdsa              | 1                |
+-----------------------+-------------------+------------------+
| ...                   | ...               | ...              |
+-----------------------+-------------------+------------------+

and I will do a player lookup along the lines of

SELECT * FROM games WHERE id IN (SELECT gameid FROM player_games WHERE player='<player>')

Since I will be inserting tens of thousands of games per day, I am looking for ways to store data in player_games efficiently. The alternative I am considering is to use an array, so instead I would have something like:

player_games:
+-------------------------------+---------------------+
| player VARCHAR(8) PRIMARY KEY | gameids BIGINT[]    |
+-------------------------------+---------------------+

and I will do a lookup with

SELECT * FROM games WHERE id IN unnest(SELECT gameids FROM player_games WHERE player='<player>')

Which is the better option here, and in the case of the first, is it beneficial to have an index on the player column? I will be batch inserting roughly 4000 rows per hour (90000 rows per day) into player_games after populating the historical data.
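A minimal sketch of the first design, using Python's built-in sqlite3 module as a stand-in for PostgreSQL (so BIGSERIAL/BIGINT become INTEGER and VARCHAR(8) becomes TEXT). It batch-inserts games plus the derived player_games rows inside one transaction, then runs the IN-subquery lookup shown above. The sample rows are hypothetical.

```python
import sqlite3

# In-memory SQLite database as a stand-in for PostgreSQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE games (
    id      INTEGER PRIMARY KEY,
    player1 TEXT, player2 TEXT, player3 TEXT, player4 TEXT
);
CREATE TABLE player_games (
    id     INTEGER PRIMARY KEY,
    player TEXT NOT NULL,
    gameid INTEGER NOT NULL REFERENCES games (id)
);
""")

# Hypothetical sample batch; player4 is None for a 3-player game.
batch = [
    (1, "playerA", "playerB", "playerC", "playerD"),
    (2, "playerE", "playerF", "playerA", None),
]

# `with conn` wraps the whole batch in one transaction and commits once.
with conn:
    conn.executemany("INSERT INTO games VALUES (?, ?, ?, ?, ?)", batch)
    # One player_games row per player who actually took part.
    conn.executemany(
        "INSERT INTO player_games (player, gameid) VALUES (?, ?)",
        [(name, game[0]) for game in batch for name in game[1:] if name is not None],
    )

def games_for(player):
    """The IN-subquery lookup from the question."""
    return conn.execute(
        "SELECT * FROM games "
        "WHERE id IN (SELECT gameid FROM player_games WHERE player = ?) "
        "ORDER BY id",
        (player,),
    ).fetchall()

print(games_for("playerA"))
```

Committing once per batch rather than once per row is the point of the `with conn` block; in PostgreSQL a multi-row INSERT or COPY serves the same purpose.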

Comment

I don't understand the design of the games table at the top. Can you give a sample row corresponding to the rows in player_games? How many players participate in a game?

Comment (asker)

Three or four players participate in each game, and each player has a score column that I omitted. I have also omitted some game metadata.

Answer 1

Given the information in the question, I would start out with something like:

CREATE TABLE players
( player char(8) not null primary key
-- , additional attributes
);
CREATE TABLE games
( game_id int not null primary key
-- , additional attributes
);
CREATE TABLE player_games -- (match?)
( player char(8) not null
 references players (player)
, game_id int not null
 references games (game_id)
, participant_no smallint not null
, constraint ... check (participant_no between 1 and 4)
, primary key (player, game_id)
, unique (game_id, participant_no) );
CREATE TABLE results
( game_id int not null
, player char(8) not null
, score ... not null
, foreign key (game_id, player)
 references player_games (game_id, player) );
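To see what those constraints buy you, here is a minimal runnable sketch of the same four tables in SQLite via Python's sqlite3 module. The elided parts are filled with illustrative assumptions that are not part of the answer: the CHECK constraint is named participant_no_range, score is INTEGER, and the additional attributes are left out. SQLite only enforces foreign keys after PRAGMA foreign_keys = ON.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
conn.executescript("""
CREATE TABLE players
( player CHAR(8) NOT NULL PRIMARY KEY );
CREATE TABLE games
( game_id INTEGER NOT NULL PRIMARY KEY );
CREATE TABLE player_games
( player CHAR(8) NOT NULL REFERENCES players (player)
, game_id INTEGER NOT NULL REFERENCES games (game_id)
, participant_no SMALLINT NOT NULL
, CONSTRAINT participant_no_range CHECK (participant_no BETWEEN 1 AND 4)
, PRIMARY KEY (player, game_id)
, UNIQUE (game_id, participant_no) );
CREATE TABLE results
( game_id INTEGER NOT NULL
, player CHAR(8) NOT NULL
, score INTEGER NOT NULL  -- assumed type; the answer leaves it open
, FOREIGN KEY (game_id, player)
    REFERENCES player_games (game_id, player) );
""")

# Hypothetical sample data: four players, one game with three participants.
with conn:
    conn.executemany("INSERT INTO players VALUES (?)",
                     [("playerA",), ("playerB",), ("playerC",), ("playerD",)])
    conn.execute("INSERT INTO games VALUES (1)")
    conn.executemany("INSERT INTO player_games VALUES (?, ?, ?)",
                     [("playerA", 1, 1), ("playerB", 1, 2), ("playerC", 1, 3)])

def accepted(sql, params):
    """Try one INSERT; return False if a constraint rejects it."""
    try:
        with conn:
            conn.execute(sql, params)
    except sqlite3.IntegrityError:
        return False
    return True

INSERT_PG = "INSERT INTO player_games VALUES (?, ?, ?)"
print(accepted(INSERT_PG, ("playerA", 1, 4)))  # False: playerA is already in game 1
print(accepted(INSERT_PG, ("playerD", 1, 2)))  # False: participant_no 2 of game 1 is taken
print(accepted(INSERT_PG, ("playerD", 1, 5)))  # False: CHECK allows 1..4 only
print(accepted(INSERT_PG, ("nobody", 1, 4)))   # False: not in players
print(accepted(INSERT_PG, ("playerD", 1, 4)))  # True
```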

Examples of queries that can easily be answered:

Which games has a player participated in?

SELECT game_id 
FROM player_games
WHERE player = ?

JOIN with games if you need more info from each game

Which players participated in a game?

SELECT player 
FROM player_games
WHERE game_id = ?

JOIN with players if you need more info from each player

Order the players from game X according to their score:

SELECT player, score
FROM results
WHERE game_id = X
ORDER BY score
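A minimal sketch that runs the three example queries against hypothetical sample data, again using SQLite through Python's sqlite3 as a stand-in for PostgreSQL, with constraints omitted to keep it short. One deliberate deviation: the ranking query uses ORDER BY score DESC so the highest score comes first, matching the question's player1 > player2 > ... ordering; a plain ORDER BY score sorts lowest first.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE player_games (player TEXT, game_id INTEGER, participant_no INTEGER);
CREATE TABLE results (game_id INTEGER, player TEXT, score INTEGER);

-- Hypothetical data: game 1 has four players, game 2 has three.
INSERT INTO player_games VALUES
    ('playerA', 1, 1), ('playerB', 1, 2), ('playerC', 1, 3), ('playerD', 1, 4),
    ('playerE', 2, 1), ('playerF', 2, 2), ('playerA', 2, 3);
INSERT INTO results VALUES
    (1, 'playerA', 40), (1, 'playerB', 30), (1, 'playerC', 20), (1, 'playerD', 10),
    (2, 'playerE', 50), (2, 'playerF', 45), (2, 'playerA', 5);
""")

def games_of(player):
    """Which games has a player participated in?"""
    rows = conn.execute(
        "SELECT game_id FROM player_games WHERE player = ? ORDER BY game_id",
        (player,))
    return [game_id for (game_id,) in rows]

def players_in(game_id):
    """Which players participated in a game? (in participant_no order)"""
    rows = conn.execute(
        "SELECT player FROM player_games WHERE game_id = ? ORDER BY participant_no",
        (game_id,))
    return [player for (player,) in rows]

def ranking(game_id):
    """Players of one game ordered by score, highest first."""
    return conn.execute(
        "SELECT player, score FROM results WHERE game_id = ? ORDER BY score DESC",
        (game_id,)).fetchall()

print(games_of("playerA"))  # [1, 2]
print(players_in(2))        # ['playerE', 'playerF', 'playerA']
print(ranking(2))           # [('playerE', 50), ('playerF', 45), ('playerA', 5)]
```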

Answer 2

SELECT *
FROM games
WHERE id IN (SELECT gameid FROM player_games WHERE player='<player>')

Just rewrite that as a JOIN:

SELECT g.*
FROM games AS g
JOIN player_games AS pg ON pg.gameid = g.id
WHERE pg.player = $1;
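A small check, assuming the question's two-table layout and hypothetical data, that the IN-subquery and the JOIN rewrite return the same rows. It uses SQLite via Python's sqlite3, where the placeholder is ? rather than PostgreSQL's $1; started_at is an illustrative column that is not in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE games (id INTEGER PRIMARY KEY, started_at TEXT);  -- started_at: hypothetical
CREATE TABLE player_games (
    player TEXT NOT NULL,
    gameid INTEGER NOT NULL REFERENCES games (id),
    PRIMARY KEY (player, gameid)
);
INSERT INTO games VALUES (1, '2018-06-01'), (2, '2018-06-02'), (3, '2018-06-03');
INSERT INTO player_games VALUES
    ('playerA', 1), ('playerB', 1), ('playerA', 3), ('playerC', 2);
""")

IN_QUERY = """
SELECT * FROM games
WHERE id IN (SELECT gameid FROM player_games WHERE player = ?)
ORDER BY id
"""

JOIN_QUERY = """
SELECT g.*
FROM games AS g
JOIN player_games AS pg ON pg.gameid = g.id
WHERE pg.player = ?
ORDER BY g.id
"""

in_rows = conn.execute(IN_QUERY, ("playerA",)).fetchall()
join_rows = conn.execute(JOIN_QUERY, ("playerA",)).fetchall()
print(in_rows == join_rows)
```

The two forms agree here because the primary key on (player, gameid) rules out duplicate pairs. Without it, a repeated (player, gameid) row would make the JOIN return that game twice, while IN would still return it once.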

If you haven't implemented any of this yet, a minor note: naming the key column just id is an anti-pattern and should be avoided as a naming convention. JOINing on an array is a bad idea; I would leave things the way they are. On the high side, how many games are people playing? Even aggregating 10,000 games should be very fast.

Which option is the better option here, and in the case of the first, is it beneficial to have an index on the player column?

Yes, you need to index the foreign key and the column you're selecting on.
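A minimal sketch, in SQLite via Python's sqlite3, of adding both indexes and checking with EXPLAIN QUERY PLAN that each lookup uses one (in PostgreSQL, EXPLAIN plays the same role). The index names are made up for the example. Neither SQLite nor PostgreSQL indexes the referencing side of a foreign key automatically, which is why gameid needs its own index too.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE games (id INTEGER PRIMARY KEY);
CREATE TABLE player_games (
    id     INTEGER PRIMARY KEY,
    player TEXT NOT NULL,
    gameid INTEGER NOT NULL REFERENCES games (id)
);
""")

BY_PLAYER = "SELECT gameid FROM player_games WHERE player = ?"
BY_GAME = "SELECT player FROM player_games WHERE gameid = ?"

def plan(sql, param):
    """Return the detail strings of SQLite's EXPLAIN QUERY PLAN, joined."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, (param,)).fetchall()
    return " | ".join(row[-1] for row in rows)

before = plan(BY_PLAYER, "playerA")  # no index on player yet: a full scan

# Only the primary key is indexed automatically; the FK column gameid
# and the filter column player need explicit indexes.
conn.executescript("""
CREATE INDEX player_games_player_idx ON player_games (player);
CREATE INDEX player_games_gameid_idx ON player_games (gameid);
""")

after_player = plan(BY_PLAYER, "playerA")
after_game = plan(BY_GAME, 1)
print(before)
print(after_player)
print(after_game)
```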

Also, do not use VARCHAR(8) for a player id. Use an int for the id, and store the name in something like a player_username text column. In PostgreSQL we rarely use varchar(8), as it offers nothing but a length check, which slows down what should often be an unrestricted column.

Even if you were using arrays, you shouldn't be using them like this:

SELECT *
FROM games
WHERE id IN unnest(
 SELECT gameids
 FROM player_games
 WHERE player='<player>'
);

Instead, write something like this with the containment operator @> (both sides must be arrays, hence ARRAY[g.id]):

SELECT g.*
FROM games AS g
JOIN player_games AS pg ON pg.gameids @> ARRAY[g.id]
WHERE pg.player = '<player>';

Or you can use ANY:

SELECT g.*
FROM games AS g
JOIN player_games AS pg ON g.id = ANY(pg.gameids)
WHERE pg.player = '<player>';

You may also want to look into the intarray extension (note that it only works on integer[], not bigint[]). But again, I would never do this.

Update

I would fix the schema:

CREATE SCHEMA local;
CREATE TABLE local.players (
 playerid int PRIMARY KEY GENERATED BY DEFAULT AS IDENTITY
 -- , ... your profile and stuff, perhaps the login to your own system
);
CREATE SCHEMA thirdparty;
CREATE TABLE thirdparty.username_to_players (
 playerid int REFERENCES local.players,
 thirdparty_username text
);

This would make your query look like:

SELECT lg.*
FROM local.games AS lg
JOIN player_games AS pg ON pg.gameid = lg.id
-- playerid is your internal player id EVERYWHERE
JOIN thirdparty.username_to_players AS tpup USING (playerid)
WHERE tpup.thirdparty_username = $1;
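A runnable sketch of the same idea in SQLite via Python's sqlite3. SQLite has no CREATE SCHEMA, so the local. and thirdparty. prefixes become local_ and thirdparty_ table-name prefixes. The local_games table, its id column, and the player_games layout (gameid, playerid) are assumptions, since the update does not define them, and an INTEGER PRIMARY KEY stands in for GENERATED BY DEFAULT AS IDENTITY.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Internal ids: INTEGER PRIMARY KEY is assigned automatically, like an identity column.
CREATE TABLE local_players (playerid INTEGER PRIMARY KEY);
CREATE TABLE local_games (id INTEGER PRIMARY KEY);            -- assumed layout
CREATE TABLE player_games (                                   -- assumed layout
    gameid   INTEGER NOT NULL REFERENCES local_games (id),
    playerid INTEGER NOT NULL REFERENCES local_players (playerid),
    PRIMARY KEY (gameid, playerid)
);
-- The provider's name lives only here, as unrestricted TEXT.
CREATE TABLE thirdparty_username_to_players (
    playerid            INTEGER REFERENCES local_players (playerid),
    thirdparty_username TEXT
);

-- Hypothetical data.
INSERT INTO local_players DEFAULT VALUES;   -- playerid 1
INSERT INTO local_players DEFAULT VALUES;   -- playerid 2
INSERT INTO thirdparty_username_to_players VALUES (1, 'playerA'), (2, 'playerB');
INSERT INTO local_games VALUES (10), (11);
INSERT INTO player_games VALUES (10, 1), (10, 2), (11, 1);
""")

QUERY = """
SELECT lg.*
FROM local_games AS lg
JOIN player_games AS pg ON pg.gameid = lg.id
-- playerid is the internal player id everywhere
JOIN thirdparty_username_to_players AS tpup USING (playerid)
WHERE tpup.thirdparty_username = ?
ORDER BY lg.id
"""

print(conn.execute(QUERY, ("playerA",)).fetchall())  # [(10,), (11,)]
```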

I would still use text there and not varchar(8). Why not? If they send a 9-character player name, are you going to call up the provider and tell them they're breaking the spec? And if you do, will they care? It's not worth my time, and I don't particularly care if they lie to me about their schema; I just want fast, working, reliable operations. As a consumer of their data, it's not your duty to ensure it meets their claims about it.

Comment (incertia)

Thanks for the speedy response! Player ids in this case are just player names, and the service I am pulling the logs from limits them to a maximum of 8 characters. Should I still be using text in this case?

Comment

@incertia, see the update.

score 2 · Answer 1 · 2018-06-12 20:59:40Z

Evan Carroll · Answer 2 · 2018-06-12 20:29:17Z