I created a table like this,
CREATE TABLE Leaderboard(
userId bigint not null,
matchId bigint not null,
score mediumint not null,
country CHAR(10),
tournamentId int not null
)
I am creating matches of 5 people, all from 5 distinct countries ("UK", "USA", "SPAIN", "GERMANY", "FRANCE)
I want to insert some a million entries to this table with following requirements,
A match will be formed like this
usergermany | userUSA | userSPAIN | userUK | userFrance
so a match will have 5 users, all from distinct countries. A tournament will have many matches, and a user can only play in one match in a tournament.
So an example table will look like this
userid matchid score country tournamentid
...
988654 3877543 random USA 177
388654 3877543 random GERMANY 177
433432 3877543 random FRANCE 177
776212 3877543 random UK 177
1632987 3877543 random SPAIN 177
2113242 3877544 random SPAIN 177
2918974 3877544 random USA 177
111738 3877544 random UK 177
1772342 3877544 random FRANCE 177
1343243 3877544 random GERMANY 177
123131 3877545 random UK 178
1231414 3877545 random FRANCE 178
2858348 3877545 random GERMANY 178
1122432 3877545 random USA 178
2923434 3877545 random SPAIN 178
...
A userId cannot exist in a tournament twice,
A country cannot exists in a match twice (match groups are formed from distinct countries)
Also since 5 players compete in a match, a matchid will only appear 5 times in the table.
I want to insert a million entries into this table, userid being randomly from 1 to 3 million with the above constraints. 10 thousand matches for each tournament, so there will be 100 tournaments starting from 1.
100 tournaments, 10000 matches each, make up to 1 million rows.
--UPDATE--
create table seq_data as
with recursive tmp(x) as (
select 1
union all
select x+1 from tmp
limit 3000000
)
select * from tmp;
CREATE TABLE if not exists Leaderboard(
userId bigint not null,
matchId bigint not null,
score mediumint not null,
country CHAR(10),
tournamentId int not null
);
DELIMITER //
CREATE PROCEDURE InsertLeaderboardData()
BEGIN
DECLARE tournamentloop INT DEFAULT 0;
DECLARE matchloop INT DEFAULT 0;
DECLARE groupLoop INT DEFAULT 0;
DECLARE matchidKey INT DEFAULT 1;
WHILE tournamentloop < 1 DO
SET tournamentloop = tournamentloop + 1;
SET matchloop = 0;
WHILE matchloop < 10000 DO
SET matchidKey = matchidKey + 1;
SET matchloop = matchloop + 1;
SET groupLoop = 0;
WHILE groupLoop < 5 DO
SET groupLoop = groupLoop + 1;
INSERT INTO Leaderboard (userId, matchId, score, country, tournamentId)
VALUES (1, matchidKey, FLOOR(RAND() * 1000) + 1, ELT(groupLoop, "SPAIN", "FRANCE", "UK", "USA", "GERMANY"), tournamentloop);
SELECT matchidKey;
END WHILE;
END WHILE;
END WHILE;
END;
//
DELIMITER ;
CALL InsertLeaderboardData();
DROP PROCEDURE InsertLeaderboardData;
This runs but slowly, yet I want 1 more feature.
As to INSERT INTO
's userId
I set the value to 1 for now to edit it later. What I want to do is, I created sequenced numbers into the table seq_data starting from 1 up to 3 million. For each WHILE tournamentloop < 100 DO
scope, I want to get 50 thousand DISTINCT numbers from the seq_data
table. so I can give the users an id.
The morale is, no userid can appear twice in a tournament. so for each tournament scope i need to retrieve random ids which are distinct and assign it to rows. I already created the table for sequenced numbers, I don't know what is the best approach from here.
-- Statement
INSERT INTO Leaderboard (userId, matchId, score, country, tournamentId) VALUES (useridKey, matchidKey, FLOOR(RAND() * 1000) + 1, ELT(groupLoop, "SPAIN", "FRANCE", "UK", "USA", "GERMANY"), tournamentloop);
I turn this into
DECLARE tournamentloop INT DEFAULT 0;
DECLARE matchloop INT DEFAULT 0;
DECLARE groupLoop INT DEFAULT 0;
DECLARE matchidKey INT DEFAULT 1;
DECLARE useridKey INT DEFAULT 1;
PREPARE insertStmt FROM 'INSERT INTO Leaderboard (userId, matchId, score, country, tournamentId)VALUES (useridKey, matchidKey, FLOOR(RAND() * 1000) + 1, ELT(groupLoop, "SPAIN", "FRANCE", "UK", "USA", "GERMANY"), tournamentloop);';
EXECUTE insertStmt;
I get error: Unknown column 'useridKey' in 'field list'
-
Please study the difference between user-defined variables and local variables. And your task does not need in stored procedure, iterations and so on, needed dataset can be generated with single query which uses recursive CTE and statically generated countries table.Akina– Akina2024年04月04日 08:39:45 +00:00Commented Apr 4, 2024 at 8:39
-
WITH RECURSIVE must be written only once! all CTEs (both recursive and non-) are written after it before outer SELECT separated with a comma. Study Reference manual for correct syntax. Also you need only one CTE which generates a list of natural integers. Then you'd join needed amount of its copies in the outer query and use each copy for its personal purposes adding needed WHERE which restricts the amount of used numbers to needed one.Akina– Akina2024年04月04日 13:37:56 +00:00Commented Apr 4, 2024 at 13:37
-
Where can we find the organizations documented rules for these tournaments and examples of how they think the participants will be managed.Wilson Hauck– Wilson Hauck2024年04月04日 21:35:02 +00:00Commented Apr 4, 2024 at 21:35
-
I don't know. I just have a small workshop I am working on which I explained in the question. Yet I still couldn't achieve. I hope @Akina can help because this is a bit advanced for me.umarkaa– umarkaa2024年04月04日 21:54:22 +00:00Commented Apr 4, 2024 at 21:54
-
@Akina please demonstrate how I can use the join with outer query for this purpose, I tried with while yet it is very very slow. Please please demonstrate, I really want to learn.umarkaa– umarkaa2024年04月05日 02:58:20 +00:00Commented Apr 5, 2024 at 2:58
1 Answer 1
INSERT INTO Leaderboard (matchID, score, country, tournamentID)
WITH RECURSIVE
cte1 (num) AS ( SELECT 1 UNION ALL SELECT num + 1 FROM cte1 LIMIT 100 ),
cte2 (country) AS (SELECT "SPAIN" UNION ALL SELECT "FRANCE" UNION ALL SELECT "UK"
UNION ALL SELECT "USA" UNION ALL SELECT "GERMANY")
SELECT t1.num, -- match ID
RAND() * 100, -- random score in 0..99
t3.country,
t2.num -- tournament ID
FROM cte1 t1 -- match IDs
CROSS JOIN cte1 t2 -- tournament IDs
CROSS JOIN cte2 t3
WHERE t1.num <= 3 -- 3 matches per tournament
AND t2.num < 5 -- 4 tournaments
;
-- totally the query creates 3 matches * 4 tournaments * 5 countries = 60 rows
-
this is so great! indeed there is one last missing column, which is the userid, userid can be any number let's say between 1 to thousand but should be unique for every tournament id. (since a player can't play 2 matches at the same time in a tournament)umarkaa– umarkaa2024年04月05日 04:34:27 +00:00Commented Apr 5, 2024 at 4:34
-
@umarkaa there is one last missing column, which is the userid See provided fiddle.
userid
is defined as autoincrement, so its values are filled automatically..Akina– Akina2024年04月05日 04:38:19 +00:00Commented Apr 5, 2024 at 4:38 -
spasiva. Can you also try this, as I am studying the code... Not making it a PK but a random number between 0-50... but unique to each tournamentid... that is all,umarkaa– umarkaa2024年04月05日 04:45:13 +00:00Commented Apr 5, 2024 at 4:45
-
@umarkaa Using cte1 and cte2, create cte3 which selects, for each country, some amount of randomly selected Ids (for example, join random country to each separate id, this will produce a rowset where there is no a userid value which belongs to more than one coutry). I.e. you'll obtain random amount of pairs (userid, country) for each country (check that those amount of pairs per country is not too small, maybe use dynamic country selection). In outer query you'd select one random pair for current country, with correlated subquery, for example.Akina– Akina2024年04月05日 04:50:24 +00:00Commented Apr 5, 2024 at 4:50
-
1same userid will always belong to 1 country If so then create 100 users (use
cte1
), assign each user to definite country (usenum MOD 5
as country index), insert random user with needed country into each table row.Akina– Akina2024年04月05日 05:00:02 +00:00Commented Apr 5, 2024 at 5:00