A relational database and analytical query suite over Premier League data, analyzing Arsenal FC's 2022–23 season and evaluating 2023–24 transfer targets - with hands-on index design and query-performance optimization measured via EXPLAIN ANALYZE.
Stack: PostgreSQL · SQL (DDL/DML) · Python (data cleaning) · ER modeling
Domain: Databases · SQL · Query optimization · Sports analytics
Evaluate Arsenal FC's on-field performance and assess incoming player transfers using a properly normalized relational database - and demonstrate how indexing changes query performance at scale.
Kaggle "Player Scores" dataset - 9 tables: games, game events, appearances, clubs, club games, players, player valuations, competitions, and more.
- Schema design: ER diagrams and a normalized 9-table schema (DDL_statement.sql).
- Data loading: a Python cleaning script plus DML_statement.sql.
- Analytical queries: goal rates, player contributions, match outcomes, and transfer evaluation.
- Indexing & optimization: single-column and composite B-tree indexes (e.g. appearances(player_id), appearances(game_id, goals), appearances(date), club_games(club_id)).
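The effect of the appearances indexes on a per-player goal query can be sketched with SQLite's `EXPLAIN QUERY PLAN`. Before the index exists, the planner scans the whole table. Afterwards, it searches the index on `player_id`.

This is an illustration under several assumptions:
- SQLite replaces PostgreSQL, and `EXPLAIN QUERY PLAN` only shows the chosen plan, without PostgreSQL's cost and timing figures.
- The rows are synthetic.
- The index names and the helpers `plan` and `compare_plans` are hypothetical.

```python
import sqlite3


def plan(conn: sqlite3.Connection, sql: str, params=()) -> list:
    """Return the 'detail' column (4th field) of SQLite's EXPLAIN QUERY PLAN rows."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[3] for row in rows]


def compare_plans():
    """Plan a per-player goal total before and after indexing (synthetic data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE appearances (
               appearance_id INTEGER PRIMARY KEY,
               game_id   INTEGER NOT NULL,
               player_id INTEGER NOT NULL,
               date      TEXT NOT NULL,
               goals     INTEGER NOT NULL DEFAULT 0)"""
    )
    # Synthetic rows (200 games x 30 players) so the planner has data to reason about.
    conn.executemany(
        "INSERT INTO appearances (game_id, player_id, date, goals) VALUES (?, ?, ?, ?)",
        [(g, p, "2022-08-05", (g + p) % 3) for g in range(200) for p in range(30)],
    )
    conn.commit()

    query = "SELECT SUM(goals) FROM appearances WHERE player_id = ?"
    before = plan(conn, query, (7,))

    # Hypothetical index names mirroring the indexes listed above.
    conn.execute("CREATE INDEX idx_appearances_player ON appearances(player_id)")
    conn.execute("CREATE INDEX idx_appearances_game_goals ON appearances(game_id, goals)")
    conn.execute("CREATE INDEX idx_appearances_date ON appearances(date)")
    conn.execute("ANALYZE")  # collect planner statistics, analogous to PostgreSQL's ANALYZE
    after = plan(conn, query, (7,))
    return before, after


if __name__ == "__main__":
    before, after = compare_plans()
    print("before:", before)
    print("after: ", after)
```

A composite index such as `(game_id, goals)` helps mainly when a query filters on its leading column, `game_id`. It does not help the player filter shown here, which is why a separate `player_id` index is created.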
Query 1 was benchmarked in two states to quantify the impact of indexing:
- Before any statistics collection or indexing
- After collecting statistics and creating the indexes
Execution plans from EXPLAIN ANALYZE were compared across the two states to show the reduction in estimated cost and actual execution time from index usage.
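Comparing the two runs comes down to reading the summary lines that PostgreSQL prints at the end of `EXPLAIN ANALYZE` text output, such as `Execution Time: 12.345 ms`. The following is a minimal sketch of that comparison. It assumes plain-text output, not `FORMAT JSON`, and the helper names `plan_times` and `speedup` are hypothetical, not part of the project's scripts.

```python
import re

# Summary lines at the end of EXPLAIN ANALYZE text output, e.g. "Execution Time: 12.345 ms".
# Older PostgreSQL releases print "Execution time", so the match ignores case.
_TIME_RE = re.compile(
    r"^\s*(Planning|Execution) Time:\s*([0-9.]+)\s*ms",
    re.IGNORECASE | re.MULTILINE,
)


def plan_times(explain_text: str) -> dict:
    """Extract planning and execution times (milliseconds) from EXPLAIN ANALYZE text."""
    return {kind.lower(): float(ms) for kind, ms in _TIME_RE.findall(explain_text)}


def speedup(before_text: str, after_text: str) -> float:
    """Ratio of execution times; a value above 1 means the second run was faster."""
    before = plan_times(before_text)["execution"]
    after = plan_times(after_text)["execution"]
    if after == 0:
        raise ValueError("execution time rounded to 0 ms; rerun on a larger workload")
    return before / after
```

If the queries are run from Python, assuming psycopg2 as the driver (the project does not state which it used), `cur.execute("EXPLAIN ANALYZE " + query)` returns one text column per plan line. Joining `row[0]` over `cur.fetchall()` with newlines gives the text these helpers expect. `EXPLAIN ANALYZE` actually executes the query, so single timings are noisy, and repeating each state a few times gives a fairer comparison.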
Responsible for data preparation, ER diagram design, and dataset import (CIS 556 Database Systems, team of 3).
SQL scripts (DDL, DML, analysis queries, indexing) and the final report are in this folder.