Building a production-ready country leaderboard
The country-based leaderboard can be a valuable feature for DevImpact. It can help users discover impactful open-source developers by region and make the project more useful for exploring public GitHub activity.
This discussion is about how we can build the leaderboard in a scalable, reliable, and production-ready way.
The main goal is to avoid calculating scores for many users during page load or inside a public API request. Leaderboard scoring can be expensive because each GitHub user may require multiple GitHub API calls for repositories, pull requests, issues, and discussions.
Main problem
A country leaderboard can contain many users.
If we calculate all scores live when someone opens a country page, this can cause several problems:
- Too many GitHub API requests
- GitHub rate-limit issues
- GitHub GraphQL resource-limit issues
- Slow page loads
- Vercel/serverless timeout risk
- Expensive repeated requests
- Poor user experience
- Possible abuse of public scoring endpoints
For example, if one country has 200 users and each user requires multiple GitHub API calls, one page visit could trigger hundreds or thousands of GitHub requests.
That is not safe for production.
Core principle
The leaderboard page should read cached results only.
It should not calculate scores live.
The score calculation should happen separately in a scheduled/background process.
Recommended flow:
Scheduled job / worker / GitHub Actions / VPS cron
→ fetch country users
→ calculate scores gradually
→ store results in cache/database

/api/leaderboard/[country]
→ receive only the country slug
→ read cached leaderboard data
→ return paginated results

/leaderboard/[country]
→ display cached leaderboard results
This keeps the UI fast, protects GitHub API limits, and gives us more control over score refreshes.
Proposed architecture
1. Country leaderboard page
Route:
/leaderboard/[country]
Responsibilities:
- Validate the country slug
- Fetch leaderboard data from /api/leaderboard/[country]
- Show loading, empty, error, and cache-not-ready states
- Display leaderboard rows
- Support pagination
- Show metadata such as lastUpdatedAt and scoreVersion
The page should not send a list of usernames to the scoring API.
2. Public leaderboard API
Route:
/api/leaderboard/[country]
Example:
/api/leaderboard/yemen?page=1&pageSize=25
Responsibilities:
- Receive only the country slug
- Validate the country slug
- Read leaderboard data from cache/database
- Return paginated results
- Return metadata
- Never calculate all country scores live
Example response:
{
  "success": true,
  "status": "ready",
  "country": "yemen",
  "page": 1,
  "pageSize": 25,
  "total": 120,
  "lastUpdatedAt": "2026-06-13T10:00:00Z",
  "scoreVersion": "v1.0.0",
  "users": [
    {
      "rank": 1,
      "username": "example",
      "name": "Example User",
      "avatarUrl": "https://github.com/example.png",
      "profileUrl": "https://github.com/example",
      "repoScore": 120.5,
      "prScore": 90.2,
      "contributionScore": 12.4,
      "finalScore": 87.6
    }
  ]
}
Cache-not-ready state
When a country has not been scored yet, the API should not throw an error.
It should return a clean response:
{
  "success": true,
  "status": "not_ready",
  "country": "yemen",
  "users": [],
  "message": "Leaderboard data is not available yet."
}
The UI can then show:
Country leaderboard scoring will be available soon.
This is better than showing a failure state.
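To make the read-only rule concrete, here is a minimal TypeScript sketch of the read path behind the public API. `getLeaderboard`, `CachedLeaderboard`, and the injected `readCache` function are hypothetical names, and the row type is trimmed to a few fields. The function only reads and slices precomputed rows, and a missing cache entry becomes the `not_ready` response instead of an error.

```typescript
// Hypothetical types and names; shapes mirror the example JSON responses above.
interface LeaderboardRow {
  rank: number;
  username: string;
  finalScore: number;
}

interface CachedLeaderboard {
  country: string;
  scoreVersion: string;
  lastUpdatedAt: string; // ISO 8601, e.g. "2026-06-13T10:00:00Z"
  users: LeaderboardRow[]; // already sorted and ranked by the refresh job
}

type LeaderboardResponse =
  | {
      success: true;
      status: "ready";
      country: string;
      page: number;
      pageSize: number;
      total: number;
      lastUpdatedAt: string;
      scoreVersion: string;
      users: LeaderboardRow[];
    }
  | { success: true; status: "not_ready"; country: string; users: LeaderboardRow[]; message: string };

// Reads precomputed data only; page and pageSize are assumed to be validated already.
async function getLeaderboard(
  country: string,
  page: number,
  pageSize: number,
  readCache: (country: string) => Promise<CachedLeaderboard | null>,
): Promise<LeaderboardResponse> {
  const cached = await readCache(country);
  if (cached === null) {
    // Nothing scored yet: a normal state, not an error.
    return {
      success: true,
      status: "not_ready",
      country,
      users: [],
      message: "Leaderboard data is not available yet.",
    };
  }
  const start = (page - 1) * pageSize;
  return {
    success: true,
    status: "ready",
    country,
    page,
    pageSize,
    total: cached.users.length,
    lastUpdatedAt: cached.lastUpdatedAt,
    scoreVersion: cached.scoreVersion,
    users: cached.users.slice(start, start + pageSize), // no scoring here, just a slice
  };
}
```

Because `readCache` is a parameter, the same function can be tested with an in-memory stub and later pointed at Redis, a database, or JSON files, whichever storage we pick.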
Separate score calculation from the API endpoint
The leaderboard endpoint should not calculate country scores.
We should extract the leaderboard scoring logic into reusable service functions.
Example structure:
lib/leaderboard/
  countries.ts
  leaderboard-cache.ts
  leaderboard-types.ts
  leaderboard-service.ts
  leaderboard-refresh.ts
Possible responsibilities:
leaderboard-service.ts
→ get cached leaderboard
→ paginate leaderboard
→ validate country slug

leaderboard-refresh.ts
→ fetch users for a country
→ calculate scores
→ build leaderboard rows
→ store results in cache/database

leaderboard-cache.ts
→ read cache
→ write cache
→ manage cache keys
This separation lets us use the scoring logic from different places:
- GitHub Actions scheduled workflow
- VPS cron job
- background worker
- manual admin script
- future admin dashboard
Scheduled scoring job
The leaderboard should be refreshed in a scheduled job instead of during page load.
Possible options:
Option 1: GitHub Actions schedule
Good for early versions.
Pros:
- Easy to set up
- Runs outside the web request
- No extra server required
Cons:
- Needs secure secrets
- Runtime limits
- Not ideal for very large jobs
Option 2: VPS cron job
Good for more control.
Pros:
- More flexible
- Better for long-running jobs
- Easier to control rate limits
Cons:
- Requires server management
Option 3: background worker / queue
Best long-term option.
Pros:
- Better retry logic
- Better concurrency control
- Better rate-limit handling
- Better scaling
Cons:
- More infrastructure
For the first production version, GitHub Actions or a simple VPS cron job is probably enough.
Cache key design
Leaderboard cache keys should include the country slug and score version.
Example:
leaderboard:{country}:scoreVersion:{version}
Example:
leaderboard:yemen:scoreVersion:v1.0.0
This is important because when the scoring algorithm changes, old leaderboard results may no longer be valid.
We should avoid mixing scores from different scoring versions.
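A tiny helper keeps the key format in one place. This is a sketch: `SCORE_VERSION` and `leaderboardKey` are hypothetical names, and it assumes the scoring version is a single constant that gets bumped whenever the algorithm changes.

```typescript
// Assumption: one constant holds the current scoring version and is bumped
// whenever the scoring algorithm changes. Names are hypothetical.
const SCORE_VERSION = "v1.0.0";

// Builds keys in the proposed format: leaderboard:{country}:scoreVersion:{version}
function leaderboardKey(country: string, scoreVersion: string = SCORE_VERSION): string {
  return `leaderboard:${country}:scoreVersion:${scoreVersion}`;
}
```

Because the version is part of the key, bumping `SCORE_VERSION` to `v2.0.0` makes readers look up a key that no refresh has written yet, so `v1.0.0` rows are never mixed with new ones. The API would return `not_ready` for that country until the next refresh finishes, unless we decide to keep serving the previous version explicitly.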
Metadata to store with leaderboard results
Each cached leaderboard should include metadata:
{
  "country": "yemen",
  "scoreVersion": "v1.0.0",
  "lastUpdatedAt": "2026-06-13T10:00:00Z",
  "totalUsers": 120,
  "successfulUsers": 115,
  "failedUsers": 5
}
This helps with:
- Debugging
- Transparency
- UI display
- Knowing when results are stale
- Knowing if the refresh job partially failed
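The metadata alone is enough to tell whether results are stale or partially failed, without touching GitHub. A minimal sketch, assuming a two-day staleness threshold for a daily refresh; the threshold, `isStale`, and `failureRate` are hypothetical choices, not decided values.

```typescript
// Hypothetical shape matching the metadata example above.
interface LeaderboardMeta {
  country: string;
  scoreVersion: string;
  lastUpdatedAt: string; // ISO 8601
  totalUsers: number;
  successfulUsers: number;
  failedUsers: number;
}

const DAY_MS = 24 * 60 * 60 * 1000;

// Stale = older than maxAgeMs. The 2-day default is an assumed threshold for a daily job.
function isStale(meta: LeaderboardMeta, now: Date, maxAgeMs: number = 2 * DAY_MS): boolean {
  return now.getTime() - Date.parse(meta.lastUpdatedAt) > maxAgeMs;
}

// Share of users that could not be scored in the last refresh (0 when there are no users).
function failureRate(meta: LeaderboardMeta): number {
  return meta.totalUsers === 0 ? 0 : meta.failedUsers / meta.totalUsers;
}
```

With the example metadata (5 failed out of 120), the failure rate is about 4%, which the UI or an admin view could surface next to `lastUpdatedAt`.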
Handling failed users
Some GitHub users may fail during refresh.
Examples:
- User was deleted
- User was renamed
- User is suspended
- GitHub API request failed
- Rate limit was hit
- User has no public data
A single failed user should not fail the whole country leaderboard.
Recommended behavior:
{
  "failedUsers": [
    {
      "username": "example",
      "reason": "not_found"
    }
  ]
}
The refresh job should continue scoring the remaining users.
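A sketch of a scoring loop that records failures and keeps going. The `scoreUser` function and the convention of attaching a `reason` property to thrown errors are assumptions for illustration. Users are scored one at a time here to keep the example short; concurrency is a separate concern.

```typescript
// Hypothetical names; the reasons loosely follow the failure examples listed above.
type FailureReason = "not_found" | "rate_limited" | "request_failed";

interface ScoredUser {
  username: string;
  finalScore: number;
}

interface FailedUser {
  username: string;
  reason: FailureReason;
}

// Assumption: the scoring function attaches a `reason` to errors it recognizes.
function reasonOf(err: unknown): FailureReason {
  if (typeof err === "object" && err !== null && "reason" in err) {
    const reason = (err as { reason: unknown }).reason;
    if (reason === "not_found") return "not_found";
    if (reason === "rate_limited") return "rate_limited";
  }
  return "request_failed"; // anything unrecognized
}

async function scoreAll(
  usernames: string[],
  scoreUser: (username: string) => Promise<number>,
): Promise<{ scored: ScoredUser[]; failedUsers: FailedUser[] }> {
  const scored: ScoredUser[] = [];
  const failedUsers: FailedUser[] = [];
  for (const username of usernames) {
    try {
      scored.push({ username, finalScore: await scoreUser(username) });
    } catch (err) {
      // Record the failure and move on to the next user.
      failedUsers.push({ username, reason: reasonOf(err) });
    }
  }
  return { scored, failedUsers };
}
```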
Avoid replacing good cache with partial failed data
We should avoid overwriting a working leaderboard with incomplete results if a refresh job fails halfway.
Recommended flow:
1. Read the current active leaderboard cache.
2. Build new results under a temporary cache key.
3. Complete scoring for the country.
4. Validate the final result.
5. Replace the active cache only if the refresh succeeds.
6. If the refresh fails, keep the old cache.
Example keys:
leaderboard:yemen:scoreVersion:v1.0.0
leaderboard:yemen:scoreVersion:v1.0.0:tmp
This prevents broken or incomplete data from replacing good data.
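A minimal sketch of the temporary-key flow, using an in-memory `Map` as a stand-in for Redis or a database. `refreshCountry`, `buildRows`, and the "at least one row" validation rule are hypothetical. For brevity the rows are written to the `:tmp` key in one step; a real job would write to it progressively while scoring.

```typescript
// In-memory stand-in for Redis or a database table (assumption for this sketch).
const store = new Map<string, string>();

// Builds rows under `${activeKey}:tmp` and promotes them only if the whole build
// succeeds and passes validation. Names and the validation rule are hypothetical.
async function refreshCountry(
  activeKey: string,
  buildRows: () => Promise<unknown[]>,
  minRows = 1,
): Promise<boolean> {
  const tmpKey = `${activeKey}:tmp`;
  try {
    const rows = await buildRows(); // may throw halfway, e.g. on rate limits
    if (rows.length < minRows) return false; // validation failed: keep the old cache
    const serialized = JSON.stringify(rows);
    store.set(tmpKey, serialized);
    // Promote tmp to active. With Redis this step could be a single RENAME.
    store.set(activeKey, serialized);
    return true;
  } catch {
    return false; // refresh failed: the active entry was never touched
  } finally {
    store.delete(tmpKey); // never leave a stale tmp entry behind
  }
}
```

If Redis ends up being the store, the promote step could be one `RENAME` from the `:tmp` key to the active key, which overwrites the destination in a single command, so readers never see a half-written leaderboard.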
Pagination
Country leaderboard results should be paginated.
Example:
/api/leaderboard/yemen?page=1&pageSize=25
Default values:
page: 1
pageSize: 25
maxPageSize: 100
This prevents returning hundreds or thousands of users in one response.
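A sketch of parsing and clamping the query parameters with the defaults above. Falling back to the defaults on invalid input, rather than returning 400, is an assumption we could still change; the function names are hypothetical.

```typescript
// Defaults from the proposal; falling back on invalid input is an assumption.
const DEFAULT_PAGE = 1;
const DEFAULT_PAGE_SIZE = 25;
const MAX_PAGE_SIZE = 100;

// Accepts only whole numbers >= 1; missing, "abc", "2.5", or "0" use the fallback.
function parsePositiveInt(value: string | null, fallback: number): number {
  if (value === null) return fallback;
  const n = Number(value);
  return Number.isInteger(n) && n >= 1 ? n : fallback;
}

function parsePagination(params: URLSearchParams): { page: number; pageSize: number } {
  const page = parsePositiveInt(params.get("page"), DEFAULT_PAGE);
  // Clamp so a client cannot request more than MAX_PAGE_SIZE rows at once.
  const pageSize = Math.min(parsePositiveInt(params.get("pageSize"), DEFAULT_PAGE_SIZE), MAX_PAGE_SIZE);
  return { page, pageSize };
}
```

Inside the API route this would be called with the request's query string, for example `parsePagination(new URL(request.url).searchParams)`.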
Deterministic sorting
Leaderboard sorting should be stable.
Suggested sorting order:
finalScore desc
repoScore desc
prScore desc
contributionScore desc
username asc
This prevents users with equal scores from changing order randomly between refreshes.
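The sort order translates directly into a comparator. A minimal sketch with hypothetical names; the final tie-breaker compares usernames by plain code-unit order instead of `localeCompare`, so the result does not depend on the server's locale. Ranks are assigned only after sorting.

```typescript
// Hypothetical row shape with the score fields named in the proposal.
interface ScoreRow {
  username: string;
  finalScore: number;
  repoScore: number;
  prScore: number;
  contributionScore: number;
}

// Scores descending, then username ascending. A zero difference falls through to the next key.
function compareRows(a: ScoreRow, b: ScoreRow): number {
  return (
    b.finalScore - a.finalScore ||
    b.repoScore - a.repoScore ||
    b.prScore - a.prScore ||
    b.contributionScore - a.contributionScore ||
    (a.username < b.username ? -1 : a.username > b.username ? 1 : 0)
  );
}

// Sorts a copy and assigns rank = position + 1.
function rankRows(rows: ScoreRow[]): (ScoreRow & { rank: number })[] {
  return [...rows].sort(compareRows).map((row, i) => ({ ...row, rank: i + 1 }));
}
```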
Country slug validation
Before reading cache or starting a refresh job, we should validate that the country slug exists in our known country list.
Invalid country slugs should return 404.
Example:
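import { notFound } from "next/navigation";

// Reject slugs that are not in the known country list before any cache read.
const countryExists = countries.some((country) => country.slug === slug);
if (!countryExists) {
  notFound(); // renders the 404 page in a Next.js App Router page
}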
This avoids unnecessary work and gives cleaner behavior.
Duplicate usernames
Country source data may contain duplicate usernames.
Before scoring, usernames should be normalized:
trim
lowercase
remove duplicates
This avoids duplicated GitHub API calls and duplicated leaderboard rows.
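A minimal sketch of the normalization step (hypothetical function name). Lowercasing is safe for deduplication because GitHub usernames are case-insensitive; dropping entries that are empty after trimming is an extra assumption. First-seen order is kept, which does not affect ranking but makes logs easier to compare.

```typescript
// trim -> lowercase -> remove duplicates (and empty entries), keeping first-seen order.
function normalizeUsernames(raw: string[]): string[] {
  const seen = new Set<string>();
  for (const name of raw) {
    const normalized = name.trim().toLowerCase();
    if (normalized !== "") seen.add(normalized); // Set ignores repeats
  }
  return Array.from(seen);
}
```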
Rate-limit and concurrency control
The scheduled job should avoid running too many GitHub requests at the same time.
Recommended protections:
- Limit concurrency
- Add delay between batches if needed
- Retry failed requests carefully
- Stop safely when GitHub rate limits are close
- Store progress/logs
- Avoid refreshing the same country twice at the same time
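A sketch of batched scoring with a concurrency cap and a pause between batches, covering the concurrency, delay, and stop-safely protections above. `runInBatches` and its defaults (5 in flight, 1 second between batches) are hypothetical; `shouldStop` is where a check of the remaining GitHub rate-limit budget could go, for example the `rateLimit { remaining }` field of the GraphQL API. Retries and logging are left out.

```typescript
// Hypothetical helper; defaults are illustrative, not tuned values.
interface BatchOptions {
  concurrency?: number; // max GitHub-bound tasks in flight at once
  delayMs?: number; // pause between batches
  shouldStop?: () => boolean; // e.g. true when the rate-limit budget is nearly used up
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

async function runInBatches<T, R>(
  items: T[],
  task: (item: T) => Promise<R>,
  options: BatchOptions = {},
): Promise<PromiseSettledResult<R>[]> {
  const { concurrency = 5, delayMs = 1000, shouldStop = () => false } = options;
  const results: PromiseSettledResult<R>[] = [];
  for (let i = 0; i < items.length; i += concurrency) {
    // Stop between batches, never in the middle of one; unprocessed items get no result.
    if (shouldStop()) break;
    const batch = items.slice(i, i + concurrency);
    // allSettled: one failed user does not reject the rest of the batch.
    results.push(...(await Promise.allSettled(batch.map((item) => task(item)))));
    if (i + concurrency < items.length) await sleep(delayMs);
  }
  return results;
}
```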
A lock key can help:
leaderboard:refresh-lock:{country}
Example:
leaderboard:refresh-lock:yemen
This prevents duplicate jobs from scoring the same country at the same time.
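A sketch of the lock, using an in-memory `Map` in place of a shared store; the names are hypothetical, and the check-and-set below is only atomic inside one process, so a real store must do it in one operation. The lock gets a time-to-live so a crashed job cannot block a country forever, and only the holder's token can release it. With Redis, acquiring could be `SET key token NX PX <ttl>`; releasing safely would need a compare-and-delete, for example a small Lua script.

```typescript
// In-memory stand-in for a shared store such as Redis; names are hypothetical.
const locks = new Map<string, { token: string; expiresAt: number }>();

function lockKey(country: string): string {
  return `leaderboard:refresh-lock:${country}`;
}

// Returns true if this job now holds the lock for the country.
function acquireLock(country: string, token: string, ttlMs: number, now: number = Date.now()): boolean {
  const key = lockKey(country);
  const existing = locks.get(key);
  if (existing !== undefined && existing.expiresAt > now) return false; // held by another job
  // Free or expired (e.g. left behind by a crashed job): take it over.
  locks.set(key, { token, expiresAt: now + ttlMs });
  return true;
}

// Only the current holder may release, so a slow job cannot delete a lock
// that another job took over after expiry.
function releaseLock(country: string, token: string): void {
  const key = lockKey(country);
  if (locks.get(key)?.token === token) locks.delete(key);
}
```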
GitHub fetch limits
For normal scoring, we should keep GitHub fetch limits reasonable.
Suggested temporary limits:
PRs: 100
Issues: 100
Discussions: 50
These values are safer than fetching very large amounts for every user.
Later, we can support deeper fetching in scheduled jobs, where we can control rate limits and cache results.
Difference between compare scoring and leaderboard scoring
The normal compare feature and the leaderboard feature should not be treated exactly the same.
Compare feature
Used when a user compares two GitHub profiles.
Recommended behavior:
- Real-time scoring is acceptable
- Usually only 2 users
- Should be cached
- Should stay fast
Leaderboard feature
Used to rank many users in a country.
Recommended behavior:
- Should not score live
- Should use cached results
- Should be refreshed in the background
- Should support pagination
- Should protect GitHub rate limits
This separation is important.
Proposed API design
Get country leaderboard
GET /api/leaderboard/[country]?page=1&pageSize=25
Reads cached results only.
Does not calculate scores live.
Refresh country leaderboard
This should not be public.
Possible options:
POST /api/internal/leaderboard/[country]/refresh
or a direct script:
pnpm leaderboard:refresh yemen
or:
pnpm leaderboard:refresh --all
This refresh logic can be used by GitHub Actions, a VPS cron job, or another worker.
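A sketch of how the refresh script could interpret its arguments, so that `yemen` and `--all` map to one typed target. The names are hypothetical, and lowercase slugs are an assumption; a real entry point would call `parseRefreshArgs(process.argv.slice(2))` and still validate the slug against the known country list before starting.

```typescript
// Hypothetical argument handling for a `leaderboard:refresh` script.
type RefreshTarget = { kind: "all" } | { kind: "country"; slug: string };

function parseRefreshArgs(args: string[]): RefreshTarget {
  if (args.length === 1 && args[0] === "--all") return { kind: "all" };
  if (args.length === 1 && !args[0].startsWith("-")) {
    // Assumption: slugs are lowercase, like "yemen".
    return { kind: "country", slug: args[0].trim().toLowerCase() };
  }
  throw new Error("Usage: leaderboard:refresh <country-slug> | --all");
}
```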
Suggested implementation phases
Phase 1: Safe leaderboard foundation
Goal: keep the UI/routes safe while we prepare the backend architecture.
Tasks:
- Disable live country scoring
- Keep the UI/routes
- Add temporary "leaderboard data will be available soon" state
- Reduce GitHub fetch limits
- Localize the header link
- Make the header link responsive
- Validate country slugs
Phase 2: Cached leaderboard API
Goal: make the country page read from cached data.
Tasks:
- Add /api/leaderboard/[country]
- Add cache read logic
- Add not_ready response
- Add pagination
- Add lastUpdatedAt
- Add scoreVersion
- Add empty/error UI states
Phase 3: Scheduled score refresh
Goal: calculate country scores outside page requests.
Tasks:
- Extract leaderboard refresh service
- Add country user fetching
- Normalize usernames
- Score users in batches
- Store results in cache
- Track failed users
- Add deterministic sorting
- Add refresh metadata
- Avoid replacing good cache with failed partial data
Phase 4: Production hardening
Goal: make the leaderboard safe and reliable in production.
Tasks:
- Add rate limiting
- Add concurrency control
- Add refresh locks
- Add retry logic
- Add logs
- Add admin/manual refresh command
- Add tests for cache, pagination, failures, and stale data
Open questions
We still need to decide:
- Where should cached leaderboard data be stored?
  - Redis
  - database
  - JSON files generated by scheduled jobs
  - another storage option
- How often should leaderboards refresh?
  - daily
  - weekly
  - manually
  - based on country size
- Should all countries refresh equally?
  - large countries may need slower/batched refreshes
  - smaller countries can refresh faster
- Should the leaderboard show all users or only top users?
  - top 25
  - top 50
  - top 100
  - paginated full list
- Should failed users be visible in admin/debug output only, or also exposed in API metadata?
Final direction
The leaderboard should be built around this principle:
Do expensive GitHub scoring in the background. Serve leaderboard pages from cache. Keep public API requests fast and safe.
This gives DevImpact a much better production foundation and avoids turning the leaderboard into an expensive live GitHub API workload.