Building a production-ready country leaderboard #156

O2sa started this conversation in Ideas

The country-based leaderboard can be a valuable feature for DevImpact. It can help users discover impactful open-source developers by region and make the project more useful for exploring public GitHub activity.

This discussion is about how we can build the leaderboard in a scalable, reliable, and production-ready way.

The main goal is to avoid calculating scores for many users during page load or inside a public API request. Leaderboard scoring can be expensive because each GitHub user may require multiple GitHub API calls for repositories, pull requests, issues, and discussions.


Main problem

A country leaderboard can contain many users.

If we calculate all scores live when someone opens a country page, this can cause several problems:

  • Too many GitHub API requests
  • GitHub rate-limit issues
  • GitHub GraphQL resource-limit issues
  • Slow page loads
  • Vercel/serverless timeout risk
  • Expensive repeated requests
  • Poor user experience
  • Possible abuse of public scoring endpoints

For example, if one country has 200 users and each user requires multiple GitHub API calls, one page visit could trigger hundreds or thousands of GitHub requests.

That is not safe for production.


Core principle

The leaderboard page should read cached results only.

It should not calculate scores live.

The score calculation should happen separately in a scheduled/background process.

Recommended flow:

Scheduled job / worker / GitHub Actions / VPS cron
 → fetch country users
 → calculate scores gradually
 → store results in cache/database
/api/leaderboard/[country]
 → receive only the country slug
 → read cached leaderboard data
 → return paginated results
/leaderboard/[country]
 → display cached leaderboard results

This keeps the UI fast, protects GitHub API limits, and gives us more control over score refreshes.


Proposed architecture

1. Country leaderboard page

Route:

/leaderboard/[country]

Responsibilities:

  • Validate the country slug
  • Fetch leaderboard data from /api/leaderboard/[country]
  • Show loading, empty, error, and cache-not-ready states
  • Display leaderboard rows
  • Support pagination
  • Show metadata such as lastUpdatedAt and scoreVersion

The page should not send a list of usernames to the scoring API.


2. Public leaderboard API

Route:

/api/leaderboard/[country]

Example:

/api/leaderboard/yemen?page=1&pageSize=25

Responsibilities:

  • Receive only the country slug
  • Validate the country slug
  • Read leaderboard data from cache/database
  • Return paginated results
  • Return metadata
  • Never calculate all country scores live

Example response:

{
  "success": true,
  "status": "ready",
  "country": "yemen",
  "page": 1,
  "pageSize": 25,
  "total": 120,
  "lastUpdatedAt": "2026-06-13T10:00:00Z",
  "scoreVersion": "v1.0.0",
  "users": [
    {
      "rank": 1,
      "username": "example",
      "name": "Example User",
      "avatarUrl": "https://github.com/example.png",
      "profileUrl": "https://github.com/example",
      "repoScore": 120.5,
      "prScore": 90.2,
      "contributionScore": 12.4,
      "finalScore": 87.6
    }
  ]
}

Cache-not-ready state

When a country has not been scored yet, the API should not throw an error.

It should return a clean response:

{
  "success": true,
  "status": "not_ready",
  "country": "yemen",
  "users": [],
  "message": "Leaderboard data is not available yet."
}

The UI can then show:

Country leaderboard scoring will be available soon.

This is better than showing a failure state.
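
As a minimal sketch of how the two response shapes could be modeled, the TypeScript below uses a discriminated union on status. The names LeaderboardUser, CachedLeaderboard, and buildLeaderboardResponse are illustrative assumptions, not existing DevImpact code. The cache lookup itself is left out: the function only receives whatever the cache returned, with null meaning the country has not been scored yet.

```typescript
// Hypothetical types; field names follow the example responses above.
interface LeaderboardUser {
  rank: number;
  username: string;
  finalScore: number;
}

interface CachedLeaderboard {
  lastUpdatedAt: string;
  scoreVersion: string;
  users: LeaderboardUser[];
}

type LeaderboardResponse =
  | {
      success: true;
      status: "ready";
      country: string;
      page: number;
      pageSize: number;
      total: number;
      lastUpdatedAt: string;
      scoreVersion: string;
      users: LeaderboardUser[];
    }
  | {
      success: true;
      status: "not_ready";
      country: string;
      users: LeaderboardUser[];
      message: string;
    };

// Builds the response body from the cache result (null = never scored).
function buildLeaderboardResponse(
  country: string,
  cached: CachedLeaderboard | null,
  page: number,
  pageSize: number,
): LeaderboardResponse {
  if (cached === null) {
    return {
      success: true,
      status: "not_ready",
      country,
      users: [],
      message: "Leaderboard data is not available yet.",
    };
  }
  const start = (page - 1) * pageSize;
  return {
    success: true,
    status: "ready",
    country,
    page,
    pageSize,
    total: cached.users.length,
    lastUpdatedAt: cached.lastUpdatedAt,
    scoreVersion: cached.scoreVersion,
    users: cached.users.slice(start, start + pageSize),
  };
}
```

Because both branches set success to true, the UI can switch on status alone and never has to treat a missing cache as an error.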


Separate score calculation from the API endpoint

The leaderboard endpoint should not calculate country scores.

We should extract the leaderboard scoring logic into reusable service functions.

Example structure:

lib/leaderboard/
 countries.ts
 leaderboard-cache.ts
 leaderboard-types.ts
 leaderboard-service.ts
 leaderboard-refresh.ts

Possible responsibilities:

leaderboard-service.ts
 → get cached leaderboard
 → paginate leaderboard
 → validate country slug
leaderboard-refresh.ts
 → fetch users for a country
 → calculate scores
 → build leaderboard rows
 → store results in cache/database
leaderboard-cache.ts
 → read cache
 → write cache
 → manage cache keys

This separation lets us use the scoring logic from different places:

  • GitHub Actions scheduled workflow
  • VPS cron job
  • background worker
  • manual admin script
  • future admin dashboard

Scheduled scoring job

The leaderboard should be refreshed in a scheduled job instead of during page load.

Possible options:

Option 1: GitHub Actions schedule

Good for early versions.

Pros:

  • Easy to set up
  • Runs outside the web request
  • No extra server required

Cons:

  • Needs secure secrets
  • Runtime limits
  • Not ideal for very large jobs

Option 2: VPS cron job

Good for more control.

Pros:

  • More flexible
  • Better for long-running jobs
  • Easier to control rate limits

Cons:

  • Requires server management

Option 3: background worker / queue

Best long-term option.

Pros:

  • Better retry logic
  • Better concurrency control
  • Better rate-limit handling
  • Better scaling

Cons:

  • More infrastructure

For the first production version, GitHub Actions or a simple VPS cron job is probably enough.


Cache key design

Leaderboard cache keys should include the country slug and score version.

Example:

leaderboard:{country}:scoreVersion:{version}

Example:

leaderboard:yemen:scoreVersion:v1.0.0

This is important because when the scoring algorithm changes, old leaderboard results may no longer be valid.

We should avoid mixing scores from different scoring versions.
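
A small helper could keep the key format in one place. This is only a sketch: leaderboardCacheKey and SCORE_VERSION are hypothetical names, and the version string is assumed to be bumped by hand whenever the scoring algorithm changes.

```typescript
// Assumed to be updated whenever the scoring algorithm changes.
const SCORE_VERSION = "v1.0.0";

// Builds keys in the leaderboard:{country}:scoreVersion:{version} format.
function leaderboardCacheKey(country: string, version: string = SCORE_VERSION): string {
  return `leaderboard:${country}:scoreVersion:${version}`;
}

console.log(leaderboardCacheKey("yemen")); // leaderboard:yemen:scoreVersion:v1.0.0
```

With the version inside the key, a new scoring version starts with an empty cache instead of reusing rows produced by the old algorithm.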


Metadata to store with leaderboard results

Each cached leaderboard should include metadata:

{
  "country": "yemen",
  "scoreVersion": "v1.0.0",
  "lastUpdatedAt": "2026-06-13T10:00:00Z",
  "totalUsers": 120,
  "successfulUsers": 115,
  "failedUsers": 5
}

This helps with:

  • Debugging
  • Transparency
  • UI display
  • Knowing when results are stale
  • Knowing if the refresh job partially failed
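
To illustrate the last two points, here is a rough sketch of how the metadata could drive stale and partial-failure checks. The interface mirrors the example above. The helper names and the idea of a single maxAgeMs threshold are assumptions, since the refresh interval is still an open question.

```typescript
// Shape taken from the metadata example above.
interface LeaderboardMetadata {
  country: string;
  scoreVersion: string;
  lastUpdatedAt: string; // ISO 8601 timestamp
  totalUsers: number;
  successfulUsers: number;
  failedUsers: number;
}

// Assumption: a leaderboard counts as stale once it is older than maxAgeMs.
// An unparseable timestamp is also treated as stale.
function isStale(meta: LeaderboardMetadata, maxAgeMs: number, now: Date = new Date()): boolean {
  const updatedAt = new Date(meta.lastUpdatedAt).getTime();
  return Number.isNaN(updatedAt) || now.getTime() - updatedAt > maxAgeMs;
}

// True when at least one user could not be scored in the last refresh.
function isPartial(meta: LeaderboardMetadata): boolean {
  return meta.failedUsers > 0;
}
```

The UI could use isStale to show an "outdated" hint next to lastUpdatedAt, while isPartial is probably more useful in logs or admin output.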

Handling failed users

Some GitHub users may fail during refresh.

Examples:

  • User was deleted
  • User was renamed
  • User is suspended
  • GitHub API request failed
  • Rate limit was hit
  • User has no public data

A single failed user should not fail the whole country leaderboard.

Recommended behavior:

{
  "failedUsers": [
    {
      "username": "example",
      "reason": "not_found"
    }
  ]
}

The refresh job should continue scoring the remaining users.
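
One way to get this behavior is to catch errors per user instead of per country. The sketch below assumes a hypothetical scoreUser callback that rejects with an Error whose message is a short reason code such as "not_found". The real scoring function and its error shapes may differ.

```typescript
interface ScoredUser {
  username: string;
  finalScore: number;
}

interface FailedUser {
  username: string;
  reason: string;
}

interface RefreshResult {
  scored: ScoredUser[];
  failedUsers: FailedUser[];
}

// Scores users one by one; a failure is recorded and the loop moves on.
async function scoreUsers(
  usernames: string[],
  scoreUser: (username: string) => Promise<ScoredUser>, // hypothetical scorer
): Promise<RefreshResult> {
  const result: RefreshResult = { scored: [], failedUsers: [] };
  for (const username of usernames) {
    try {
      result.scored.push(await scoreUser(username));
    } catch (error) {
      // Assumption: the error message doubles as the reason code.
      const reason = error instanceof Error ? error.message : "unknown_error";
      result.failedUsers.push({ username, reason });
    }
  }
  return result;
}
```

The lengths of scored and failedUsers map directly onto the successfulUsers and failedUsers counters in the metadata.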


Avoid replacing good cache with partial failed data

We should avoid overwriting a working leaderboard with incomplete results if a refresh job fails halfway.

Recommended flow:

1. Read current active leaderboard cache.
2. Build new results in a temporary cache key.
3. Complete scoring for the country.
4. Validate the final result.
5. Replace the active cache only if the refresh completes and validation passes.
6. If the refresh fails or validation does not pass, keep the old cache.

Example keys:

leaderboard:yemen:scoreVersion:v1.0.0
leaderboard:yemen:scoreVersion:v1.0.0:tmp

This prevents broken or incomplete data from replacing good data.
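
The final steps of that flow could look roughly like the sketch below. It assumes the refresh job has already written its results to the :tmp key. KeyValueStore is a stand-in for whatever storage is chosen, and the 90% success threshold is an invented example of a validation rule, not a decided value.

```typescript
// Minimal stand-in for the real store (Redis, database, files are all still open).
interface KeyValueStore {
  get(key: string): string | undefined;
  set(key: string, value: string): void;
  delete(key: string): void;
}

interface LeaderboardSnapshot {
  totalUsers: number;
  successfulUsers: number;
}

// Assumed validation rule: at least 90% of users were scored successfully.
const MIN_SUCCESS_RATIO = 0.9;

function isValidSnapshot(snapshot: LeaderboardSnapshot): boolean {
  if (snapshot.totalUsers === 0) return false;
  return snapshot.successfulUsers / snapshot.totalUsers >= MIN_SUCCESS_RATIO;
}

// Copies the tmp entry over the active one only when it validates.
// Returns true if the active cache was replaced.
function promoteTmpLeaderboard(store: KeyValueStore, activeKey: string): boolean {
  const tmpKey = `${activeKey}:tmp`;
  const raw = store.get(tmpKey);
  if (raw === undefined) return false; // nothing was built, keep the old cache
  const snapshot = JSON.parse(raw) as LeaderboardSnapshot;
  const valid = isValidSnapshot(snapshot);
  if (valid) store.set(activeKey, raw);
  store.delete(tmpKey); // clean up in both cases
  return valid;
}
```

For local experiments a Map<string, string> satisfies KeyValueStore. With a real store, the copy and delete would ideally be a single atomic operation.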


Pagination

Country leaderboard results should be paginated.

Example:

/api/leaderboard/yemen?page=1&pageSize=25

Default values:

page: 1
pageSize: 25
maxPageSize: 100

This prevents returning hundreds or thousands of users in one response.
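
A possible way to apply these defaults is to parse and clamp the query parameters before touching the cached list. The function names here are illustrative. Falling back to the default on invalid input, rather than returning a 400, is an assumption.

```typescript
const DEFAULT_PAGE = 1;
const DEFAULT_PAGE_SIZE = 25;
const MAX_PAGE_SIZE = 100;

// Returns the parsed value when it is a positive integer, otherwise the fallback.
function parsePositiveInt(value: string | null, fallback: number): number {
  const parsed = Number.parseInt(value ?? "", 10);
  return Number.isInteger(parsed) && parsed > 0 ? parsed : fallback;
}

// Reads page and pageSize from the query string and clamps pageSize.
function parsePagination(params: URLSearchParams): { page: number; pageSize: number } {
  const page = parsePositiveInt(params.get("page"), DEFAULT_PAGE);
  const requested = parsePositiveInt(params.get("pageSize"), DEFAULT_PAGE_SIZE);
  return { page, pageSize: Math.min(requested, MAX_PAGE_SIZE) };
}

// Returns the slice of items that belongs to the given 1-based page.
function paginate<T>(items: T[], page: number, pageSize: number): T[] {
  const start = (page - 1) * pageSize;
  return items.slice(start, start + pageSize);
}
```

For example, a request with page=2&pageSize=5000 would be served as page 2 with a page size of 100.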


Deterministic sorting

Leaderboard sorting should be deterministic: the same set of scores should always produce the same order.

Suggested sorting order:

finalScore desc
repoScore desc
prScore desc
contributionScore desc
username asc

This prevents users with equal scores from changing order randomly between refreshes.
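
The sorting order above translates into a chained comparator. In this sketch the LeaderboardRow shape and the function names are assumptions. Usernames are compared by code unit rather than with localeCompare so that the result does not depend on the server locale.

```typescript
interface LeaderboardRow {
  username: string;
  finalScore: number;
  repoScore: number;
  prScore: number;
  contributionScore: number;
}

// Higher scores first; ties fall through to the next field, then to username.
function compareRows(a: LeaderboardRow, b: LeaderboardRow): number {
  return (
    b.finalScore - a.finalScore ||
    b.repoScore - a.repoScore ||
    b.prScore - a.prScore ||
    b.contributionScore - a.contributionScore ||
    (a.username < b.username ? -1 : a.username > b.username ? 1 : 0)
  );
}

// Sorts a copy of the rows and assigns 1-based ranks.
function rankRows(rows: LeaderboardRow[]): (LeaderboardRow & { rank: number })[] {
  return rows
    .slice()
    .sort(compareRows)
    .map((row, index) => ({ ...row, rank: index + 1 }));
}
```

Since usernames are unique after normalization, the comparator never returns 0 for two different users, so the order does not depend on the input order.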


Country slug validation

Before reading cache or starting a refresh job, we should validate that the country slug exists in our known country list.

Invalid country slugs should return 404.

Example:

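import { notFound } from "next/navigation";

// `countries` (the known country list) and `slug` are assumed to be in scope.
const countryExists = countries.some((country) => country.slug === slug);
if (!countryExists) {
  notFound(); // renders the 404 page for unknown slugs
}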

This avoids unnecessary work and gives cleaner behavior.


Duplicate usernames

Country source data may contain duplicate usernames.

Before scoring, usernames should be normalized:

trim
lowercase
remove duplicates

This avoids duplicated GitHub API calls and duplicated leaderboard rows.
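
Those three steps fit in one small function. This is a sketch with a hypothetical name. It also drops entries that are empty after trimming, which the list above does not mention explicitly. Lowercasing is safe for deduplication because GitHub logins are case-insensitive.

```typescript
// Trims, lowercases, and deduplicates usernames while keeping first-seen order.
function normalizeUsernames(usernames: string[]): string[] {
  const seen = new Set<string>();
  for (const raw of usernames) {
    const username = raw.trim().toLowerCase();
    if (username !== "") {
      seen.add(username); // a Set ignores values it already contains
    }
  }
  return Array.from(seen);
}

console.log(normalizeUsernames([" O2sa ", "o2sa", "Example", ""])); // [ 'o2sa', 'example' ]
```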


Rate-limit and concurrency control

The scheduled job should avoid running too many GitHub requests at the same time.

Recommended protections:

  • Limit concurrency
  • Add delay between batches if needed
  • Retry failed requests carefully
  • Stop safely when GitHub rate limits are close
  • Store progress/logs
  • Avoid refreshing the same country twice at the same time

A lock key can help:

leaderboard:refresh-lock:{country}

Example:

leaderboard:refresh-lock:yemen

This prevents duplicate jobs from scoring the same country at the same time.
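
A rough sketch of two of these protections follows: batch-limited concurrency with an optional delay, and a per-country refresh lock. All names are hypothetical, and task is assumed to be an async function. The lock here lives in process memory, so it only guards a single worker. With several workers it would need to live in a shared store.

```typescript
type Settled<R> = { ok: true; value: R } | { ok: false; error: unknown };

function sleep(ms: number): Promise<void> {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// Runs at most batchSize tasks at once and waits delayMs between batches.
// A rejected task is recorded and does not stop the remaining tasks.
async function runInBatches<T, R>(
  items: T[],
  batchSize: number,
  delayMs: number,
  task: (item: T) => Promise<R>,
): Promise<Settled<R>[]> {
  if (batchSize < 1) throw new RangeError("batchSize must be at least 1");
  const results: Settled<R>[] = [];
  for (let i = 0; i < items.length; i += batchSize) {
    const batch = items.slice(i, i + batchSize);
    const settled = await Promise.all(
      batch.map((item) =>
        task(item).then(
          (value): Settled<R> => ({ ok: true, value }),
          (error: unknown): Settled<R> => ({ ok: false, error }),
        ),
      ),
    );
    results.push(...settled);
    if (delayMs > 0 && i + batchSize < items.length) {
      await sleep(delayMs);
    }
  }
  return results;
}

// In-memory lock set; keys follow the leaderboard:refresh-lock:{country} format.
const activeLocks = new Set<string>();

// Runs job unless a refresh for the same country is already in progress.
// Returns false when the lock was held and the job was skipped.
async function withRefreshLock(country: string, job: () => Promise<void>): Promise<boolean> {
  const lockKey = `leaderboard:refresh-lock:${country}`;
  if (activeLocks.has(lockKey)) return false;
  activeLocks.add(lockKey);
  try {
    await job();
    return true;
  } finally {
    activeLocks.delete(lockKey); // always release, even if the job throws
  }
}
```

A real lock should also expire after some time, so that a crashed job cannot block refreshes for that country forever.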


GitHub fetch limits

For normal scoring, we should keep GitHub fetch limits reasonable.

Suggested temporary limits:

PRs: 100
Issues: 100
Discussions: 50

These values are safer than fetching very large amounts for every user.

Later, we can support deeper fetching in scheduled jobs, where we can control rate limits and cache results.


Difference between compare scoring and leaderboard scoring

The normal compare feature and the leaderboard feature should not be treated exactly the same.

Compare feature

Used when a user compares two GitHub profiles.

Recommended behavior:

- Real-time scoring is acceptable
- Usually only 2 users
- Should be cached
- Should stay fast

Leaderboard feature

Used to rank many users in a country.

Recommended behavior:

- Should not score live
- Should use cached results
- Should be refreshed in background
- Should support pagination
- Should protect GitHub rate limits

This separation is important.


Proposed API design

Get country leaderboard

GET /api/leaderboard/[country]?page=1&pageSize=25

Reads cached results only.

Does not calculate scores live.


Refresh country leaderboard

This should not be public.

Possible options:

POST /api/internal/leaderboard/[country]/refresh

or a direct script:

pnpm leaderboard:refresh yemen

or:

pnpm leaderboard:refresh --all

This refresh logic can be used by GitHub Actions, a VPS cron job, or another worker.
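
For the script variant, the argument handling could be as small as the sketch below. parseRefreshArgs and RefreshTarget are hypothetical names, and knownSlugs stands for the slugs from the known country list. A script would typically call it with process.argv.slice(2).

```typescript
type RefreshTarget = { kind: "all" } | { kind: "country"; slug: string };

// Interprets the CLI arguments: either --all or a single known country slug.
function parseRefreshArgs(args: string[], knownSlugs: string[]): RefreshTarget {
  if (args.indexOf("--all") !== -1) {
    return { kind: "all" };
  }
  const slug = args[0];
  if (slug === undefined || knownSlugs.indexOf(slug) === -1) {
    // Fail early instead of starting a refresh for an unknown country.
    throw new Error(`Unknown country slug: ${slug ?? "(none)"}`);
  }
  return { kind: "country", slug };
}
```

Keeping the parser separate from the refresh logic means the same refresh function can be called from the script, a scheduled workflow, or an internal endpoint.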


Suggested implementation phases

Phase 1: Safe leaderboard foundation

Goal: keep the UI/routes safe while we prepare the backend architecture.

Tasks:

  • Disable live country scoring
  • Keep the UI/routes
  • Add temporary "leaderboard data will be available soon" state
  • Reduce GitHub fetch limits
  • Localize the header link
  • Make the header link responsive
  • Validate country slugs

Phase 2: Cached leaderboard API

Goal: make the country page read from cached data.

Tasks:

  • Add /api/leaderboard/[country]
  • Add cache read logic
  • Add not_ready response
  • Add pagination
  • Add lastUpdatedAt
  • Add scoreVersion
  • Add empty/error UI states

Phase 3: Scheduled score refresh

Goal: calculate country scores outside page requests.

Tasks:

  • Extract leaderboard refresh service
  • Add country user fetching
  • Normalize usernames
  • Score users in batches
  • Store results in cache
  • Track failed users
  • Add deterministic sorting
  • Add refresh metadata
  • Avoid replacing good cache with failed partial data

Phase 4: Production hardening

Goal: make the leaderboard safe and reliable in production.

Tasks:

  • Add rate limiting
  • Add concurrency control
  • Add refresh locks
  • Add retry logic
  • Add logs
  • Add admin/manual refresh command
  • Add tests for cache, pagination, failures, and stale data

Open questions

We still need to decide:

  1. Where should cached leaderboard data be stored?

    • Redis
    • database
    • JSON files generated by scheduled jobs
    • another storage option
  2. How often should leaderboards refresh?

    • daily
    • weekly
    • manually
    • based on country size
  3. Should all countries refresh equally?

    • large countries may need slower/batched refreshes
    • smaller countries can refresh faster
  4. Should the leaderboard show all users or only top users?

    • top 25
    • top 50
    • top 100
    • paginated full list
  5. Should failed users be visible in admin/debug output only, or also exposed in API metadata?


Final direction

The leaderboard should be built around this principle:

Do expensive GitHub scoring in the background.
Serve leaderboard pages from cache.
Keep public API requests fast and safe.

This gives DevImpact a much better production foundation and avoids turning the leaderboard into an expensive live GitHub API workload.
