Christoph Görn 8ac8508493
... might work
Signed-off-by: Christoph Görn <goern@b4mad.net>
2025-11-10 20:30:03 +01:00

ILFS - InnerSource Leaderboard and Finder Suite

A collection of Python tools for analyzing GitHub contributors and finding them on LinkedIn.

Tools

1. 📊 GitHub Contributors Fetcher (get_contributors.py)

Fetches all contributors from all repositories of a GitHub user or organization.

Features

  • 📊 Fetches contributors from all repositories of a user or organization
  • 🔢 Aggregates total contributions per contributor across all repos
  • 🍴 Optional inclusion/exclusion of forked repositories
  • 🔐 Supports GitHub API token for higher rate limits
  • 📝 Outputs detailed JSON with contributor statistics
  • 🎯 Sorts contributors by total contributions
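
The aggregation and sorting described above can be sketched as follows. This is a minimal illustration, not the code of get_contributors.py: the function name `aggregate_contributors` and the shape of the `repos` input are assumptions made for the example.

```python
from collections import defaultdict


def aggregate_contributors(repos):
    """Sum contributions per login across repositories.

    `repos` is a hypothetical list of dicts shaped like
    {"name": str, "is_fork": bool,
     "contributors": [{"login": str, "contributions": int}]}.
    """
    totals = defaultdict(lambda: {"total_contributions": 0, "repositories": []})
    for repo in repos:
        for contributor in repo["contributors"]:
            entry = totals[contributor["login"]]
            entry["total_contributions"] += contributor["contributions"]
            entry["repositories"].append({
                "name": repo["name"],
                "contributions": contributor["contributions"],
                "is_fork": repo["is_fork"],
            })
    merged = [{"login": login, **data} for login, data in totals.items()]
    # Highest total first, mirroring the sorted output described above.
    return sorted(merged, key=lambda c: c["total_contributions"], reverse=True)


sample = [
    {"name": "Hello-World", "is_fork": False,
     "contributors": [{"login": "octocat", "contributions": 456},
                      {"login": "hubot", "contributions": 10}]},
    {"name": "Spoon-Knife", "is_fork": True,
     "contributors": [{"login": "octocat", "contributions": 44}]},
]
result = aggregate_contributors(sample)
print(result[0]["login"], result[0]["total_contributions"])  # octocat 500
```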

Requirements

  • Python 3.6+
  • No external dependencies (uses only standard library)

Usage

Basic Usage:

Get contributors from a user's repositories:

./get_contributors.py --owner octocat --type user

Get contributors from an organization:

./get_contributors.py --owner github --type org

With GitHub Token (Recommended):

To avoid rate limiting, use a GitHub personal access token:

./get_contributors.py --owner octocat --token YOUR_GITHUB_TOKEN

To create a token:

  1. Go to GitHub Settings → Developer settings → Personal access tokens → Tokens (classic)
  2. Generate a new token with the public_repo scope (or the repo scope for private repositories)
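
For readers who call the GitHub API from their own code, the sketch below shows one way a token could be turned into request headers. The helper name `github_headers`, the `User-Agent` value, and the `GITHUB_TOKEN` environment variable are assumptions for illustration; the script itself only documents the --token flag.

```python
import os


def github_headers(token=None):
    """Build request headers for the GitHub REST API (illustrative helper)."""
    headers = {
        "Accept": "application/vnd.github+json",
        "User-Agent": "ilfs-example",  # hypothetical client name
    }
    if token:
        # Authenticated requests get the higher rate limit.
        headers["Authorization"] = f"Bearer {token}"
    return headers


# Assumption: the token is exported as GITHUB_TOKEN rather than typed inline.
token = os.environ.get("GITHUB_TOKEN")
headers = github_headers(token)
print(sorted(headers))
```

Keeping the token in an environment variable also keeps it out of shell history, for example `./get_contributors.py --owner octocat --token "$GITHUB_TOKEN"`.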

Additional Options:

Include forked repositories:

./get_contributors.py --owner octocat --include-forks

Pretty print the JSON output:

./get_contributors.py --owner octocat --pretty

Save to a file:

./get_contributors.py --owner octocat -o contributors.json --pretty

Complete Example:

./get_contributors.py \
 --owner anthropics \
 --type org \
 --token ghp_YOUR_TOKEN_HERE \
 --include-forks \
 --pretty \
 -o anthropics_contributors.json

Output Format

The script outputs JSON with the following structure:

{
  "owner": "octocat",
  "owner_type": "user",
  "total_repositories": 25,
  "total_unique_contributors": 42,
  "contributors": [
    {
      "login": "octocat",
      "total_contributions": 1234,
      "repositories": [
        {
          "name": "Hello-World",
          "contributions": 456,
          "is_fork": false
        }
      ],
      "avatar_url": "https://avatars.githubusercontent.com/u/583231",
      "html_url": "https://github.com/octocat",
      "type": "User"
    }
  ]
}

Rate Limits

  • Without token: 60 requests per hour
  • With token: 5000 requests per hour

For large organizations with many repositories, using a token is highly recommended.
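
GitHub reports the remaining quota in the `X-RateLimit-Remaining` and `X-RateLimit-Reset` response headers. The sketch below shows how a client could decide how long to wait once the quota is used up. The function name `seconds_until_reset` is hypothetical, and whether get_contributors.py inspects these headers is not stated here.

```python
import time


def seconds_until_reset(headers, now=None):
    """Return how many seconds to wait before retrying, or 0 if requests remain.

    `headers` is a plain dict here, so the keys are case-sensitive in this sketch.
    """
    remaining = int(headers.get("X-RateLimit-Remaining", "1"))
    if remaining > 0:
        return 0
    reset_at = int(headers.get("X-RateLimit-Reset", "0"))  # Unix epoch seconds
    now = time.time() if now is None else now
    return max(0, int(reset_at - now))


exhausted = {"X-RateLimit-Remaining": "0", "X-RateLimit-Reset": "1700000000"}
print(seconds_until_reset(exhausted, now=1699999400))  # 600
```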

Command-line Options

--owner OWNER        GitHub username or organization name (required)
--type {user,org}    Type of owner: user or org (default: user)
--token TOKEN        GitHub personal access token
--include-forks      Include forked repositories
-o, --output FILE    Output file path (default: stdout)
--pretty             Pretty print JSON output
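
An option table like the one above maps naturally onto the standard-library `argparse` module. The following is a sketch of such a parser under that assumption; it mirrors the documented flags and defaults, but `build_parser` and the `owner_type` destination name are illustrative rather than taken from the script.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented options (illustrative sketch)."""
    parser = argparse.ArgumentParser(description="Fetch GitHub contributors")
    parser.add_argument("--owner", required=True,
                        help="GitHub username or organization name")
    parser.add_argument("--type", dest="owner_type", choices=["user", "org"],
                        default="user", help="Type of owner (default: user)")
    parser.add_argument("--token", help="GitHub personal access token")
    parser.add_argument("--include-forks", action="store_true",
                        help="Include forked repositories")
    parser.add_argument("-o", "--output", metavar="FILE",
                        help="Output file path (default: stdout)")
    parser.add_argument("--pretty", action="store_true",
                        help="Pretty print JSON output")
    return parser


# Parse an explicit argument list so the example does not depend on sys.argv.
args = build_parser().parse_args(["--owner", "octocat", "--pretty"])
print(args.owner, args.owner_type, args.pretty)  # octocat user True
```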

2. 🏆 Top Contributors Generator (top_contributors.py)

Analyzes contributors from the generated JSON and creates a top 10 ranking.

Usage

./top_contributors.py

This reads contributors.json and generates top10.json with the top 10 contributors ranked by total contributions.

Output Format

{
  "top_contributors": [
    {
      "rank": 1,
      "login": "username",
      "total_contributions": 1000,
      "repository_count": 5,
      "type": "User",
      "html_url": "https://github.com/username",
      "avatar_url": "https://avatars.githubusercontent.com/u/123456"
    }
  ]
}
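
The ranking can be derived from the contributors JSON roughly as shown below. This is a sketch, not the code of top_contributors.py; in particular, computing `repository_count` as the length of each contributor's `repositories` list is an assumption.

```python
def top_contributors(data, limit=10):
    """Rank contributors by total contributions and keep the first `limit`."""
    ranked = sorted(data["contributors"],
                    key=lambda c: c["total_contributions"], reverse=True)
    return {
        "top_contributors": [
            {
                "rank": rank,
                "login": c["login"],
                "total_contributions": c["total_contributions"],
                # Assumption: one entry in "repositories" per repo contributed to.
                "repository_count": len(c.get("repositories", [])),
                "type": c.get("type"),
                "html_url": c.get("html_url"),
                "avatar_url": c.get("avatar_url"),
            }
            for rank, c in enumerate(ranked[:limit], start=1)
        ]
    }


sample = {"contributors": [
    {"login": "hubot", "total_contributions": 10, "repositories": [{"name": "a"}]},
    {"login": "octocat", "total_contributions": 500,
     "repositories": [{"name": "a"}, {"name": "b"}], "type": "User"},
]}
top = top_contributors(sample)
print(top["top_contributors"][0]["login"])  # octocat
```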

3. 🔗 LinkedIn Finder (linkedin_finder.py)

Searches for people on LinkedIn based on their GitHub profiles. It can process a single user or batch-process contributors from a JSON file.

Features

  • 🔍 Fetches real names from GitHub profiles
  • 🔗 Generates LinkedIn search URLs
  • 📦 Batch processing from JSON files
  • ⏱️ Configurable delay between requests
  • 💾 Saves results to JSON files in dump/ directory
  • 🎯 Can skip anonymous contributors

Installation

pip install -r requirements_linkedin.txt

Dependencies:

  • requests: HTTP library
  • beautifulsoup4: HTML parsing
  • lxml: XML/HTML parser

Usage

Process all contributors from top10.json:

./linkedin_finder.py

Process with custom JSON file:

./linkedin_finder.py --json path/to/contributors.json

Process a single GitHub user:

./linkedin_finder.py --username torvalds

Skip anonymous contributors:

./linkedin_finder.py --skip-anonymous

Adjust delay between requests:

./linkedin_finder.py --delay 5

Command-line Options

  • --json PATH: Path to JSON file with contributors (default: top10.json)
  • --username USERNAME: Process a single GitHub username
  • --delay SECONDS: Delay in seconds between requests (default: 2)
  • --skip-anonymous: Skip anonymous contributors from the JSON file

Input Format

The JSON file should have this structure (compatible with output from top_contributors.py):

{
  "top_contributors": [
    {
      "rank": 1,
      "login": "github-username",
      "total_contributions": 1000,
      "type": "User",
      "html_url": "https://github.com/username"
    }
  ]
}
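
A batch run over this input could select its entries as in the sketch below. How linkedin_finder.py actually recognizes anonymous contributors is not documented here; the check for `"type": "Anonymous"` is an assumption based on how the GitHub API labels anonymous contributors, and `select_contributors` is a hypothetical name.

```python
import json


def select_contributors(data, skip_anonymous=False):
    """Return the entries to process from a top-contributors document."""
    entries = data.get("top_contributors", [])
    if skip_anonymous:
        # Assumption: anonymous contributors carry "type": "Anonymous".
        entries = [e for e in entries if e.get("type") != "Anonymous"]
    return entries


document = json.loads("""
{
  "top_contributors": [
    {"rank": 1, "login": "github-username", "total_contributions": 1000, "type": "User"},
    {"rank": 2, "total_contributions": 40, "type": "Anonymous"}
  ]
}
""")
selected = select_contributors(document, skip_anonymous=True)
print([entry["login"] for entry in selected])  # ['github-username']
```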

Output

Results are saved in the dump/ directory with filenames like:

username_YYYYMMDD_HHMMSS.json

Each output file contains:

  • GitHub username
  • Full name from GitHub profile
  • LinkedIn search URL
  • Search status
  • Timestamp
  • Rank (if available)
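
The username_YYYYMMDD_HHMMSS.json naming scheme corresponds to a `strftime` pattern of `%Y%m%d_%H%M%S`. The sketch below builds such a path; the helper name `dump_path` is an assumption, and the script's own implementation may differ.

```python
from datetime import datetime
from pathlib import Path


def dump_path(username, when=None, directory="dump"):
    """Return the output path for one result file (illustrative helper)."""
    when = when or datetime.now()
    stamp = when.strftime("%Y%m%d_%H%M%S")
    return Path(directory) / f"{username}_{stamp}.json"


path = dump_path("torvalds", datetime(2025, 11, 10, 20, 1, 3))
print(path.name)  # torvalds_20251110_200103.json
```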

Example Output

{
  "github_username": "torvalds",
  "full_name": "Linus Torvalds",
  "linkedin_search": {
    "name": "Linus Torvalds",
    "search_url": "https://www.linkedin.com/search/results/people/?keywords=Linus%20Torvalds",
    "timestamp": "2025-11-10T20:01:03.195133",
    "status": "authentication_required",
    "note": "LinkedIn requires login to view results",
    "http_status": "200",
    "content_length": "54313"
  },
  "created_at": "2025-11-10T20:01:03.573033"
}

LinkedIn Limitations

LinkedIn heavily restricts web scraping and requires authentication for most searches. This script will:

  • Generate valid LinkedIn search URLs
  • Attempt to fetch the search page
  • Note if authentication is required (very likely)

For actual profile data, you would need to:

  • Manually open the URLs in a browser (logged into LinkedIn)
  • Use LinkedIn's official API
  • Use a service with proper authentication

Rate Limiting

Be respectful of GitHub's and LinkedIn's servers:

  • Use appropriate delays between requests (default: 2 seconds)
  • Don't run the script too frequently
  • Consider the terms of service for both platforms
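
A fixed delay between requests can be sketched as a small loop that sleeps before every item except the first. The name `process_with_delay` and the injectable `sleep` parameter are assumptions made for the example (the latter so it can be exercised without actually waiting); the 2-second default matches the documented --delay default.

```python
import time


def process_with_delay(items, handler, delay=2.0, sleep=time.sleep):
    """Call handler(item) for each item, pausing `delay` seconds in between."""
    results = []
    for index, item in enumerate(items):
        if index > 0:
            sleep(delay)  # pause between requests, not before the first one
        results.append(handler(item))
    return results


pauses = []
logins = process_with_delay(["a", "b", "c"], str.upper, delay=5, sleep=pauses.append)
print(logins, pauses)  # ['A', 'B', 'C'] [5, 5]
```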

Complete Workflow

Here's how to use all three tools together:

Step 1: Fetch Contributors

# Fetch all contributors from an organization
./get_contributors.py \
 --owner your-org-name \
 --type org \
 --token ghp_YOUR_TOKEN \
 --pretty \
 -o contributors.json

Step 2: Generate Top 10

# Generate top 10 contributors
./top_contributors.py

This creates top10.json with the top contributors.

Step 3: Find on LinkedIn

# Search for all top contributors on LinkedIn
./linkedin_finder.py --skip-anonymous
# Or search for a specific user
./linkedin_finder.py --username specific-username

Results will be saved in the dump/ directory.


Troubleshooting

GitHub API Rate Limit Exceeded

If you see rate limit errors, use a GitHub token with --token.

404 Not Found

  • Verify the owner name is correct
  • Ensure you're using the correct type (user vs org)
  • Private repositories require a token with appropriate permissions
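
The GitHub REST API lists repositories under different paths for users (`/users/{username}/repos`) and organizations (`/orgs/{org}/repos`), so a mismatched --type is a plausible cause of a 404. The sketch below builds the listing URL for each case; whether get_contributors.py uses exactly these endpoints is an assumption, and `repos_url` is a hypothetical helper.

```python
def repos_url(owner, owner_type="user", page=1, per_page=100):
    """Return the repository-listing URL for a user or an organization."""
    if owner_type not in ("user", "org"):
        raise ValueError("owner_type must be 'user' or 'org'")
    prefix = "orgs" if owner_type == "org" else "users"
    # per_page=100 is the largest page size the API accepts.
    return (f"https://api.github.com/{prefix}/{owner}/repos"
            f"?per_page={per_page}&page={page}")


print(repos_url("octocat"))
print(repos_url("github", "org", page=2))
```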

Slow Performance

For organizations with many repositories, the scripts may take several minutes to finish. Progress messages are written to stderr, so they do not end up in JSON output piped from stdout.

LinkedIn Authentication Required

This is expected behavior. LinkedIn requires login for detailed search results. Use the generated URLs to manually search while logged into LinkedIn.


Quick Analysis Examples

Using jq with contributors data

# Get top 10 contributors
./get_contributors.py --owner octocat | jq '.contributors[:10]'
# Count contributors
./get_contributors.py --owner octocat | jq '.total_unique_contributors'
# List all contributor logins
./get_contributors.py --owner octocat | jq '.contributors[].login'
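
The same three queries can be run in Python against saved output, which avoids calling the GitHub API once per query. This is a sketch that relies on the output format documented above; the function name `summarize` and the inline sample document are illustrative.

```python
import json


def summarize(data, limit=10):
    """Return the first `limit` contributors, the total count, and all logins."""
    contributors = data["contributors"]
    return {
        "top": contributors[:limit],
        "count": data["total_unique_contributors"],
        "logins": [c["login"] for c in contributors],
    }


# In practice the document would come from json.load() on contributors.json.
document = json.loads("""
{
  "total_unique_contributors": 2,
  "contributors": [
    {"login": "octocat", "total_contributions": 1234},
    {"login": "hubot", "total_contributions": 10}
  ]
}
""")
summary = summarize(document)
print(summary["count"], summary["logins"])  # 2 ['octocat', 'hubot']
```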

Project Structure

ilfs/
├── get_contributors.py        # Fetch GitHub contributors
├── top_contributors.py        # Generate top 10 ranking
├── linkedin_finder.py         # Find people on LinkedIn
├── requirements_linkedin.txt  # Python dependencies for LinkedIn finder
├── contributors.json          # Full contributors data (generated)
├── top10.json                 # Top 10 contributors (generated)
├── dump/                      # LinkedIn search results (generated)
└── README.md                  # This file

License

This project is provided as-is for educational and research purposes. Please respect the terms of service of GitHub and LinkedIn when using these tools.