ILFS - InnerSource Leaderboard and Finder Suite
A collection of Python tools for analyzing GitHub contributors and finding them on LinkedIn.
Tools
1. 📊 GitHub Contributors Fetcher (get_contributors.py)
Fetches all contributors from all repositories of a GitHub user or organization.
Features
- 📊 Fetches contributors from all repositories of a user or organization
- 🔢 Aggregates total contributions per contributor across all repos
- 🍴 Optional inclusion/exclusion of forked repositories
- 🔐 Supports GitHub API token for higher rate limits
- 📝 Outputs detailed JSON with contributor statistics
- 🎯 Sorts contributors by total contributions
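The aggregation and sorting described above can be sketched roughly as follows. This is an illustration rather than the script's actual code: the helper name aggregate_contributions and the input shape (a mapping from repository name to a list of entries with "login" and "contributions" keys) are assumptions.

```python
from collections import defaultdict


def aggregate_contributions(per_repo):
    """Sum contributions per login across repositories (hypothetical helper).

    per_repo maps a repository name to a list of dicts that each carry
    "login" and "contributions" keys.
    """
    totals = defaultdict(lambda: {"total_contributions": 0, "repositories": []})
    for repo_name, contributors in per_repo.items():
        for entry in contributors:
            record = totals[entry["login"]]
            record["total_contributions"] += entry["contributions"]
            record["repositories"].append(
                {"name": repo_name, "contributions": entry["contributions"]}
            )
    # Highest total first, matching the sorted output described above.
    return sorted(
        ({"login": login, **data} for login, data in totals.items()),
        key=lambda c: c["total_contributions"],
        reverse=True,
    )
```

A contributor who appears in several repositories ends up with one record whose "repositories" list has one entry per repository.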
Requirements
- Python 3.6+
- No external dependencies (uses only standard library)
Usage
Basic Usage:
Get contributors from a user's repositories:
./get_contributors.py --owner octocat --type user
Get contributors from an organization:
./get_contributors.py --owner github --type org
With GitHub Token (Recommended):
To avoid rate limiting, use a GitHub personal access token:
./get_contributors.py --owner octocat --token YOUR_GITHUB_TOKEN
To create a token:
- Go to GitHub Settings → Developer settings → Personal access tokens → Tokens (classic)
- Generate a new token with the public_repo scope (or repo for private repos)
Additional Options:
Include forked repositories:
./get_contributors.py --owner octocat --include-forks
Pretty print the JSON output:
./get_contributors.py --owner octocat --pretty
Save to a file:
./get_contributors.py --owner octocat -o contributors.json --pretty
Complete Example:
./get_contributors.py \
  --owner anthropics \
  --type org \
  --token ghp_YOUR_TOKEN_HERE \
  --include-forks \
  --pretty \
  -o anthropics_contributors.json
Output Format
The script outputs JSON with the following structure:
{
  "owner": "octocat",
  "owner_type": "user",
  "total_repositories": 25,
  "total_unique_contributors": 42,
  "contributors": [
    {
      "login": "octocat",
      "total_contributions": 1234,
      "repositories": [
        {
          "name": "Hello-World",
          "contributions": 456,
          "is_fork": false
        }
      ],
      "avatar_url": "https://avatars.githubusercontent.com/u/583231",
      "html_url": "https://github.com/octocat",
      "type": "User"
    }
  ]
}
Rate Limits
- Without token: 60 requests per hour
- With token: 5000 requests per hour
For large organizations with many repositories, using a token is highly recommended.
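GitHub reports the remaining quota in the X-RateLimit-Remaining and X-RateLimit-Reset response headers. A minimal sketch of how a client might decide how long to wait is shown below; the function name seconds_until_reset is hypothetical, and nothing here is taken from get_contributors.py itself.

```python
import time


def seconds_until_reset(headers, now=None):
    """Return how many seconds to wait before the next request.

    headers is a plain dict of response headers (a plain dict is
    case-sensitive, so the keys must use the spelling shown here).
    Returns 0 while quota remains.
    """
    remaining = int(headers.get("X-RateLimit-Remaining", 1))
    if remaining > 0:
        return 0
    reset_at = int(headers.get("X-RateLimit-Reset", 0))  # Unix epoch seconds
    now = time.time() if now is None else now
    return max(0, reset_at - int(now))
```

Passing now explicitly keeps the function easy to test; in real use it can be left out.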
Command-line Options
--owner OWNER GitHub username or organization name (required)
--type {user,org} Type of owner: user or org (default: user)
--token TOKEN GitHub personal access token
--include-forks Include forked repositories
-o, --output FILE Output file path (default: stdout)
--pretty Pretty print JSON output
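For readers who want to see how options like these are typically declared, here is one way they could be expressed with argparse. It mirrors the option table above but is only a sketch; the function name build_parser and the help strings are assumptions, not the script's source.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented options (illustrative only)."""
    parser = argparse.ArgumentParser(description="Fetch GitHub contributors")
    parser.add_argument("--owner", required=True,
                        help="GitHub username or organization name")
    parser.add_argument("--type", choices=["user", "org"], default="user",
                        help="Type of owner")
    parser.add_argument("--token", help="GitHub personal access token")
    parser.add_argument("--include-forks", action="store_true",
                        help="Include forked repositories")
    parser.add_argument("-o", "--output", metavar="FILE",
                        help="Output file path (default: stdout)")
    parser.add_argument("--pretty", action="store_true",
                        help="Pretty print JSON output")
    return parser


if __name__ == "__main__":
    print(build_parser().parse_args())
```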
2. 🏆 Top Contributors Generator (top_contributors.py)
Analyzes contributors from the generated JSON and creates a top 10 ranking.
Usage
./top_contributors.py
This reads contributors.json and generates top10.json with the top 10 contributors ranked by total contributions.
Output Format
{
  "top_contributors": [
    {
      "rank": 1,
      "login": "username",
      "total_contributions": 1000,
      "repository_count": 5,
      "type": "User",
      "html_url": "https://github.com/username",
      "avatar_url": "https://avatars.githubusercontent.com/u/123456"
    }
  ]
}
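The ranking step can be approximated in a few lines. The sketch below assumes the input has the structure produced by get_contributors.py and derives repository_count from the length of each contributor's "repositories" list; the function name top_contributors and the limit parameter are hypothetical.

```python
def top_contributors(data, limit=10):
    """Rank contributors by total contributions (illustrative sketch)."""
    ranked = sorted(
        data["contributors"],
        key=lambda c: c["total_contributions"],
        reverse=True,
    )
    return {
        "top_contributors": [
            {
                "rank": rank,
                "login": c["login"],
                "total_contributions": c["total_contributions"],
                # Assumption: the count is the number of repositories listed.
                "repository_count": len(c.get("repositories", [])),
                "type": c.get("type"),
                "html_url": c.get("html_url"),
                "avatar_url": c.get("avatar_url"),
            }
            for rank, c in enumerate(ranked[:limit], start=1)
        ]
    }
```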
3. 🔗 LinkedIn Finder (linkedin_finder.py)
Searches for people on LinkedIn based on their GitHub profiles. It can process a single user or batch-process contributors from a JSON file.
Features
- 🔍 Fetches real names from GitHub profiles
- 🔗 Generates LinkedIn search URLs
- 📦 Batch processing from JSON files
- ⏱️ Configurable delay between requests
- 💾 Saves results to JSON files in the dump/ directory
- 🎯 Can skip anonymous contributors
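Generating a search URL amounts to percent-encoding the person's name into the keywords query parameter. A minimal sketch, assuming the URL shape shown in the example output of this README; the function name linkedin_search_url is hypothetical.

```python
from urllib.parse import quote

LINKEDIN_PEOPLE_SEARCH = "https://www.linkedin.com/search/results/people/"


def linkedin_search_url(full_name):
    """Build a LinkedIn people-search URL for a name (illustrative sketch)."""
    # quote() encodes spaces as %20; safe="" also encodes "/" in names.
    keywords = quote(full_name.strip(), safe="")
    return f"{LINKEDIN_PEOPLE_SEARCH}?keywords={keywords}"
```

For example, linkedin_search_url("Linus Torvalds") yields a URL ending in ?keywords=Linus%20Torvalds.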
Installation
pip install -r requirements_linkedin.txt
Dependencies:
- requests: HTTP library
- beautifulsoup4: HTML parsing
- lxml: XML/HTML parser
Usage
Process all contributors from top10.json:
./linkedin_finder.py
Process with custom JSON file:
./linkedin_finder.py --json path/to/contributors.json
Process a single GitHub user:
./linkedin_finder.py --username torvalds
Skip anonymous contributors:
./linkedin_finder.py --skip-anonymous
Adjust delay between requests:
./linkedin_finder.py --delay 5
Command-line Options
- --json PATH: Path to JSON file with contributors (default: top10.json)
- --username USERNAME: Process a single GitHub username
- --delay SECONDS: Delay in seconds between requests (default: 2)
- --skip-anonymous: Skip anonymous contributors from the JSON file
Input Format
The JSON file should have this structure (compatible with output from top_contributors.py):
{
  "top_contributors": [
    {
      "rank": 1,
      "login": "github-username",
      "total_contributions": 1000,
      "type": "User",
      "html_url": "https://github.com/username"
    }
  ]
}
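Reading this input and optionally dropping anonymous entries might look like the sketch below. It assumes anonymous contributors are marked with "type": "Anonymous"; how linkedin_finder.py actually detects them is not stated here, and the function name load_contributors is hypothetical.

```python
import json


def load_contributors(text, skip_anonymous=False):
    """Parse the input JSON and return the contributor entries (sketch)."""
    entries = json.loads(text)["top_contributors"]
    if skip_anonymous:
        # Assumption: anonymous entries carry "type": "Anonymous".
        entries = [e for e in entries if e.get("type") != "Anonymous"]
    return entries
```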
Output
Results are saved in the dump/ directory with filenames like:
username_YYYYMMDD_HHMMSS.json
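A filename in this pattern can be produced with datetime.strftime. The helper name dump_filename below is an assumption used for illustration.

```python
from datetime import datetime


def dump_filename(username, when=None):
    """Return a name like username_YYYYMMDD_HHMMSS.json (illustrative sketch)."""
    when = when or datetime.now()
    return f"{username}_{when.strftime('%Y%m%d_%H%M%S')}.json"
```

For example, dump_filename("torvalds", datetime(2025, 11, 10, 20, 1, 3)) returns "torvalds_20251110_200103.json".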
Each output file contains:
- GitHub username
- Full name from GitHub profile
- LinkedIn search URL
- Search status
- Timestamp
- Rank (if available)
Example Output
{
  "github_username": "torvalds",
  "full_name": "Linus Torvalds",
  "linkedin_search": {
    "name": "Linus Torvalds",
    "search_url": "https://www.linkedin.com/search/results/people/?keywords=Linus%20Torvalds",
    "timestamp": "2025-11-10T20:01:03.195133",
    "status": "authentication_required",
    "note": "LinkedIn requires login to view results",
    "http_status": "200",
    "content_length": "54313"
  },
  "created_at": "2025-11-10T20:01:03.573033"
}
LinkedIn Limitations
LinkedIn heavily restricts web scraping and requires authentication for most searches. This script will:
- Generate valid LinkedIn search URLs
- Attempt to fetch the search page
- Note if authentication is required (very likely)
For actual profile data, you would need to:
- Manually open the URLs in a browser (logged into LinkedIn)
- Use LinkedIn's official API
- Use a service with proper authentication
Rate Limiting
Be respectful of GitHub's and LinkedIn's servers:
- Use appropriate delays between requests (default: 2 seconds)
- Don't run the script too frequently
- Consider the terms of service for both platforms
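Spacing out requests is straightforward to implement. The sketch below pauses between items but not before the first one; process_with_delay and its handler argument are hypothetical names, and the sleep parameter exists only so the pause can be replaced in tests.

```python
import time


def process_with_delay(usernames, handler, delay=2.0, sleep=time.sleep):
    """Call handler for each username, pausing between calls (sketch)."""
    results = []
    for index, username in enumerate(usernames):
        if index > 0:
            sleep(delay)  # wait between requests, not before the first one
        results.append(handler(username))
    return results
```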
Complete Workflow
Here's how to use all three tools together:
Step 1: Fetch Contributors
# Fetch all contributors from an organization
./get_contributors.py \
  --owner your-org-name \
  --type org \
  --token ghp_YOUR_TOKEN \
  --pretty \
  -o contributors.json
Step 2: Generate Top 10
# Generate top 10 contributors
./top_contributors.py
This creates top10.json with the top contributors.
Step 3: Find on LinkedIn
# Search for all top contributors on LinkedIn
./linkedin_finder.py --skip-anonymous
# Or search for a specific user
./linkedin_finder.py --username specific-username
Results will be saved in the dump/ directory.
Troubleshooting
GitHub API Rate Limit Exceeded
If you see rate limit errors, use a GitHub token with --token.
404 Not Found
- Verify the owner name is correct
- Ensure you're using the correct type (user vs org)
- Private repositories require a token with appropriate permissions
Slow Performance
For organizations with many repositories, the scripts may take several minutes. Progress is written to stderr.
LinkedIn Authentication Required
This is expected behavior. LinkedIn requires login for detailed search results. Use the generated URLs to manually search while logged into LinkedIn.
Quick Analysis Examples
Using jq with contributors data
# Get top 10 contributors
./get_contributors.py --owner octocat | jq '.contributors[:10]'
# Count contributors
./get_contributors.py --owner octocat | jq '.total_unique_contributors'
# List all contributor logins
./get_contributors.py --owner octocat | jq '.contributors[].login'
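If jq is not available, the same three queries can be run in Python against the JSON text. This is a small sketch that assumes the output structure documented earlier; the function name summarize is hypothetical.

```python
import json


def summarize(raw_json, limit=10):
    """Return the first `limit` contributors, the count, and all logins."""
    data = json.loads(raw_json)
    contributors = data["contributors"]
    return {
        "top": contributors[:limit],                 # jq '.contributors[:10]'
        "count": data["total_unique_contributors"],  # jq '.total_unique_contributors'
        "logins": [c["login"] for c in contributors],  # jq '.contributors[].login'
    }
```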
Project Structure
ilfs/
├── get_contributors.py # Fetch GitHub contributors
├── top_contributors.py # Generate top 10 ranking
├── linkedin_finder.py # Find people on LinkedIn
├── requirements_linkedin.txt # Python dependencies for LinkedIn finder
├── contributors.json # Full contributors data (generated)
├── top10.json # Top 10 contributors (generated)
├── dump/ # LinkedIn search results (generated)
└── README.md # This file
License
This project is provided as-is for educational and research purposes. Please respect the terms of service of GitHub and LinkedIn when using these tools.