-
Notifications
You must be signed in to change notification settings - Fork 892
wip #487
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Draft
wip #487
Changes from 1 commit
Commits
Show all changes
9 commits
Select commit
Hold shift + click to select a range
0ff524a
wip
cyclotruc 369591a
add cursor rule
NicolasIRAGNE 6567388
wip
NicolasIRAGNE ef53573
wip context algorithm
NicolasIRAGNE 94c50ff
wip context algorithm
NicolasIRAGNE 0fc97c3
fix file count
NicolasIRAGNE a69cde3
set port to 8000 sorry rom2
NicolasIRAGNE c3edd62
fix file selection thing now showing up when accessing directly throu...
NicolasIRAGNE fdfe56c
remove flash
NicolasIRAGNE File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
11 changes: 8 additions & 3 deletions
.env.example
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
246 changes: 246 additions & 0 deletions
AI_REDESIGN_SUMMARY.md
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,246 @@ | ||
# Gitingest AI-Powered Redesign Summary | ||
|
||
## Overview | ||
|
||
Successfully redesigned the Gitingest application to use AI-powered file selection with Google Gemini, replacing manual pattern and size configuration with intelligent, context-aware file selection. | ||
|
||
## 🚀 Key Features Implemented | ||
|
||
### 1. AI-Powered File Selection | ||
- **Google Gemini Integration**: Uses Gemini 1.5 Pro for intelligent file analysis | ||
- **Context-Aware Selection**: AI analyzes repository structure and selects most relevant files | ||
- **User Prompt Guidance**: Optional user prompts to guide file selection (e.g., "API endpoints", "frontend components") | ||
- **Fallback System**: Heuristic-based fallback when AI is unavailable | ||
|
||
### 2. New User Interface | ||
- **Simplified Form**: Removed complex pattern selectors and file size sliders | ||
- **Prompt Input**: Optional textarea for user requirements | ||
- **Context Size Selector**: Choose from 32k, 128k, 512k, or 1M token limits | ||
- **AI Branding**: Clear indication of AI-powered functionality | ||
- **Enhanced Results**: Shows AI selection reasoning and file count | ||
|
||
### 3. Smart Context Management | ||
- **Token-Aware Processing**: Respects context window limits | ||
- **Automatic Cropping**: Content automatically sized to fit selected context | ||
- **Intelligent Sampling**: Uses ~1M token sample for AI analysis | ||
|
||
## 📋 Implementation Details | ||
|
||
### Backend Changes | ||
|
||
#### New Models (`server/models.py`) | ||
```python | ||
class ContextSize(str, Enum): | ||
SMALL = "32k" # ~32k tokens | ||
MEDIUM = "128k" # ~128k tokens | ||
LARGE = "512k" # ~512k tokens | ||
XLARGE = "1M" # ~1M tokens | ||
|
||
class IngestRequest(BaseModel): | ||
input_text: str | ||
context_size: ContextSize = ContextSize.MEDIUM | ||
user_prompt: str = "" | ||
token: str | None = None | ||
``` | ||
|
||
#### AI File Selector (`server/ai_file_selector.py`) | ||
- **Gemini Integration**: Uses `google-generativeai` library | ||
- **File Analysis**: Creates hierarchical file summaries with content previews | ||
- **Intelligent Prompting**: Generates context-aware prompts for optimal selection | ||
- **Error Handling**: Graceful fallback when AI fails | ||
|
||
#### AI Ingestion Flow (`server/ai_ingestion.py`) | ||
- **Two-Phase Process**: Initial scan → AI selection → filtered ingest | ||
- **Context Window Management**: Automatic content cropping to fit limits | ||
- **Metadata Enhancement**: Enriches summaries with AI selection info | ||
|
||
### Frontend Changes | ||
|
||
#### New UI Components | ||
- **AI Form Template** (`templates/components/git_form_ai.jinja`): Modern, simplified interface | ||
- **Enhanced Results** (`templates/components/result_ai.jinja`): Shows AI selection details | ||
- **Context Selector**: Intuitive dropdown with token count estimates | ||
|
||
#### JavaScript Enhancements | ||
- **AI Utilities** (`static/js/utils_ai.js`): Specialized handlers for AI flow | ||
- **Form Validation**: Smart validation for AI-specific fields | ||
- **Loading States**: AI-themed loading indicators and messages | ||
- **Error Handling**: Detailed error reporting for AI failures | ||
|
||
### Configuration | ||
|
||
#### New Dependencies | ||
```toml | ||
# Added to pyproject.toml | ||
"google-generativeai>=0.8.0", # Google Gemini API | ||
``` | ||
|
||
#### Environment Variables | ||
```bash | ||
# Required for AI features | ||
GEMINI_API_KEY=your_api_key_here | ||
``` | ||
|
||
## 🔄 New Workflow | ||
|
||
### 1. Initial Repository Analysis | ||
``` | ||
User Input → Repository Cloning → Full File Tree Generation | ||
``` | ||
|
||
### 2. AI-Powered Selection | ||
``` | ||
File Tree + User Prompt → Gemini API → Selected File Paths + Reasoning | ||
``` | ||
|
||
### 3. Optimized Processing | ||
``` | ||
Selected Files → Content Generation → Context Window Cropping → Final Output | ||
``` | ||
|
||
## 📊 Context Size Options | ||
|
||
| Size | Token Count | Equivalent Pages | Use Case | | ||
|------|-------------|------------------|----------| | ||
| 32k | ~32,000 | ~25 pages | Focused analysis | | ||
| 128k | ~128,000 | ~100 pages | Balanced overview | | ||
| 512k | ~512,000 | ~400 pages | Comprehensive analysis | | ||
| 1M | ~1,000,000 | ~800 pages | Deep dive | | ||
|
||
## 🛡️ Error Handling & Fallbacks | ||
|
||
### AI Failure Scenarios | ||
1. **API Key Missing**: Graceful degradation to heuristic selection | ||
2. **API Rate Limits**: Clear error messages with retry suggestions | ||
3. **Network Issues**: Timeout handling with fallback options | ||
4. **Invalid Responses**: JSON parsing with error recovery | ||
|
||
### Fallback File Selection | ||
When AI is unavailable, the system uses intelligent heuristics: | ||
- **Priority Files**: README, main.*, index.*, config files | ||
- **File Type Filtering**: Focus on code files (.py, .js, .ts, .java, etc.) | ||
- **Size-Based Limits**: Adaptive limits based on context size | ||
|
||
## 🎨 UI/UX Improvements | ||
|
||
### Visual Enhancements | ||
- **AI Branding**: Robot emojis and "AI-Powered" messaging | ||
- **Progress Indicators**: Specialized loading states for AI processing | ||
- **Results Display**: Collapsible file lists and selection reasoning | ||
- **Responsive Design**: Mobile-friendly layout adjustments | ||
|
||
### User Experience | ||
- **Simplified Workflow**: From 5+ form fields to 3 essential inputs | ||
- **Smart Defaults**: Reasonable defaults for all fields | ||
- **Contextual Help**: Tooltips and examples for guidance | ||
- **Success Feedback**: Clear indication of AI analysis completion | ||
|
||
## 🔧 Technical Architecture | ||
|
||
### File Selection Algorithm | ||
```python | ||
def ai_select_files(repository_structure, user_prompt, context_size): | ||
# 1. Create hierarchical file summary with content previews | ||
file_summary = create_file_summary(repository_structure) | ||
|
||
# 2. Generate AI prompt with context | ||
prompt = create_selection_prompt(file_summary, user_prompt, context_size) | ||
|
||
# 3. Query Gemini API | ||
response = gemini_model.generate_content(prompt) | ||
|
||
# 4. Parse and validate response | ||
return parse_file_selection(response) | ||
``` | ||
|
||
### Context Window Management | ||
```python | ||
def crop_to_context_window(content, context_size): | ||
tokens = tokenize(content) | ||
limit = get_token_limit(context_size) | ||
|
||
if len(tokens) <= limit: | ||
return content | ||
|
||
return decode_tokens(tokens[:limit]) + "\n[Content truncated]" | ||
``` | ||
|
||
## 📈 Performance Optimizations | ||
|
||
### Efficient Processing | ||
- **Parallel Operations**: Concurrent file reading and AI analysis | ||
- **Smart Sampling**: Limited content preview for AI analysis | ||
- **Caching Ready**: Structure supports future prompt-based caching | ||
- **Resource Limits**: Bounded memory usage for large repositories | ||
|
||
### API Efficiency | ||
- **Single API Call**: One Gemini request per analysis | ||
- **Optimized Prompts**: Minimal token usage in prompts | ||
- **Error Recovery**: Fast fallback without retry delays | ||
|
||
## 🚦 Migration Path | ||
|
||
### Backward Compatibility | ||
- **Old Endpoints**: Still functional for existing integrations | ||
- **Gradual Rollout**: New UI can be toggled via configuration | ||
- **API Versioning**: V1 endpoints preserved, V2 with AI features | ||
|
||
### Deployment Steps | ||
1. **Install Dependencies**: `pip install google-generativeai` | ||
2. **Set API Key**: Configure `GEMINI_API_KEY` environment variable | ||
3. **Update Templates**: Use new AI-powered form components | ||
4. **Test Integration**: Verify AI functionality with sample repositories | ||
|
||
## 🔍 Monitoring & Observability | ||
|
||
### AI-Specific Metrics | ||
- **Selection Success Rate**: Track AI vs fallback usage | ||
- **Response Quality**: Monitor file selection accuracy | ||
- **Performance Metrics**: AI response times and token usage | ||
- **Error Tracking**: Detailed AI failure categorization | ||
|
||
### Enhanced Logging | ||
```python | ||
logger.info("AI file selection completed", extra={ | ||
"selected_files_count": len(selected_files), | ||
"reasoning_length": len(reasoning), | ||
"context_size": context_size, | ||
"user_prompt": user_prompt[:100] | ||
}) | ||
``` | ||
|
||
## 🎯 Benefits Achieved | ||
|
||
### For Users | ||
- **Simplified Interface**: 70% reduction in form complexity | ||
- **Intelligent Results**: AI selects most relevant files automatically | ||
- **Context Awareness**: Output tailored to specific use cases | ||
- **Better Quality**: More focused, useful digests | ||
|
||
### For Developers | ||
- **Modern Architecture**: Clean separation of concerns | ||
- **Extensible Design**: Easy to add new AI providers | ||
- **Robust Error Handling**: Graceful degradation patterns | ||
- **Type Safety**: Full TypeScript-style typing with Pydantic | ||
|
||
### For Operations | ||
- **Scalable Design**: Stateless AI operations | ||
- **Monitoring Ready**: Comprehensive metrics and logging | ||
- **Configuration Driven**: Environment-based feature toggles | ||
- **Security Focused**: API key management and input validation | ||
|
||
## 🚀 Future Enhancements | ||
|
||
### Planned Features | ||
- **Multi-Provider Support**: Add Claude, GPT-4 as AI alternatives | ||
- **Prompt Templates**: Pre-built prompts for common use cases | ||
- **Smart Caching**: Cache AI selections for identical repositories | ||
- **User Feedback Loop**: Learn from user corrections to improve selection | ||
|
||
### Technical Improvements | ||
- **Streaming Responses**: Real-time AI analysis updates | ||
- **Batch Processing**: Handle multiple repositories efficiently | ||
- **Advanced Filtering**: ML-based content quality scoring | ||
- **Custom Fine-tuning**: Domain-specific AI model training | ||
|
||
This redesign successfully transforms Gitingest from a manual configuration tool into an intelligent, AI-powered codebase analysis platform that automatically selects the most relevant files based on user intent and context requirements. |
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.