Commit 414e851

MickaelCa, filipchristiansen, and NicolasIRAGNE authored
feat: implement S3 integration for storing and retrieving digest files (#427)
Co-authored-by: Filip Christiansen <22807962+filipchristiansen@users.noreply.github.com> Co-authored-by: Nicolas Iragne <nicoragne@hotmail.fr>
1 parent 998cea1 commit 414e851

17 files changed: +688 -38 lines changed

‎.docker/minio/setup.sh

Lines changed: 33 additions & 0 deletions
@@ -0,0 +1,33 @@
+#!/bin/sh
+
+# Simple script to set up MinIO bucket and user
+# Based on example from MinIO issues
+
+# Format bucket name to ensure compatibility
+BUCKET_NAME=$(echo "${S3_BUCKET_NAME}" | tr '[:upper:]' '[:lower:]' | tr '_' '-')
+
+# Configure MinIO client
+mc alias set myminio http://minio:9000 ${MINIO_ROOT_USER} ${MINIO_ROOT_PASSWORD}
+
+# Remove bucket if it exists (for clean setup)
+mc rm -r --force myminio/${BUCKET_NAME} || true
+
+# Create bucket
+mc mb myminio/${BUCKET_NAME}
+
+# Set bucket policy to allow downloads
+mc anonymous set download myminio/${BUCKET_NAME}
+
+# Create user with access and secret keys
+mc admin user add myminio ${S3_ACCESS_KEY} ${S3_SECRET_KEY} || echo "User already exists"
+
+# Create policy for the bucket
+echo '{"Version":"2012-10-17","Statement":[{"Effect":"Allow","Action":["s3:*"],"Resource":["arn:aws:s3:::'${BUCKET_NAME}'/*","arn:aws:s3:::'${BUCKET_NAME}'"]}]}' > /tmp/policy.json
+
+# Apply policy
+mc admin policy create myminio gitingest-policy /tmp/policy.json || echo "Policy already exists"
+mc admin policy attach myminio gitingest-policy --user ${S3_ACCESS_KEY}
+
+echo "MinIO setup completed successfully"
+echo "Bucket: ${BUCKET_NAME}"
+echo "Access via console: http://localhost:9001"
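The bucket-name normalization and the inline policy JSON in `setup.sh` are easy to misread in their shell-quoted form. The Python sketch below mirrors them for illustration only: `normalize_bucket_name` and `build_bucket_policy` are hypothetical helper names that are not part of this commit, and the sample input is invented.

```python
import json


def normalize_bucket_name(name: str) -> str:
    """Lowercase the name and replace underscores with hyphens,
    mirroring the two `tr` calls in setup.sh."""
    return name.lower().replace("_", "-")


def build_bucket_policy(bucket: str) -> str:
    """Return a JSON policy allowing s3:* on the bucket and on its objects,
    matching the document that setup.sh writes to /tmp/policy.json."""
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:*"],
                "Resource": [
                    f"arn:aws:s3:::{bucket}/*",  # objects inside the bucket
                    f"arn:aws:s3:::{bucket}",    # the bucket itself
                ],
            }
        ],
    }
    return json.dumps(policy)


if __name__ == "__main__":
    # "Gitingest_Bucket" is an invented sample value.
    print(build_bucket_policy(normalize_bucket_name("Gitingest_Bucket")))
```

Building the document with `json.dumps` sidesteps the quote splicing the shell version relies on, at the cost of needing Python in the setup container.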

‎.env.example

Lines changed: 23 additions & 0 deletions
@@ -33,3 +33,26 @@ GITINGEST_SENTRY_PROFILE_LIFECYCLE=trace
 GITINGEST_SENTRY_SEND_DEFAULT_PII=true
 # Environment name for Sentry (default: "")
 GITINGEST_SENTRY_ENVIRONMENT=development
+
+# MinIO Configuration (for development)
+# Root user credentials for MinIO admin access
+MINIO_ROOT_USER=minioadmin
+MINIO_ROOT_PASSWORD=minioadmin
+
+# S3 Configuration (for application)
+# Set to "true" to enable S3 storage for digests
+# S3_ENABLED=true
+# Endpoint URL for the S3 service (MinIO in development)
+S3_ENDPOINT=http://minio:9000
+# Access key for the S3 bucket (created automatically in development)
+S3_ACCESS_KEY=gitingest
+# Secret key for the S3 bucket (created automatically in development)
+S3_SECRET_KEY=gitingest123
+# Name of the S3 bucket (created automatically in development)
+S3_BUCKET_NAME=gitingest-bucket
+# Region for the S3 bucket (default for MinIO)
+S3_REGION=us-east-1
+# Public URL/CDN for accessing S3 resources
+S3_ALIAS_HOST=127.0.0.1:9000/gitingest-bucket
+# Optional prefix for S3 file paths (if set, prefixes all S3 paths with this value)
+# S3_DIRECTORY_PREFIX=my-prefix
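As a rough illustration of how an application might consume these variables, the sketch below reads them into a small settings object. It is an assumption-laden example, not the code added by this commit: `S3Settings` and `load_s3_settings` are hypothetical names, and treating an unset `S3_ENABLED` as disabled is a guess based on the commented-out line above.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class S3Settings:
    """Hypothetical container for the S3_* variables listed in .env.example."""

    enabled: bool
    endpoint: Optional[str]
    access_key: Optional[str]
    secret_key: Optional[str]
    bucket_name: Optional[str]
    region: str
    alias_host: Optional[str]
    directory_prefix: Optional[str]


def load_s3_settings(env: Mapping[str, str] = os.environ) -> S3Settings:
    """Read the S3_* variables from a mapping (the process environment by default)."""
    return S3Settings(
        # Assumption: only the literal string "true" (any case) enables S3.
        enabled=env.get("S3_ENABLED", "false").lower() == "true",
        endpoint=env.get("S3_ENDPOINT"),
        access_key=env.get("S3_ACCESS_KEY"),
        secret_key=env.get("S3_SECRET_KEY"),
        bucket_name=env.get("S3_BUCKET_NAME"),
        region=env.get("S3_REGION", "us-east-1"),
        alias_host=env.get("S3_ALIAS_HOST"),
        directory_prefix=env.get("S3_DIRECTORY_PREFIX"),
    )


if __name__ == "__main__":
    print(load_s3_settings({"S3_ENABLED": "true", "S3_BUCKET_NAME": "gitingest-bucket"}))
```

Passing the environment as a mapping keeps the loader testable without mutating `os.environ`.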

‎.pre-commit-config.yaml

Lines changed: 2 additions & 0 deletions
@@ -113,6 +113,7 @@ repos:
         files: ^src/
         additional_dependencies:
           [
+            boto3>=1.28.0,
             click>=8.0.0,
             'fastapi[standard]>=0.109.1',
             httpx,
@@ -138,6 +139,7 @@ repos:
           - --rcfile=tests/.pylintrc
         additional_dependencies:
           [
+            boto3>=1.28.0,
             click>=8.0.0,
             'fastapi[standard]>=0.109.1',
             httpx,

‎README.md

Lines changed: 85 additions & 0 deletions
@@ -204,6 +204,8 @@ This is because Jupyter notebooks are asynchronous by default.
 
 ## 🐳 Self-host
 
+### Using Docker
+
 1. Build the image:
 
    ``` bash
@@ -239,6 +241,89 @@ The application can be configured using the following environment variables:
 - **GITINGEST_SENTRY_PROFILE_SESSION_SAMPLE_RATE**: Sampling rate for profile sessions (default: "1.0", range: 0.0-1.0)
 - **GITINGEST_SENTRY_PROFILE_LIFECYCLE**: Profile lifecycle mode (default: "trace")
 - **GITINGEST_SENTRY_SEND_DEFAULT_PII**: Send default personally identifiable information (default: "true")
+- **S3_ALIAS_HOST**: Public URL/CDN for accessing S3 resources (default: "127.0.0.1:9000/gitingest-bucket")
+- **S3_DIRECTORY_PREFIX**: Optional prefix for S3 file paths (if set, prefixes all S3 paths with this value)
+
+### Using Docker Compose
+
+The project includes a `compose.yml` file that allows you to easily run the application in both development and production environments.
+
+#### Compose File Structure
+
+The `compose.yml` file uses YAML anchoring with `&app-base` and `<<: *app-base` to define common configuration that is shared between services:
+
+```yaml
+# Common base configuration for all services
+x-app-base: &app-base
+  build:
+    context: .
+    dockerfile: Dockerfile
+  ports:
+    - "${APP_WEB_BIND:-8000}:8000" # Main application port
+    - "${GITINGEST_METRICS_HOST:-127.0.0.1}:${GITINGEST_METRICS_PORT:-9090}:9090" # Metrics port
+  # ... other common configurations
+```
+
+#### Services
+
+The file defines three services:
+
+1. **app**: Production service configuration
+   - Uses the `prod` profile
+   - Sets the Sentry environment to "production"
+   - Configured for stable operation with `restart: unless-stopped`
+
+2. **app-dev**: Development service configuration
+   - Uses the `dev` profile
+   - Enables debug mode
+   - Mounts the source code for live development
+   - Uses hot reloading for faster development
+
+3. **minio**: S3-compatible object storage for development
+   - Uses the `dev` profile (only available in development mode)
+   - Provides S3-compatible storage for local development
+   - Accessible via:
+     - API: Port 9000 ([localhost:9000](http://localhost:9000))
+     - Web Console: Port 9001 ([localhost:9001](http://localhost:9001))
+   - Default admin credentials:
+     - Username: `minioadmin`
+     - Password: `minioadmin`
+   - Configurable via environment variables:
+     - `MINIO_ROOT_USER`: Custom admin username (default: minioadmin)
+     - `MINIO_ROOT_PASSWORD`: Custom admin password (default: minioadmin)
+   - Includes persistent storage via Docker volume
+   - Auto-creates a bucket and application-specific credentials:
+     - Bucket name: `gitingest-bucket` (configurable via `S3_BUCKET_NAME`)
+     - Access key: `gitingest` (configurable via `S3_ACCESS_KEY`)
+     - Secret key: `gitingest123` (configurable via `S3_SECRET_KEY`)
+     - These credentials are automatically passed to the app-dev service via environment variables:
+       - `S3_ENDPOINT`: URL of the MinIO server
+       - `S3_ACCESS_KEY`: Access key for the S3 bucket
+       - `S3_SECRET_KEY`: Secret key for the S3 bucket
+       - `S3_BUCKET_NAME`: Name of the S3 bucket
+       - `S3_REGION`: Region for the S3 bucket (default: us-east-1)
+       - `S3_ALIAS_HOST`: Public URL/CDN for accessing S3 resources (default: "127.0.0.1:9000/gitingest-bucket")
+
+#### Usage Examples
+
+To run the application in development mode:
+
+```bash
+docker compose --profile dev up
+```
+
+To run the application in production mode:
+
+```bash
+docker compose --profile prod up -d
+```
+
+To build and run the application:
+
+```bash
+docker compose --profile prod build
+docker compose --profile prod up -d
+```
 
 ## 🤝 Contributing

‎compose.yml

Lines changed: 111 additions & 0 deletions
@@ -0,0 +1,111 @@
+# Common base configuration for all services
+x-app-base: &app-base
+  ports:
+    - "${APP_WEB_BIND:-8000}:8000" # Main application port
+    - "${GITINGEST_METRICS_HOST:-127.0.0.1}:${GITINGEST_METRICS_PORT:-9090}:9090" # Metrics port
+  environment:
+    # Python Configuration
+    - PYTHONUNBUFFERED=1
+    - PYTHONDONTWRITEBYTECODE=1
+    # Host Configuration
+    - ALLOWED_HOSTS=${ALLOWED_HOSTS:-gitingest.com,*.gitingest.com,localhost,127.0.0.1}
+    # Metrics Configuration
+    - GITINGEST_METRICS_ENABLED=${GITINGEST_METRICS_ENABLED:-true}
+    - GITINGEST_METRICS_HOST=${GITINGEST_METRICS_HOST:-127.0.0.1}
+    - GITINGEST_METRICS_PORT=${GITINGEST_METRICS_PORT:-9090}
+    # Sentry Configuration
+    - GITINGEST_SENTRY_ENABLED=${GITINGEST_SENTRY_ENABLED:-false}
+    - GITINGEST_SENTRY_DSN=${GITINGEST_SENTRY_DSN:-}
+    - GITINGEST_SENTRY_TRACES_SAMPLE_RATE=${GITINGEST_SENTRY_TRACES_SAMPLE_RATE:-1.0}
+    - GITINGEST_SENTRY_PROFILE_SESSION_SAMPLE_RATE=${GITINGEST_SENTRY_PROFILE_SESSION_SAMPLE_RATE:-1.0}
+    - GITINGEST_SENTRY_PROFILE_LIFECYCLE=${GITINGEST_SENTRY_PROFILE_LIFECYCLE:-trace}
+    - GITINGEST_SENTRY_SEND_DEFAULT_PII=${GITINGEST_SENTRY_SEND_DEFAULT_PII:-true}
+  user: "1000:1000"
+  command: ["python", "-m", "uvicorn", "server.main:app", "--host", "0.0.0.0", "--port", "8000"]
+
+services:
+  # Production service configuration
+  app:
+    <<: *app-base
+    image: ghcr.io/coderamp-labs/gitingest:latest
+    profiles:
+      - prod
+    environment:
+      - GITINGEST_SENTRY_ENVIRONMENT=${GITINGEST_SENTRY_ENVIRONMENT:-production}
+    restart: unless-stopped
+
+  # Development service configuration
+  app-dev:
+    <<: *app-base
+    build:
+      context: .
+      dockerfile: Dockerfile
+    profiles:
+      - dev
+    environment:
+      - DEBUG=true
+      - GITINGEST_SENTRY_ENVIRONMENT=${GITINGEST_SENTRY_ENVIRONMENT:-development}
+      # S3 Configuration
+      - S3_ENABLED=true
+      - S3_ENDPOINT=http://minio:9000
+      - S3_ACCESS_KEY=${S3_ACCESS_KEY:-gitingest}
+      - S3_SECRET_KEY=${S3_SECRET_KEY:-gitingest123}
+      # Use lowercase bucket name to ensure compatibility with MinIO
+      - S3_BUCKET_NAME=${S3_BUCKET_NAME:-gitingest-bucket}
+      - S3_REGION=${S3_REGION:-us-east-1}
+      - S3_DIRECTORY_PREFIX=${S3_DIRECTORY_PREFIX:-dev}
+      # Public URL for S3 resources
+      - S3_ALIAS_HOST=${S3_ALIAS_HOST:-http://127.0.0.1:9000/${S3_BUCKET_NAME:-gitingest-bucket}}
+    volumes:
+      # Mount source code for live development
+      - ./src:/app:ro
+    # Use --reload flag for hot reloading during development
+    command: ["python", "-m", "uvicorn", "server.main:app", "--host", "0.0.0.0", "--port", "8000", "--reload"]
+    depends_on:
+      minio-setup:
+        condition: service_completed_successfully
+
+  # MinIO S3-compatible object storage for development
+  minio:
+    image: minio/minio:latest
+    profiles:
+      - dev
+    ports:
+      - "9000:9000" # API port
+      - "9001:9001" # Console port
+    environment:
+      - MINIO_ROOT_USER=${MINIO_ROOT_USER:-minioadmin}
+      - MINIO_ROOT_PASSWORD=${MINIO_ROOT_PASSWORD:-minioadmin}
+    volumes:
+      - minio-data:/data
+    command: server /data --console-address ":9001"
+    restart: unless-stopped
+    healthcheck:
+      test: ["CMD", "curl", "-f", "http://localhost:9000/minio/health/live"]
+      interval: 30s
+      timeout: 30s
+      start_period: 30s
+      start_interval: 1s
+
+  # MinIO setup service to create bucket and user
+  minio-setup:
+    image: minio/mc
+    profiles:
+      - dev
+    depends_on:
+      minio:
+        condition: service_healthy
+    environment:
+      - MINIO_ROOT_USER=${MINIO_ROOT_USER:-minioadmin}
+      - MINIO_ROOT_PASSWORD=${MINIO_ROOT_PASSWORD:-minioadmin}
+      - S3_ACCESS_KEY=${S3_ACCESS_KEY:-gitingest}
+      - S3_SECRET_KEY=${S3_SECRET_KEY:-gitingest123}
+      - S3_BUCKET_NAME=${S3_BUCKET_NAME:-gitingest-bucket}
+    volumes:
+      - ./.docker/minio/setup.sh:/setup.sh:ro
+    entrypoint: sh
+    command: -c /setup.sh
+
+volumes:
+  minio-data:
+    driver: local
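One point reviewers may want to check in `compose.yml`: YAML merge keys (`<<: *app-base`) are, as far as I understand the semantics, shallow. A key defined in the service replaces the same key from the anchor instead of being combined with it, so the `environment` list in `app` and `app-dev` may replace the base list rather than extend it. The sketch below models that behavior with plain dictionaries; `merge_key` is a hypothetical helper and the values are abbreviated from the file above.

```python
def merge_key(base: dict, overrides: dict) -> dict:
    """Model a YAML `<<` merge: shallow, with the service's own keys winning."""
    return {**base, **overrides}


# Abbreviated stand-in for the x-app-base anchor.
app_base = {
    "environment": ["PYTHONUNBUFFERED=1", "GITINGEST_METRICS_ENABLED=true"],
    "user": "1000:1000",
}

# Abbreviated stand-in for the `app` service.
app = merge_key(
    app_base,
    {
        "image": "ghcr.io/coderamp-labs/gitingest:latest",
        "environment": ["GITINGEST_SENTRY_ENVIRONMENT=production"],
    },
)

if __name__ == "__main__":
    # `user` is inherited from the base; `environment` holds only the service's list.
    print(app)
```

Running `docker compose --profile prod config` prints the resolved configuration, which is the reliable way to confirm what each service actually receives.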

‎pyproject.toml

Lines changed: 1 addition & 0 deletions
@@ -44,6 +44,7 @@ dev = [
 ]
 
 server = [
+    "boto3>=1.28.0", # AWS SDK for S3 support
     "fastapi[standard]>=0.109.1", # Minimum safe release (https://osv.dev/vulnerability/PYSEC-2024-38)
     "prometheus-client",
     "sentry-sdk[fastapi]",

‎requirements.txt

Lines changed: 1 addition & 0 deletions
@@ -1,3 +1,4 @@
+boto3>=1.28.0 # AWS SDK for S3 support
 click>=8.0.0
 fastapi[standard]>=0.109.1 # Vulnerable to https://osv.dev/vulnerability/PYSEC-2024-38
 httpx

‎src/gitingest/query_parser.py

Lines changed: 3 additions & 3 deletions
@@ -44,9 +44,9 @@ async def parse_remote_repo(source: str, token: str | None = None) -> IngestionQ
     host = parsed_url.netloc
     user, repo = _get_user_and_repo_from_path(parsed_url.path)
 
-    _id = str(uuid.uuid4())
+    _id = uuid.uuid4()
     slug = f"{user}-{repo}"
-    local_path = TMP_BASE_PATH / _id / slug
+    local_path = TMP_BASE_PATH / str(_id) / slug
     url = f"https://{host}/{user}/{repo}"
 
     query = IngestionQuery(
@@ -132,7 +132,7 @@ def parse_local_dir_path(path_str: str) -> IngestionQuery:
     """
     path_obj = Path(path_str).resolve()
     slug = path_obj.name if path_str == "." else path_str.strip("/")
-    return IngestionQuery(local_path=path_obj, slug=slug, id=str(uuid.uuid4()))
+    return IngestionQuery(local_path=path_obj, slug=slug, id=uuid.uuid4())
 
 
 async def _configure_branch_or_tag(

‎src/gitingest/schemas/ingestion.py

Lines changed: 6 additions & 2 deletions
@@ -3,6 +3,7 @@
 from __future__ import annotations
 
 from pathlib import Path # noqa: TC003 (typing-only-standard-library-import) needed for type checking (pydantic)
+from uuid import UUID # noqa: TC003 (typing-only-standard-library-import) needed for type checking (pydantic)
 
 from pydantic import BaseModel, Field
 
@@ -27,7 +28,7 @@ class IngestionQuery(BaseModel): # pylint: disable=too-many-instance-attributes
         The URL of the repository.
     slug : str
         The slug of the repository.
-    id : str
+    id : UUID
         The ID of the repository.
     subpath : str
         The subpath to the repository or file (default: ``"/"``).
@@ -47,6 +48,8 @@ class IngestionQuery(BaseModel): # pylint: disable=too-many-instance-attributes
         The patterns to include.
     include_submodules : bool
         Whether to include all Git submodules within the repository. (default: ``False``)
+    s3_url : str | None
+        The S3 URL where the digest is stored if S3 is enabled.
 
     """
 
@@ -56,7 +59,7 @@ class IngestionQuery(BaseModel): # pylint: disable=too-many-instance-attributes
     local_path: Path
     url: str | None = None
     slug: str
-    id: str
+    id: UUID
     subpath: str = Field(default="/")
     type: str | None = None
     branch: str | None = None
@@ -66,6 +69,7 @@ class IngestionQuery(BaseModel): # pylint: disable=too-many-instance-attributes
     ignore_patterns: set[str] = Field(default_factory=set) # TODO: ssame type for ignore_* and include_* patterns
     include_patterns: set[str] | None = None
     include_submodules: bool = Field(default=False)
+    s3_url: str | None = None
 
     def extract_clone_config(self) -> CloneConfig:
         """Extract the relevant fields for the CloneConfig object.

‎src/server/models.py

Lines changed: 3 additions & 3 deletions
@@ -71,8 +71,8 @@ class IngestSuccessResponse(BaseModel):
         Short form of repository URL (user/repo).
     summary : str
         Summary of the ingestion process including token estimates.
-    ingest_id : str
-        Ingestion id used to download full context.
+    digest_url : str
+        URL to download the full digest content (either S3 URL or local download endpoint).
     tree : str
         File tree structure of the repository.
     content : str
@@ -89,7 +89,7 @@ class IngestSuccessResponse(BaseModel):
     repo_url: str = Field(..., description="Original repository URL")
     short_repo_url: str = Field(..., description="Short repository URL (user/repo)")
     summary: str = Field(..., description="Ingestion summary with token estimates")
-    ingest_id: str = Field(..., description="Ingestion id used to download full context")
+    digest_url: str = Field(..., description="URL to download the full digest content")
     tree: str = Field(..., description="File tree structure")
     content: str = Field(..., description="Processed file content")
     default_max_file_size: int = Field(..., description="File size slider position used")
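The new `digest_url` docstring says the value is either an S3 URL or a local download endpoint. One way that choice could be made is sketched below. This is a guess at the shape of the logic, not the commit's implementation: `resolve_digest_url` is a hypothetical helper, and the `/api/download/file/` path is an invented placeholder for whatever local endpoint the server exposes.

```python
from typing import Optional
from uuid import UUID

# Invented placeholder; the real local download route is not shown in this diff.
LOCAL_DOWNLOAD_PREFIX = "/api/download/file/"


def resolve_digest_url(ingest_id: UUID, s3_url: Optional[str]) -> str:
    """Prefer the S3 URL when one was recorded, else fall back to the local endpoint."""
    if s3_url:
        return s3_url
    return f"{LOCAL_DOWNLOAD_PREFIX}{ingest_id}"


if __name__ == "__main__":
    sample_id = UUID("12345678-1234-5678-1234-567812345678")
    print(resolve_digest_url(sample_id, None))
    print(resolve_digest_url(sample_id, "http://127.0.0.1:9000/gitingest-bucket/example.txt"))
```

Returning a single URL field means clients no longer need to know whether a digest lives in S3 or on the application server.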
