Commit adda1ce

filipchristiansen and Cheelax committed
feat(config, cli): enhance config options, support env vars
Closes #285

* add support for environment variables (`GITINGEST_*`) to override `config.py` defaults
* implement a precedence hierarchy: CLI/Python args → environment variables → default values
* introduce new CLI options (`--max-files`, `--max-total-size`, `--max-directory-depth`)
* centralise environment variable utilities in `utils/config_utils.py` with functions `_get_str_env_var` and `_get_int_env_var`
* add configuration examples to `README.md`
* tidy and update docstrings
* update tests
* add missing `--tag` CLI flag
* remove `isort` in favour of `ruff.lint.isort`
* remove unused constants `BASE_DIR` and `TEMPLATE_DIR` in `tests/server/test_flow_integration.py`
* rename constant `templates` to `JINJA_TEMPLATES` in `src/server/server_config.py`
* move `Colors` from `src/server/server_utils.py` to `src/gitingest/utils/colors.py` to break circular import chain

Co-authored-by: Cheelax <thomas.belloc@gmail.com>
1 parent a99089a commit adda1ce

19 files changed: +307 -130 lines changed

‎.pre-commit-config.yaml

Lines changed: 0 additions & 6 deletions
@@ -58,12 +58,6 @@ repos:
       - id: python-use-type-annotations
         description: 'Enforce that python3.6+ type annotations are used instead of type comments.'
 
-  - repo: https://github.com/PyCQA/isort
-    rev: 6.0.1
-    hooks:
-      - id: isort
-        description: 'Sort imports alphabetically, and automatically separated into sections and by type.'
-
   - repo: https://github.com/pre-commit/mirrors-eslint
     rev: v9.30.1
     hooks:

‎README.md

Lines changed: 57 additions & 0 deletions
@@ -144,12 +144,60 @@ By default, the digest is written to a text file (`digest.txt`) in your current
 - Use `--output/-o <filename>` to write to a specific file.
 - Use `--output/-o -` to output directly to `STDOUT` (useful for piping to other tools).
 
+### 🔧 Configure processing limits
+
+```bash
+# Set higher limits for large repositories
+gitingest https://github.com/torvalds/linux \
+  --max-files 100000 \
+  --max-total-size 2147483648 \
+  --max-directory-depth 25
+
+# Process only Python files up to 1MB each
+gitingest /path/to/project \
+  --include-pattern "*.py" \
+  --max-size 1048576 \
+  --max-files 1000
+```
+
 See more options and usage details with:
 
 ```bash
 gitingest --help
 ```
 
+### Configuration via Environment Variables
+
+You can configure various limits and settings using environment variables. All configuration environment variables start with the `GITINGEST_` prefix:
+
+#### File Processing Configuration
+
+- `GITINGEST_MAX_FILE_SIZE` - Maximum size of a single file to process *(default: 10485760 bytes, 10 MB)*
+- `GITINGEST_MAX_FILES` - Maximum number of files to process *(default: 10000)*
+- `GITINGEST_MAX_TOTAL_SIZE_BYTES` - Maximum size of output file *(default: 524288000 bytes, 500 MB)*
+- `GITINGEST_MAX_DIRECTORY_DEPTH` - Maximum depth of directory traversal *(default: 20)*
+- `GITINGEST_DEFAULT_TIMEOUT` - Default operation timeout in seconds *(default: 60)*
+- `GITINGEST_OUTPUT_FILE_NAME` - Default output filename *(default: "digest.txt")*
+- `GITINGEST_TMP_BASE_PATH` - Base path for temporary files *(default: system temp directory)*
+
+#### Server Configuration (for self-hosting)
+
+- `GITINGEST_MAX_DISPLAY_SIZE` - Maximum size of content to display in UI *(default: 300000 bytes)*
+- `GITINGEST_DELETE_REPO_AFTER` - Repository cleanup timeout in seconds *(default: 3600, 1 hour)*
+- `GITINGEST_MAX_FILE_SIZE_KB` - Maximum file size for UI slider in kB *(default: 102400, 100 MB)*
+- `GITINGEST_MAX_SLIDER_POSITION` - Maximum slider position in UI *(default: 500)*
+
+#### Example usage
+
+```bash
+# Configure for large scientific repositories
+export GITINGEST_MAX_FILES=50000
+export GITINGEST_MAX_FILE_SIZE=20971520  # 20 MB
+export GITINGEST_MAX_TOTAL_SIZE_BYTES=1073741824  # 1 GB
+
+gitingest https://github.com/some/large-repo
+```
+
 ## 🐍 Python package usage
 
 ```python
@@ -178,6 +226,15 @@ summary, tree, content = ingest("https://github.com/username/private-repo")
 
 # Include repository submodules
 summary, tree, content = ingest("https://github.com/username/repo-with-submodules", include_submodules=True)
+
+# Configure limits programmatically
+summary, tree, content = ingest(
+    "https://github.com/username/large-repo",
+    max_file_size=20 * 1024 * 1024,  # 20 MB per file
+    max_files=50000,  # 50k files max
+    max_total_size_bytes=1024**3,  # 1 GB total
+    max_directory_depth=30  # 30 levels deep
+)
 ```
 
 By default, this won't write a file but can be enabled with the `output` argument.
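
The precedence described in the commit message (CLI/Python arguments, then environment variables, then defaults) can be pictured as a layered lookup. The sketch below is illustrative only and is not how gitingest implements it; the three dictionaries and their values are made-up stand-ins for parsed CLI options, parsed `GITINGEST_*` variables, and built-in defaults.

```python
from collections import ChainMap

# Hypothetical layers, listed from highest to lowest priority.
cli_args = {"max_directory_depth": 25}   # e.g. from --max-directory-depth 25
env_values = {"max_files": 50_000}       # e.g. from GITINGEST_MAX_FILES=50000
defaults = {
    "max_file_size": 10 * 1024 * 1024,   # 10 MB
    "max_files": 10_000,
    "max_directory_depth": 20,
}

# ChainMap searches its mappings in order, so the first layer that
# defines a key wins.
settings = ChainMap(cli_args, env_values, defaults)

for name in ("max_file_size", "max_files", "max_directory_depth"):
    print(name, settings[name])
```

Here `max_directory_depth` comes from the CLI layer, `max_files` from the environment layer, and `max_file_size` falls through to the default.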

‎pyproject.toml

Lines changed: 0 additions & 8 deletions
@@ -112,14 +112,6 @@ case-sensitive = true
 [tool.pycln]
 all = true
 
-# TODO: Remove this once we figure out how to use ruff-isort
-[tool.isort]
-profile = "black"
-line_length = 119
-remove_redundant_aliases = true
-float_to_top = true  # https://github.com/astral-sh/ruff/issues/6514
-order_by_type = true
-filter_files = true
 
 # Test configuration
 [tool.pytest.ini_options]

‎src/gitingest/__main__.py

Lines changed: 45 additions & 6 deletions
@@ -9,16 +9,20 @@
 import click
 from typing_extensions import Unpack
 
-from gitingest.config import MAX_FILE_SIZE, OUTPUT_FILE_NAME
+from gitingest.config import MAX_DIRECTORY_DEPTH, MAX_FILES, MAX_FILE_SIZE, MAX_TOTAL_SIZE_BYTES, OUTPUT_FILE_NAME
 from gitingest.entrypoint import ingest_async
 
 
 class _CLIArgs(TypedDict):
     source: str
     max_size: int
+    max_files: int
+    max_total_size: int
+    max_directory_depth: int
     exclude_pattern: tuple[str, ...]
     include_pattern: tuple[str, ...]
     branch: str | None
+    tag: str | None
     include_gitignored: bool
     include_submodules: bool
     token: str | None
@@ -34,6 +38,24 @@ class _CLIArgs(TypedDict):
     show_default=True,
     help="Maximum file size to process in bytes",
 )
+@click.option(
+    "--max-files",
+    default=MAX_FILES,
+    show_default=True,
+    help="Maximum number of files to process",
+)
+@click.option(
+    "--max-total-size",
+    default=MAX_TOTAL_SIZE_BYTES,
+    show_default=True,
+    help="Maximum total size of all files in bytes",
+)
+@click.option(
+    "--max-directory-depth",
+    default=MAX_DIRECTORY_DEPTH,
+    show_default=True,
+    help="Maximum depth of directory traversal",
+)
 @click.option("--exclude-pattern", "-e", multiple=True, help="Shell-style patterns to exclude.")
 @click.option(
     "--include-pattern",
@@ -42,6 +64,7 @@ class _CLIArgs(TypedDict):
     help="Shell-style patterns to include.",
 )
 @click.option("--branch", "-b", default=None, help="Branch to clone and ingest")
+@click.option("--tag", default=None, help="Tag to clone and ingest")
 @click.option(
     "--include-gitignored",
     is_flag=True,
@@ -98,7 +121,7 @@ def main(**cli_kwargs: Unpack[_CLIArgs]) -> None:
         $ gitingest --include-pattern "*.js" --exclude-pattern "node_modules/*"
 
     Private repositories:
-        $ gitingest https://github.com/user/private-repo -t ghp_token
+        $ gitingest https://github.com/user/private-repo --token ghp_token
         $ GITHUB_TOKEN=ghp_token gitingest https://github.com/user/private-repo
 
     Include submodules:
@@ -112,9 +135,13 @@ async def _async_main(
     source: str,
     *,
     max_size: int = MAX_FILE_SIZE,
+    max_files: int = MAX_FILES,
+    max_total_size: int = MAX_TOTAL_SIZE_BYTES,
+    max_directory_depth: int = MAX_DIRECTORY_DEPTH,
     exclude_pattern: tuple[str, ...] | None = None,
     include_pattern: tuple[str, ...] | None = None,
     branch: str | None = None,
+    tag: str | None = None,
     include_gitignored: bool = False,
     include_submodules: bool = False,
     token: str | None = None,
@@ -132,21 +159,29 @@ async def _async_main(
         A directory path or a Git repository URL.
     max_size : int
         Maximum file size in bytes to ingest (default: 10 MB).
+    max_files : int
+        Maximum number of files to ingest (default: 10,000).
+    max_total_size : int
+        Maximum total size of output file in bytes (default: 500 MB).
+    max_directory_depth : int
+        Maximum depth of directory traversal (default: 20).
     exclude_pattern : tuple[str, ...] | None
         Glob patterns for pruning the file set.
     include_pattern : tuple[str, ...] | None
         Glob patterns for including files in the output.
     branch : str | None
-        Git branch to ingest. If ``None``, the repository's default branch is used.
+        Git branch to clone and ingest (default: the default branch).
+    tag : str | None
+        Git tag to clone and ingest. If ``None``, no tag is used.
     include_gitignored : bool
-        If ``True``, also ingest files matched by ``.gitignore`` or ``.gitingestignore`` (default: ``False``).
+        If ``True``, include files ignored by ``.gitignore`` and ``.gitingestignore`` (default: ``False``).
     include_submodules : bool
         If ``True``, recursively include all Git submodules within the repository (default: ``False``).
     token : str | None
         GitHub personal access token (PAT) for accessing private repositories.
         Can also be set via the ``GITHUB_TOKEN`` environment variable.
     output : str | None
-        The path where the output file will be written (default: ``digest.txt`` in current directory).
+        The path where the output file is written (default: ``digest.txt`` in current directory).
         Use ``"-"`` to write to ``stdout``.
 
     Raises
@@ -170,9 +205,13 @@ async def _async_main(
         summary, _, _ = await ingest_async(
             source,
             max_file_size=max_size,
-            include_patterns=include_patterns,
+            max_files=max_files,
+            max_total_size_bytes=max_total_size,
+            max_directory_depth=max_directory_depth,
             exclude_patterns=exclude_patterns,
+            include_patterns=include_patterns,
             branch=branch,
+            tag=tag,
             include_gitignored=include_gitignored,
             include_submodules=include_submodules,
             token=token,
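
Because the click options and `_async_main` take their defaults from module-level constants in `gitingest.config`, those values are presumably fixed when the module is first imported, so a `GITINGEST_*` variable would need to be set before that import to take effect. The sketch below illustrates that general Python behaviour with a stand-in constant and a hypothetical `process` function; it does not import gitingest.

```python
import os

# Set the variable *before* the "config" value is computed.
os.environ["GITINGEST_MAX_FILES"] = "100"

# Stand-in for a module-level constant such as gitingest.config.MAX_FILES;
# it is evaluated exactly once, at this point.
MAX_FILES = int(os.environ.get("GITINGEST_MAX_FILES", "10000"))


def process(max_files: int = MAX_FILES) -> int:
    """Hypothetical function whose default is bound at definition time."""
    return max_files


# Changing the environment afterwards does not affect the bound default.
os.environ["GITINGEST_MAX_FILES"] = "200"

default_limit = process()              # still uses the value read earlier
explicit_limit = process(max_files=5)  # an explicit argument always wins
print(default_limit, explicit_limit)
```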

‎src/gitingest/config.py

Lines changed: 9 additions & 7 deletions
@@ -3,12 +3,14 @@
 import tempfile
 from pathlib import Path
 
-MAX_FILE_SIZE = 10 * 1024 * 1024  # Maximum size of a single file to process (10 MB)
-MAX_DIRECTORY_DEPTH = 20  # Maximum depth of directory traversal
-MAX_FILES = 10_000  # Maximum number of files to process
-MAX_TOTAL_SIZE_BYTES = 500 * 1024 * 1024  # Maximum size of output file (500 MB)
-DEFAULT_TIMEOUT = 60  # seconds
+from gitingest.utils.config_utils import _get_int_env_var, _get_str_env_var
 
-OUTPUT_FILE_NAME = "digest.txt"
+MAX_FILE_SIZE = _get_int_env_var("MAX_FILE_SIZE", 10 * 1024 * 1024)  # Max file size to process in bytes (10 MB)
+MAX_FILES = _get_int_env_var("MAX_FILES", 10_000)  # Max number of files to process
+MAX_TOTAL_SIZE_BYTES = _get_int_env_var("MAX_TOTAL_SIZE_BYTES", 500 * 1024 * 1024)  # Max output file size (500 MB)
+MAX_DIRECTORY_DEPTH = _get_int_env_var("MAX_DIRECTORY_DEPTH", 20)  # Max depth of directory traversal
 
-TMP_BASE_PATH = Path(tempfile.gettempdir()) / "gitingest"
+DEFAULT_TIMEOUT = _get_int_env_var("DEFAULT_TIMEOUT", 60)  # Default timeout for git operations in seconds
+
+OUTPUT_FILE_NAME = _get_str_env_var("OUTPUT_FILE_NAME", "digest.txt")
+TMP_BASE_PATH = Path(_get_str_env_var("TMP_BASE_PATH", tempfile.gettempdir())) / "gitingest"
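
The diff for `utils/config_utils.py` is not shown on this page, so the real bodies of `_get_str_env_var` and `_get_int_env_var` are unknown here. As a rough sketch of what helpers of this kind might do, assuming they prepend the `GITINGEST_` prefix documented in the README and fall back to the default when a value is missing or not an integer (both assumptions), consider the following; the function names are deliberately different from the real ones.

```python
import os

ENV_PREFIX = "GITINGEST_"  # prefix documented in the README


def get_str_env_var(key: str, default: str) -> str:
    """Return GITINGEST_<key> if it is set, otherwise the default."""
    value = os.environ.get(f"{ENV_PREFIX}{key}")
    return default if value is None else value


def get_int_env_var(key: str, default: int) -> int:
    """Return GITINGEST_<key> parsed as an int, otherwise the default."""
    raw = os.environ.get(f"{ENV_PREFIX}{key}")
    if raw is None:
        return default
    try:
        return int(raw)
    except ValueError:
        # Assumption: a malformed value falls back to the default
        # instead of raising.
        return default


os.environ["GITINGEST_MAX_FILES"] = "50000"
max_files = get_int_env_var("MAX_FILES", 10_000)
print(max_files)
```

Falling back silently on malformed values is one possible design; raising an error or logging a warning would be equally reasonable.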

‎src/gitingest/entrypoint.py

Lines changed: 47 additions & 19 deletions
@@ -33,8 +33,11 @@ async def ingest_async(
     source: str,
     *,
     max_file_size: int = MAX_FILE_SIZE,
-    include_patterns: str | set[str] | None = None,
+    max_files: int | None = None,
+    max_total_size_bytes: int | None = None,
+    max_directory_depth: int | None = None,
     exclude_patterns: str | set[str] | None = None,
+    include_patterns: str | set[str] | None = None,
     branch: str | None = None,
     tag: str | None = None,
     include_gitignored: bool = False,
@@ -51,17 +54,23 @@ async def ingest_async(
     Parameters
     ----------
     source : str
-        The source to analyze, which can be a URL (for a Git repository) or a local directory path.
+        A directory path or a Git repository URL.
     max_file_size : int
-        Maximum allowed file size for file ingestion. Files larger than this size are ignored (default: 10 MB).
-    include_patterns : str | set[str] | None
-        Pattern or set of patterns specifying which files to include. If ``None``, all files are included.
+        Maximum file size in bytes to ingest (default: 10 MB).
+    max_files : int | None
+        Maximum number of files to ingest (default: 10,000).
+    max_total_size_bytes : int | None
+        Maximum total size of output file in bytes (default: 500 MB).
+    max_directory_depth : int | None
+        Maximum depth of directory traversal (default: 20).
     exclude_patterns : str | set[str] | None
-        Pattern or set of patterns specifying which files to exclude. If ``None``, no files are excluded.
+        Glob patterns for pruning the file set.
+    include_patterns : str | set[str] | None
+        Glob patterns for including files in the output.
     branch : str | None
-        The branch to clone and ingest (default: the default branch).
+        Git branch to clone and ingest (default: the default branch).
     tag : str | None
-        The tag to clone and ingest. If ``None``, no tag is used.
+        Git tag to clone and ingest. If ``None``, no tag is used.
     include_gitignored : bool
         If ``True``, include files ignored by ``.gitignore`` and ``.gitingestignore`` (default: ``False``).
     include_submodules : bool
@@ -70,7 +79,7 @@ async def ingest_async(
         GitHub personal access token (PAT) for accessing private repositories.
         Can also be set via the ``GITHUB_TOKEN`` environment variable.
     output : str | None
-        File path where the summary and content should be written.
+        File path where the summary and content are written.
         If ``"-"`` (dash), the results are written to ``stdout``.
         If ``None``, the results are not written to a file.
 
@@ -107,6 +116,13 @@ async def ingest_async(
     if query.url:
         _override_branch_and_tag(query, branch=branch, tag=tag)
 
+    if max_files is not None:
+        query.max_files = max_files
+    if max_total_size_bytes is not None:
+        query.max_total_size_bytes = max_total_size_bytes
+    if max_directory_depth is not None:
+        query.max_directory_depth = max_directory_depth
+
     query.include_submodules = include_submodules
 
     async with _clone_repo_if_remote(query, token=token):
@@ -121,8 +137,11 @@ def ingest(
     source: str,
     *,
     max_file_size: int = MAX_FILE_SIZE,
-    include_patterns: str | set[str] | None = None,
+    max_files: int | None = None,
+    max_total_size_bytes: int | None = None,
+    max_directory_depth: int | None = None,
     exclude_patterns: str | set[str] | None = None,
+    include_patterns: str | set[str] | None = None,
     branch: str | None = None,
     tag: str | None = None,
     include_gitignored: bool = False,
@@ -139,17 +158,23 @@ def ingest(
     Parameters
     ----------
     source : str
-        The source to analyze, which can be a URL (for a Git repository) or a local directory path.
+        A directory path or a Git repository URL.
     max_file_size : int
-        Maximum allowed file size for file ingestion. Files larger than this size are ignored (default: 10 MB).
-    include_patterns : str | set[str] | None
-        Pattern or set of patterns specifying which files to include. If ``None``, all files are included.
+        Maximum file size in bytes to ingest (default: 10 MB).
+    max_files : int | None
+        Maximum number of files to ingest (default: 10,000).
+    max_total_size_bytes : int | None
+        Maximum total size of output file in bytes (default: 500 MB).
+    max_directory_depth : int | None
+        Maximum depth of directory traversal (default: 20).
     exclude_patterns : str | set[str] | None
-        Pattern or set of patterns specifying which files to exclude. If ``None``, no files are excluded.
+        Glob patterns for pruning the file set.
+    include_patterns : str | set[str] | None
+        Glob patterns for including files in the output.
     branch : str | None
-        The branch to clone and ingest (default: the default branch).
+        Git branch to clone and ingest (default: the default branch).
     tag : str | None
-        The tag to clone and ingest. If ``None``, no tag is used.
+        Git tag to clone and ingest. If ``None``, no tag is used.
     include_gitignored : bool
         If ``True``, include files ignored by ``.gitignore`` and ``.gitingestignore`` (default: ``False``).
     include_submodules : bool
@@ -158,7 +183,7 @@ def ingest(
         GitHub personal access token (PAT) for accessing private repositories.
         Can also be set via the ``GITHUB_TOKEN`` environment variable.
     output : str | None
-        File path where the summary and content should be written.
+        File path where the summary and content are written.
         If ``"-"`` (dash), the results are written to ``stdout``.
         If ``None``, the results are not written to a file.
 
@@ -179,8 +204,11 @@ def ingest(
         ingest_async(
             source=source,
             max_file_size=max_file_size,
-            include_patterns=include_patterns,
+            max_files=max_files,
+            max_total_size_bytes=max_total_size_bytes,
+            max_directory_depth=max_directory_depth,
             exclude_patterns=exclude_patterns,
+            include_patterns=include_patterns,
             branch=branch,
             tag=tag,
             include_gitignored=include_gitignored,