-
Notifications
You must be signed in to change notification settings - Fork 3.2k
Bug Fixes: Error Handling and Stability Improvements #108 #117
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bug Fixes: Error Handling and Stability Improvements #108 #117
Conversation
...handling, added new custom exceptions for validation and processing errors, improved session cleanup logic, and refined repository name extraction. Updated logging format for better traceability.
...G application: Introduced dedicated functions for initializing session state and handling repository loading with improved error handling. Enhanced chat message processing and updated logging for better traceability.
WalkthroughThe code refactors the Streamlit application in Changes
Sequence Diagram(s)sequenceDiagram
participant User
participant StreamlitApp
participant RepoProcessor
participant QueryEngine
User->>StreamlitApp: Enter GitHub URL & click Load
StreamlitApp->>RepoProcessor: validate_github_url(url)
RepoProcessor-->>StreamlitApp: ValidationError? (if invalid)
StreamlitApp->>RepoProcessor: process_with_gitingets(url) (with retry)
RepoProcessor-->>StreamlitApp: ProcessingError? (on failure)
RepoProcessor->>QueryEngine: create_query_engine(content_path, repo_name)
QueryEngine-->>StreamlitApp: QueryEngineError? (on failure)
StreamlitApp->>SessionState: Update with repo & engine
User->>StreamlitApp: Enter chat prompt
StreamlitApp->>QueryEngine: Query with prompt
QueryEngine-->>StreamlitApp: Return response
StreamlitApp->>SessionState: Append chat history
User->>StreamlitApp: Click Reset
StreamlitApp->>SessionState: cleanup_session()
Assessment against linked issues
Possibly related PRs
Suggested reviewers
Poem
✨ Finishing Touches
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 5
🧹 Nitpick comments (3)
github-rag/app.py (3)
52-69: Propagate the root cause when exhausting retries
raise last_exceptiondrops the causal chain. Wrap withfrom last_exception(Ruff B904) to preserve traceback context:- raise last_exception + raise last_exception from NoneThis small change greatly improves debuggability.
118-129: Re-raise with cause insideprocess_with_gitingetsSame Ruff B904 concern – chain the original exception so callers can differentiate between ingest-failures and decorator retries.
- except Exception as e: - logger.error(f"Error processing repository: {str(e)}") - raise ProcessingError(f"Failed to process repository: {str(e)}") + except Exception as e: + logger.error("Error processing repository: %s", e) + raise ProcessingError("Failed to process repository") from e🧰 Tools
🪛 Ruff (0.8.2)
128-128: Within an
exceptclause, raise exceptions withraise ... from errorraise ... from Noneto distinguish them from errors in exception handling(B904)
170-186: Session init: preserve cause on failureMinor, but consistent with earlier comments:
- except Exception as e: - logger.error(f"Error initializing session state: {str(e)}") - raise SessionError("Failed to initialize session state") + except Exception as e: + logger.error("Error initializing session state: %s", e) + raise SessionError("Failed to initialize session state") from e🧰 Tools
🪛 Ruff (0.8.2)
186-186: Within an
exceptclause, raise exceptions withraise ... from errorraise ... from Noneto distinguish them from errors in exception handling(B904)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
github-rag/app.py(3 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
github-rag/app.py (1)
github-rag/app_local.py (2)
reset_chat(31-34)process_with_gitingets(36-39)
🪛 Ruff (0.8.2)
github-rag/app.py
89-89: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling
(B904)
105-105: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling
(B904)
116-116: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling
(B904)
128-128: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling
(B904)
168-168: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling
(B904)
186-186: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling
(B904)
🔇 Additional comments (2)
github-rag/app.py (2)
130-168: Streaming responses may not stringify cleanly
streaming=Truereturns a generator /StreamingResponseobject.
Downstream (handle_chat_message) concatenates it directly into a string which can yield"<generator object ...>".Consider:
- query_engine = index.as_query_engine(streaming=True) + query_engine = index.as_query_engine(streaming=False)or consume the stream before storing.
🧰 Tools
🪛 Ruff (0.8.2)
168-168: Within an
exceptclause, raise exceptions withraise ... from errorraise ... from Noneto distinguish them from errors in exception handling(B904)
252-257: Ensure response is serialisableIf
query_engine.query()returns an object, wrap withstr()to avoid "Object of type ... is not JSON serialisable" errors when Streamlit serialises session state.- full_response = f"Repository: {st.session_state.current_repo}\n\n{response}" + full_response = f"Repository: {st.session_state.current_repo}\n\n{str(response)}"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
reset_chat should also reset the loaded engine
After a reset the previous query_engine lingers in memory, allowing questions against a supposedly "cleared" session.
- st.session_state.context = None + st.session_state.context = None + st.session_state.query_engine = None
📝 Committable suggestion
!!️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
🧰 Tools
🪛 Ruff (0.8.2)
116-116: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling
(B904)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Handle trailing slashes and preserve exception context in get_repo_name
- URLs ending with a trailing slash (
.../repo/) trigger a false "invalid format" error becauseparts[-1]is empty. except Exception as e:re-raises withoutfrom e, losing the stack.
- repo_name = parts[-1].replace('.git', '') + repo_segment = parts[-1] or parts[-2] # supports trailing "/" + repo_name = repo_segment.replace('.git', '') @@ - except Exception as e: - raise ValidationError(f"Failed to extract repository name: {str(e)}") + except Exception as e: + raise ValidationError( + f"Failed to extract repository name: {e}" + ) from e
📝 Committable suggestion
!!️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
🧰 Tools
🪛 Ruff (0.8.2)
89-89: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling
(B904)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Avoid duplicating log handlers on every Streamlit rerun
logging.basicConfig() gets executed on every Streamlit script rerun, which may attach multiple handlers and cause duplicated log lines.
Guard the configuration or add handlers only once:
-logging.basicConfig( - level=logging.INFO, - format='%(asctime)s - %(name)s - %(levelname)s - %(message)s' -) -if not logger.handlers: - logger = logging.getLogger(__name__) +logger = logging.getLogger(__name__) +if not logger.handlers: + logging.basicConfig( + level=logging.INFO, + format='%(asctime)s - %(name)s - %(levelname)s - %(message)s' + )
Committable suggestion skipped: line range outside the PR's diff.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Path-traversal & binary-file issues when writing repo content
- A malicious repo file path such as
../../etc/passwdescapestemp_dir. - Opening files with
'w'fails on binary blobs.
- for file_path, file_content in content.items(): - file_dir = os.path.dirname(os.path.join(content_path, file_path)) + for file_path, file_content in content.items(): + safe_path = os.path.normpath(os.path.join(content_path, file_path)) + if not safe_path.startswith(content_path): + logger.warning("Skipping suspicious path: %s", file_path) + continue + file_dir = os.path.dirname(safe_path) @@ - with open(os.path.join(content_path, file_path), 'w', encoding='utf-8') as f: - f.write(file_content) + mode, kwargs = ('wb', {}) if isinstance(file_content, (bytes, bytearray)) else ('w', {'encoding': 'utf-8'}) + with open(safe_path, mode, **kwargs) as f: + f.write(file_content)
This mitigates a critical security flaw and handles binary assets gracefully.
📝 Committable suggestion
!!️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
cleanup_session does not actually remove cached objects
del value only deletes the local reference, not the entry stored in st.session_state.file_cache.
Also adopt from e per Ruff B904.
- if hasattr(st.session_state, 'file_cache'): - for key, value in st.session_state.file_cache.items(): - try: - del value - except Exception as e: - logger.warning(f"Failed to cleanup cache entry {key}: {str(e)}") - st.session_state.file_cache.clear() + cache = getattr(st.session_state, "file_cache", {}) + for key in list(cache.keys()): + try: + del cache[key] + except Exception as e: + logger.warning("Failed to cleanup cache entry %s: %s", key, e)
📝 Committable suggestion
!!️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
🧰 Tools
🪛 Ruff (0.8.2)
105-105: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling
(B904)
Uh oh!
There was an error while loading. Please reload this page.
This PR addresses issue #108 by improving the application's error handling and session state management with the following changes:
Error Handling Improvements
Session State Management
Code Organization
These changes make the application more robust, easier to maintain, and provide better feedback to users when errors occur.
Closes #108
Summary by CodeRabbit
New Features
Refactor
Bug Fixes