Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

FEAT: Improved Connection String handling in Python #307

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
saurabh500 wants to merge 13 commits into main
base: main
Choose a base branch
Loading
from saurabh/connection_string_allow_list

Conversation

@saurabh500
Copy link

@saurabh500 saurabh500 commented Oct 29, 2025
edited
Loading

Work Item / Issue Reference

AB#39896

GitHub Issue: #306


Summary

Fixes #306 by introducing a connection string parser for mssql-python.

PR Title Guide

FEAT: Improve connection string parsing, honor the allow list, and improve error messaging.

Connection String Allowlist Feature - Review Guide

Overview

This PR introduces a comprehensive connection string validation and allowlist system that validates all ODBC connection parameters before passing them to the driver, improving usability and providing better error messages.


What This Changes

Before: Connection strings were passed directly to the ODBC driver with minimal or no validation. Unknown parameters were silently ignored by ODBC, malformed strings caused cryptic ODBC errors.

After: All connection strings are parsed, validated against an allowlist of ODBC Driver 18 parameters, and reconstructed with proper escaping. Clear error messages are provided for any issues.


Data Flow

User Input (connection string + kwargs)
 ↓
┌─────────────────────────────────────────────┐
│ Connection.__init__() │
│ _construct_connection_string() │
└─────────────────────────────────────────────┘
 ↓
┌─────────────────────────────────────────────┐
│ Step 1: Parse Connection String │
│ _ConnectionStringParser.parse() │
│ - Tokenizes key=value pairs │
│ - Handles braced values: {val} │
│ - Processes escape sequences: }}-> } │
│ - Detects syntax errors │
│ Output: Dict[str, str] │
└─────────────────────────────────────────────┘
 ↓
┌─────────────────────────────────────────────┐
│ Step 2: Validate Against Allowlist │
│ ConnectionStringAllowList.normalize_keys() │
│ - Checks unknown parameters │
│ - Normalizes synonyms (host→Server) │
│ - Blocks reserved params (Driver, APP) │
│ - Warns about rejected params │
│ Output: Dict[str, str] (filtered) │
└─────────────────────────────────────────────┘
 ↓
┌─────────────────────────────────────────────┐
│ Step 3: Process kwargs │
│ - Normalize each kwarg key │
│ - Block reserved parameters │
│ - Merge into filtered params │
│ - kwargs override connection string │
└─────────────────────────────────────────────┘
 ↓
┌─────────────────────────────────────────────┐
│ Step 4: Build Connection String │
│ _ConnectionStringBuilder.build() │
│ - Add Driver (hardcoded) │
│ - Add APP=MSSQL-Python │
│ - Escape special characters │
│ - Format: key1=value1;key2=value2; │
│ Output: str (final connection string) │
└─────────────────────────────────────────────┘
 ↓
ODBC Driver (validated, safe connection string)

File Structure & Review Order

Recommended Review Order

Review in this sequence to understand the architecture:

1. Core Components (understand the building blocks)

  1. mssql_python/connection_string_parser.py Start here

    • Purpose: Parses ODBC connection strings per MS-ODBCSTR spec
    • Key Classes:
      • ConnectionStringParseError - Custom exception
      • _ConnectionStringParser - Main parser class
    • Key Methods:
      • parse(connection_str) - Main entry point
      • _parse_value(str, pos) - Handles braced values & escaping
    • What to Look For:
      • Proper handling of escape sequences ({{, }})
      • Error collection and batch reporting
      • Edge cases: empty strings, whitespace, special chars
  2. mssql_python/connection_string_allowlist.py Critical

    • Purpose: Validates parameters against ODBC Driver 18 keywords
    • Key Classes:
      • ConnectionStringAllowList - Singleton allowlist manager
    • Key Methods:
      • normalize_key(key) - Maps synonyms to canonical names
      • filter_params(params) - Validates and filters parameters
    • What to Look For:
      • Completeness of allowlist (compare with ODBC docs)
      • Synonym mappings (host→Server, user→UID, etc.)
      • Reserved parameter handling (Driver, APP)
  3. mssql_python/connection_string_builder.py

    • Purpose: Safely constructs connection strings with escaping
    • Key Classes:
      • _ConnectionStringBuilder - Builder with escape logic
    • Key Methods:
      • add_param(key, value) - Adds a parameter
      • _needs_braces(value) - Determines if braces needed
      • _escape_value(value) - Escapes special characters
      • build() - Constructs final string
    • What to Look For:
      • Correct escaping logic (;, {, }, =)
      • Proper brace placement
      • Semicolon formatting

2. Integration (see how it fits together)

  1. mssql_python/connection.py Integration point

    • Modified Method: _construct_connection_string()
    • What Changed:
      • Lines 241-303: New implementation replacing old concat logic
      • Now uses parser-> filter-> builder pipeline
      • Handles kwargs with allowlist validation
      • Raises ValueError for reserved parameters
    • What to Look For:
      • Error handling for parse failures
      • kwargs override behavior
      • Reserved parameter rejection
      • Logging of warnings
  2. mssql_python/__init__.py

    • What Changed:
      • Added export: ConnectionStringParseError
    • What to Look For:
      • Proper exception exposure for users

3. Tests (validate functionality)

  1. tests/test_010_connection_string_parser.py Parser tests

    • Coverage: ~40 test cases
    • Test Categories:
      • Valid parsing scenarios
      • Braced value handling
      • Escape sequence processing
      • Error detection and reporting
      • Edge cases (empty, whitespace, unicode)
    • What to Look For:
      • Test coverage of MS-ODBCSTR spec
      • Error message clarity
      • Edge case handling
  2. tests/test_011_connection_string_allowlist.py Allowlist tests

    • Coverage: ~25 test cases
    • Test Categories:
      • Key normalization (synonyms)
      • Parameter filtering
      • Reserved parameter blocking
      • Unknown parameter detection
    • What to Look For:
      • All ODBC parameters tested
      • Synonym mappings validated
      • Security (Driver/APP blocking)
  3. tests/test_012_connection_string_integration.py Integration tests

    • Coverage: ~20 test cases
    • Test Categories:
      • End-to-end parsing-> filtering-> building
      • Real connection string scenarios
      • Error propagation from connect() API
      • kwargs override behavior
    • What to Look For:
      • Real-world usage patterns
      • Error messages match user expectations
      • Backward compatibility where possible
  4. tests/test_003_connection.py (Updated)

    • What Changed:
      • Updated assertions in test_construct_connection_string()
      • Updated assertions in test_connection_string_with_attrs_before()
      • Updated assertions in test_connection_string_with_odbc_param()
    • What to Look For:
      • Assertions match new builder output format
      • No semicolons in middle of assertions (builder handles formatting)
  5. tests/test_006_exceptions.py (Updated)

    • What Changed:
      • test_connection_error() now expects ConnectionStringParseError
      • Updated error message assertions
    • What to Look For:
      • Proper exception type changes
      • Error messages are helpful

4. Documentation (understand design decisions)

  1. docs/connection_string_allow_list_design.md Read this

    • Content:
      • Design rationale and motivation
      • Architecture decisions
      • Security considerations
      • Future enhancements
    • What to Look For:
      • Justification for approach
      • Trade-offs discussed
      • Security implications understood
  2. docs/parser_state_machine.md

    • Content:
      • Detailed state machine for parser
      • Character-by-character processing flow
      • Error handling states
      • Escape sequence examples
    • What to Look For:
      • Parser logic is well-documented
      • Edge cases covered in examples

Areas to Focus On

1. Security

  • Reserved Parameters: Verify Driver and APP cannot be set by users
  • Allowlist Completeness: Check all ODBC Driver 18 params are included
  • Escape Handling: Ensure no injection via special characters

2. Error Handling

  • Error Messages: Are they clear and actionable?
  • Error Collection: Multiple errors reported together?
  • Backward Compatibility: Do meaningful errors replace cryptic ODBC errors?

3. Performance

  • Parsing Overhead: There is no string split being used during parsing. Hence the cost of multiple allocation of strings is avoided.
  • No Regex in Hot Path: Parser uses character-by-character processing. Regex have been known to cause problems and its advisable to stay away from them.
  • Allowlist Lookup: O(1) dict lookups, minimal overhead

4. Correctness

  • Synonym Handling: All common aliases map correctly
  • Case Insensitivity: Keys normalized consistently

Testing Strategy

Test Coverage Map

Component Test File Key Scenarios
Parser test_010_connection_string_parser.py Syntax, escaping, errors
Allowlist test_011_connection_string_allowlist.py Validation, normalization
Builder test_012_connection_string_integration.py Escaping, formatting
Integration test_012_connection_string_integration.py End-to-end flows
Connection test_003_connection.py Connection string construction
Exceptions test_006_exceptions.py Error propagation

Behavior Changes

Should be aware of these behavioral changes:

1. Unknown Parameters Now Raise Errors

Before:

connect("Server=localhost;FakeParam=value") # Silently ignored

After:

connect("Server=localhost;FakeParam=value")
# Raises: ConnectionStringParseError: Unknown keyword 'FakeParam' is not recognized

2. Malformed Strings Caught Early

Before:

connect("ServerLocalhost") # ODBC error later

After:

connect("ServerLocalhost")
# Raises: ConnectionStringParseError: Incomplete specification: keyword 'ServerLocalhost' has no value

3. Reserved Parameters Blocked

Before:

connect("Server=localhost;Driver={SQL Server}") # Maybe ignored

After:

connect("Server=localhost;Driver={SQL Server}")
# Raises: ValueError: Connection parameter 'Driver' is reserved and controlled by the driver

Key Concepts to Understand

ODBC Connection String Format

key1=value1;key2=value2;key3={val;ue}

Braced Values

Used when value contains semicolons or special characters:

PWD={my;pass;word} -> Password is "my;pass;word"

Escape Sequences

  • {{-> { (escaped left brace)
  • }}-> } (escaped right brace)

Example:

PWD={a}}b{{c} -> Password is "a}b{c"

Copy link

github-actions bot commented Oct 29, 2025
edited
Loading

📊 Code Coverage Report

🔥 Diff Coverage

97%


🎯 Overall Coverage

77%


📈 Total Lines Covered: 4744 out of 6145
📁 Project: mssql-python


Diff Coverage

Diff: main...HEAD, staged and unstaged changes

  • mssql_python/init.py (100%)
  • mssql_python/connection.py (100%)
  • mssql_python/connection_string_allowlist.py (90.9%): Missing lines 137-139
  • mssql_python/connection_string_builder.py (100%)
  • mssql_python/connection_string_parser.py (98.2%): Missing lines 124,215
  • mssql_python/exceptions.py (100%)

Summary

  • Total: 194 lines
  • Missing: 5 lines
  • Coverage: 97%

mssql_python/connection_string_allowlist.py

Lines 133-143

 133 try:
 134 from mssql_python.logging_config import get_logger
 135 from mssql_python.helpers import sanitize_user_input
 136 logger = get_logger()
! 137 except ImportError:
! 138 logger = None
! 139 sanitize_user_input = lambda x: str(x)[:50] # Simple fallback
 140 
 141 filtered = {}
 142 
 143 # The rejected list should ideally be empty when used in the normal connection

mssql_python/connection_string_parser.py

Lines 120-128

 120 if incomplete_text:
 121 errors.append(f"Incomplete specification: keyword '{incomplete_text}' has no value (missing '=')")
 122 # Skip to next semicolon
 123 while current_pos < str_len and connection_str[current_pos] != ';':
! 124 current_pos += 1
 125 continue
 126 
 127 # Extract and normalize the key
 128 key = connection_str[key_start:current_pos].strip().lower()

Lines 211-219

 211 str_len = len(connection_str)
 212 
 213 # Skip leading whitespace before the value
 214 while start_pos < str_len and connection_str[start_pos] in ' \t':
! 215 start_pos += 1
 216 
 217 # If we've consumed the entire string or reached a semicolon, return empty value
 218 if start_pos >= str_len:
 219 return '', start_pos


📋 Files Needing Attention

📉 Files with overall lowest coverage (click to expand)
mssql_python.helpers.py: 67.2%
mssql_python.pybind.ddbc_bindings.cpp: 70.7%
mssql_python.pybind.connection.connection.cpp: 74.4%
mssql_python.row.py: 77.9%
mssql_python.ddbc_bindings.py: 79.6%
mssql_python.cursor.py: 83.6%
mssql_python.pybind.connection.connection_pool.cpp: 84.8%
mssql_python.auth.py: 87.1%
mssql_python.pooling.py: 87.7%
mssql_python.connection_string_allowlist.py: 91.1%

🔗 Quick Links

⚙️ Build Summary 📋 Coverage Details

@saurabh500 saurabh500 changed the title (削除) Test PR (削除ここまで) (追記) FEAT: Improved Connection String handling in Python (追記ここまで) Oct 29, 2025
@github-actions github-actions bot added the pr-size: large Substantial code update label Oct 29, 2025
Copy link
Contributor

@github-advanced-security github-advanced-security bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

devskim found more than 20 potential problems in the proposed changes. Check the Files changed tab for more details.

Saurabh Singh (SQL Drivers) added 4 commits October 29, 2025 14:16
... to _ConnectionStringAllowList
- Updated references in connection.py, connection_string_allowlist.py, connection_string_parser.py, and related test files to use the new class name.
- Changed method names from `parse` to `_parse` in _ConnectionStringParser for internal consistency.
- Adjusted unit tests to reflect the new class and method names, ensuring all tests pass with the updated structure.
- Improved code clarity and maintainability by adhering to naming conventions for internal classes.
...nd improve error handling for empty values
...ate test cases for parameter normalization
@saurabh500 saurabh500 marked this pull request as ready for review October 30, 2025 23:16
Copilot AI review requested due to automatic review settings October 30, 2025 23:16
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR implements a comprehensive connection string allow-list feature for mssql-python, adding strict validation and parsing for ODBC connection strings. The implementation includes a three-stage pipeline (parse → filter → build) that validates connection string parameters against an allow-list, normalizes synonyms, and properly handles ODBC escape sequences.

Key Changes:

  • Implemented ODBC-compliant connection string parser with strict validation
  • Added connection string parameter allow-list with synonym normalization
  • Created connection string builder with proper escaping
  • Updated connection construction to use the new validation pipeline

Reviewed Changes

Copilot reviewed 17 out of 17 changed files in this pull request and generated 10 comments.

Show a summary per file
File Description
mssql_python/exceptions.py Added ConnectionStringParseError exception for validation failures
mssql_python/connection_string_parser.py New ODBC-compliant parser with strict validation and error batching
mssql_python/connection_string_allowlist.py New allow-list manager for parameter validation and normalization
mssql_python/connection_string_builder.py New builder for reconstructing connection strings with proper escaping
mssql_python/connection.py Updated _construct_connection_string to use new validation pipeline
tests/test_010_connection_string_parser.py Comprehensive parser unit tests (448 lines)
tests/test_011_connection_string_allowlist.py Allow-list unit tests (245 lines)
tests/test_012_connection_string_integration.py Integration tests for complete flow (664 lines)
tests/test_003_connection.py Updated tests for new parameter name normalization
tests/test_006_exceptions.py Updated to expect ConnectionStringParseError for invalid strings
eng/pipelines/*.yml Removed Driver parameter from test connection strings
Comments suppressed due to low confidence (1)

mssql_python/connection_string_allowlist.py:1

  • [nitpick] The comment formatting is inconsistent. The comment on line 82 should be on a separate line above the code, matching the style used elsewhere in the file (e.g., lines 35-40).
"""

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

assert app_name_received == 'MSSQL-Python', \
f"Expected SQL Server to receive 'MSSQL-Python', but got '{app_name_received}'"

print(f"\n SQL Server correctly received APP_NAME: '{app_name_received}'")
Copy link

Copilot AI Oct 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Test code should not use print statements for output. Use logging or pytest's built-in capture mechanisms instead. Print statements can clutter test output and don't integrate well with test runners.

Suggested change
print(f"\n SQL Server correctly received APP_NAME: '{app_name_received}'")
# SQL Server correctly received APP_NAME: '{app_name_received}'

Copilot uses AI. Check for mistakes.
assert "'app'" in error_lower
assert "controlled by the driver" in error_lower

print("\n APP in connection string correctly raised ConnectionStringParseError")
Copy link

Copilot AI Oct 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Test code should not use print statements for output. Use logging or pytest's built-in capture mechanisms instead.

Copilot uses AI. Check for mistakes.
assert "reserved and controlled by the driver" in str(exc_info.value)
assert "APP" in str(exc_info.value) or "app" in str(exc_info.value).lower()

print("\n APP in kwargs correctly raised ValueError before connecting to SQL Server")
Copy link

Copilot AI Oct 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Test code should not use print statements for output. Use logging or pytest's built-in capture mechanisms instead.

Copilot uses AI. Check for mistakes.
original_params = parser._parse(conn_str)

# Get the server from original connection for reconnection
server = original_params.get('server', 'localhost')
Copy link

Copilot AI Oct 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Variable server is not used.

Suggested change
server = original_params.get('server', 'localhost')

Copilot uses AI. Check for mistakes.
- Collects all errors and reports them together
"""

from typing import Dict, Tuple, List
Copy link

Copilot AI Oct 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Import of 'List' is not used.

Suggested change
from typing import Dict, Tuple, List
from typing import Dict, Tuple

Copilot uses AI. Check for mistakes.
"""

from typing import Dict, Tuple, List
import logging
Copy link

Copilot AI Oct 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Import of 'logging' is not used.

Suggested change
import logging

Copilot uses AI. Check for mistakes.
Unit tests for _ConnectionStringAllowList.
"""

import pytest
Copy link

Copilot AI Oct 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Import of 'pytest' is not used.

Suggested change
import pytest

Copilot uses AI. Check for mistakes.
from mssql_python.connection_string_allowlist import _ConnectionStringAllowList
from mssql_python.connection_string_builder import _ConnectionStringBuilder
from mssql_python import connect
from mssql_python.connection import Connection
Copy link

Copilot AI Oct 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Import of 'Connection' is not used.

Suggested change
from mssql_python.connection import Connection

Copilot uses AI. Check for mistakes.
from mssql_python.connection_string_builder import _ConnectionStringBuilder
from mssql_python import connect
from mssql_python.connection import Connection
from mssql_python.exceptions import DatabaseError, InterfaceError
Copy link

Copilot AI Oct 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Import of 'InterfaceError' is not used.
Import of 'DatabaseError' is not used.

Suggested change
from mssql_python.exceptions import DatabaseError, InterfaceError

Copilot uses AI. Check for mistakes.

print(f"\n SQL Server correctly received APP_NAME: '{app_name_received}'")
finally:
conn.close()
Copy link

Copilot AI Oct 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instance of context-manager class Connection is closed in a finally block. Consider using 'with' statement.

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Reviewers

Copilot code review Copilot Copilot left review comments

At least 2 approving reviews are required to merge this pull request.

Labels

pr-size: large Substantial code update

Projects

None yet

Milestone

No milestone

Development

Successfully merging this pull request may close these issues.

More usable handling of connection string for mssql-python

AltStyle によって変換されたページ (->オリジナル) /