coregex is currently in experimental release (v0.x.x). We provide security updates for the following versions:
| Version | Supported |
|---|---|
| 0.1.x | ✅ |
| < 0.1.0 | ❌ |
Future stable releases (v1.0+) will follow semantic versioning with LTS support.
We take security seriously. If you discover a security vulnerability in coregex, please report it responsibly.
DO NOT open a public GitHub issue for security vulnerabilities.
Instead, please report security issues by:
-
Private Security Advisory (preferred): https://github.com/coregx/coregex/security/advisories/new
-
Email to maintainers: Create a private GitHub issue or contact via discussions
Please include the following information in your report:
- Description of the vulnerability
- Steps to reproduce the issue (include malicious regex pattern if applicable)
- Affected versions (which versions are impacted)
- Potential impact (DoS, memory exhaustion, unexpected behavior, etc.)
- Suggested fix (if you have one)
- Your contact information (for follow-up questions)
- Initial Response: Within 48-72 hours
- Triage & Assessment: Within 1 week
- Fix & Disclosure: Coordinated with reporter
We aim to:
- Acknowledge receipt within 72 hours
- Provide an initial assessment within 1 week
- Work with you on a coordinated disclosure timeline
- Credit you in the security advisory (unless you prefer to remain anonymous)
coregex is a regex engine that compiles and executes untrusted regex patterns. This introduces security risks that users should be aware of.
Risk: Crafted regex patterns can cause excessive CPU usage or memory exhaustion.
Attack Vectors:
- Catastrophic backtracking: Patterns with nested quantifiers (e.g.,
(a+)+b) - DFA state explosion: Patterns causing exponential DFA states
- Memory exhaustion: Patterns with large repetition counts
- Pattern injection: User-supplied regex patterns in web applications
Mitigation in Library:
- ✅ No backtracking in DFA - DFA search is O(n) time, immune to catastrophic backtracking
- ✅ Lazy DFA with limits - DFA state cache has configurable max size (default: 10,000 states)
- ✅ NFA fallback - Graceful degradation when DFA cache fills
- ✅ Thompson's NFA - PikeVM execution is O(n×ばつm), bounded worst-case time
- ✅ Determinization limit - Prevents excessive NFA→DFA conversion (default: 1,000 states)
- 🔄 Pattern complexity analysis - Planned for v0.2.0
User Recommendations:
// ❌ BAD - Don't compile untrusted patterns without limits pattern := userInput // Could be "(a+)+b" re, _ := coregex.Compile(pattern) re.Match(largeInput) // Potential DoS // ✅ GOOD - Use custom config with strict limits config := coregex.DefaultConfig() config.DFAMaxStates = 1000 // Limit DFA cache config.DeterminizationLimit = 100 // Limit NFA→DFA complexity re, err := coregex.CompileWithConfig(pattern, config) if err != nil { // Pattern too complex or compilation failed return errors.New("invalid pattern") } // Match with timeout (application-level) ctx, cancel := context.WithTimeout(context.Background(), 100*time.Millisecond) defer cancel() done := make(chan bool) go func() { result := re.Match(input) done <- result }() select { case <-done: // Match completed case <-ctx.Done(): // Timeout - potential DoS pattern return errors.New("match timeout") }
Risk: Large repetition counts or pattern sizes can cause integer overflow.
Example Attack:
Pattern: a{4294967295} // 2^32-1 repetitions
Compilation: May overflow when allocating NFA states
Result: Incorrect buffer allocation or panic
Mitigation:
- ✅ Pattern length validation (via Go's regexp/syntax parser)
- ✅ Repetition count limits enforced by regexp/syntax
- ✅ Safe integer arithmetic in state counting
- ✅ NFA state limit enforcement
Current Limits:
- Max pattern length: Limited by regexp/syntax parser
- Max repetition count: Limited by regexp/syntax parser
- Max NFA states: Limited by determinization limit
- Max DFA states: Configurable (default: 10,000)
Risk: Complex patterns can fill DFA state cache, causing performance degradation.
Attack Vectors:
- Patterns with large character classes and alternations
- Unicode patterns with many possible transitions
- Patterns designed to maximize DFA states
Mitigation:
- ✅ Configurable DFA cache size (MaxStates)
- ✅ Automatic NFA fallback when cache full
- ✅ Thread-safe cache with hit/miss statistics
- ✅ Cache clear method for manual reset
Cache Configuration:
// Default config (production-ready) config := coregex.DefaultConfig() config.DFAMaxStates = 10000 // 10K states (~1-2MB memory) // Restricted config (untrusted patterns) config.DFAMaxStates = 100 // Only 100 states, faster fallback to NFA // Permissive config (trusted patterns, performance-critical) config.DFAMaxStates = 100000 // 100K states (~10-20MB memory)
Risk: Regex compilation or execution can allocate large amounts of memory.
Attack Vectors:
- NFA with thousands of states
- DFA cache growing to max size
- Large input strings with many matches
- FindAll with n=-1 on pathological patterns
Mitigation:
- ✅ Lazy DFA construction (only builds states needed)
- ✅ Configurable limits on cache size
- ✅ Bounded NFA state allocation
- ✅ Streaming input processing (no full input buffering)
User Best Practices:
// ❌ BAD - Unbounded FindAll on untrusted input matches := re.FindAll(hugeInput, -1) // May allocate huge slice // ✅ GOOD - Limit number of matches matches := re.FindAll(input, 100) // Max 100 matches // ✅ GOOD - Validate input size first if len(input) > maxInputSize { return errors.New("input too large") }
Risk: User-supplied regex patterns in web applications can be exploited.
Attack Vectors:
- Search functionality with user-provided patterns
- Filter expressions using regex
- Template systems with regex validation
Mitigation (Application Level):
// ❌ BAD - Direct user input as pattern pattern := r.URL.Query().Get("search") // Untrusted! re, _ := coregex.Compile(pattern) // ✅ GOOD - Whitelist allowed patterns allowedPatterns := map[string]string{ "email": `\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Z|a-z]{2,}\b`, "phone": `\d{3}-\d{3}-\d{4}`, } patternName := r.URL.Query().Get("type") pattern, ok := allowedPatterns[patternName] if !ok { return errors.New("invalid pattern type") } // ✅ GOOD - Escape user input for literal matching searchTerm := regexp.QuoteMeta(r.URL.Query().Get("search")) pattern := fmt.Sprintf(`\b%s\b`, searchTerm)
Risk: coregex uses hand-written AVX2/SSSE3 assembly for SIMD acceleration.
Attack Vectors:
- Buffer overflows in assembly code
- Unaligned memory access causing crashes
- VZEROUPPER omission causing performance penalties
Mitigation:
- ✅ Extensive bounds checking in assembly
- ✅ Alignment handling (aligned + unaligned paths)
- ✅ VZEROUPPER called before all AVX2 returns
- ✅ Comprehensive tests including alignment edge cases
- ✅ Fuzz testing for assembly code paths
- ✅ Pure Go fallback for non-AMD64 platforms
Current Assembly Functions:
memchrAVX2- Single byte search (AVX2)memchr2AVX2- Two-byte search (AVX2)memchr3AVX2- Three-byte search (AVX2)teddySSSE3- Multi-pattern search (SSSE3)
All have extensive validation and bounds checking.
Always validate regex patterns from untrusted sources:
// Validate pattern complexity before compilation if len(pattern) > maxPatternLength { return errors.New("pattern too long") } // Try to compile with strict limits config := coregex.DefaultConfig() config.DFAMaxStates = 1000 config.DeterminizationLimit = 100 re, err := coregex.CompileWithConfig(pattern, config) if err != nil { // Pattern failed validation - potentially malicious log.Printf("Failed to compile pattern: %v", err) return err }
Set limits when processing untrusted patterns or input:
// Limit input size const maxInputSize = 10 * 1024 * 1024 // 10MB if len(input) > maxInputSize { return errors.New("input too large") } // Limit number of matches const maxMatches = 1000 matches := re.FindAll(input, maxMatches) // Use timeout for execution ctx, cancel := context.WithTimeout(context.Background(), 1*time.Second) defer cancel() done := make(chan []byte) go func() { match := re.Find(input) done <- match }() select { case result := <-done: // Success case <-ctx.Done(): // Timeout return errors.New("regex execution timeout") }
Always check errors - compilation failures may indicate malicious patterns:
// ❌ BAD - Ignoring errors re, _ := coregex.Compile(pattern) matches := re.FindAll(input, -1) // ✅ GOOD - Proper error handling re, err := coregex.Compile(pattern) if err != nil { return fmt.Errorf("pattern compilation failed: %w", err) } match := re.Find(input) if match == nil { log.Printf("No match found") return nil } // Process match...
Use pattern whitelists instead of user-provided patterns:
// ✅ Pre-compile trusted patterns var ( emailPattern = coregex.MustCompile(`[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}`) phonePattern = coregex.MustCompile(`\d{3}-\d{3}-\d{4}`) datePattern = coregex.MustCompile(`\d{4}-\d{2}-\d{2}`) ) // Select pattern by type, not user input func validateField(fieldType string, value string) bool { var pattern *coregex.Regex switch fieldType { case "email": pattern = emailPattern case "phone": pattern = phonePattern case "date": pattern = datePattern default: return false } return pattern.Match([]byte(value)) }
Status: Mitigated by O(n×ばつm) worst-case guarantee.
Risk Level: Low
Description: Thompson's NFA construction ensures no backtracking. PikeVM execution is bounded by O(n×ばつm) where n=input length, m=NFA states.
Mitigation:
- ✅ Thompson's construction (no backtracking)
- ✅ SparseSet for O(1) state tracking
- ✅ Determinization limits prevent m from growing unbounded
Status: Mitigated by lazy construction + cache limits.
Risk Level: Medium
Description: Certain patterns can cause exponential DFA states. Lazy DFA only builds states encountered during search.
Mitigation:
- ✅ Lazy construction (on-demand)
- ✅ Configurable max states limit
- ✅ Automatic NFA fallback
- ✅ Cache hit/miss statistics for monitoring
coregex dependencies:
golang.org/x/sys(minimal) - CPU feature detection for SIMD- No other runtime dependencies
Monitoring:
- ✅ Minimal dependency surface (only 1 dependency)
- ✅ Standard library dependency (golang.org/x)
- 🔄 Dependabot enabled (planned when public)
- ✅ Unit tests with edge cases (empty input, alignment, boundaries)
- ✅ Fuzz tests for SIMD primitives
- ✅ Comparison tests vs stdlib regexp (correctness)
- ✅ Benchmarks for performance validation
- ✅ Race detector (0 data races)
- ✅ golangci-lint with 34+ linters
- 🔄 Fuzzing for pattern compilation
- 🔄 ReDoS vulnerability scanning
- 🔄 Static analysis with gosec
- 🔄 SAST/DAST scanning in CI
- 🔄 Comparison fuzzing against multiple regex engines
Initial release - No security issues reported yet.
coregex v0.1.0 is a new project with production-quality code but experimental API stability.
Recommendation: Use with caution in production. API may change in v0.2+.
- GitHub Security Advisory: https://github.com/coregx/coregex/security/advisories/new
- Public Issues (for non-sensitive bugs): https://github.com/coregx/coregex/issues
- Discussions: https://github.com/coregx/coregex/discussions
coregex does not currently have a bug bounty program. We rely on responsible disclosure from the security community.
If you report a valid security vulnerability:
- ✅ Public credit in security advisory (if desired)
- ✅ Acknowledgment in CHANGELOG
- ✅ Our gratitude and recognition in README
- ✅ Priority review and quick fix
Thank you for helping keep coregex secure! 🔒
Security is a journey, not a destination. We continuously improve our security posture with each release.