-
-
Notifications
You must be signed in to change notification settings - Fork 49.4k
Open
@vivekkumarrathour
Description
Feature description
Feature Description
The repository has several string matching algorithms (KMP, Boyer-Moore, Naive Search) but is missing the Rabin-Karp algorithm, which is an important string matching algorithm that uses hashing for pattern searching. This is particularly useful for multiple pattern searches and plagiarism detection.
Proposed Implementation
I propose adding a Rabin-Karp string matching algorithm in the strings/ directory with the following features:
- Single pattern matching using rolling hash technique
- Multiple pattern matching - search for multiple patterns in a single pass
- Configurable hash parameters (base and modulus)
- Collision handling - verify matches to avoid false positives
- Time complexity: O(n + m) average case, O(nm) worst case
- Comprehensive doctests with edge cases (empty strings, no matches, multiple matches)
- Type hints and clear documentation
Why This Enhancement?
- Educational Value: Rabin-Karp introduces the concept of rolling hash, an important technique in computer science
- Practical Applications:
- Plagiarism detection (comparing documents)
- DNA sequence matching in bioinformatics
- Substring search in large texts
- Finding duplicate files
- Complements existing algorithms: Provides comparison with KMP and Boyer-Moore
- Multiple pattern searching: Unlike other algorithms, Rabin-Karp can efficiently search for multiple patterns simultaneously
Key Features to Highlight
- Rolling hash technique: Demonstrates efficient hash recalculation
- Spurious hit handling: Shows how to deal with hash collisions
- Modular design: Separate functions for single and multiple pattern matching
- Performance comparison: Can be compared with naive, KMP, and Boyer-Moore approaches
Example Applications
# Single pattern search text = "hello world hello" pattern = "hello" indices = rabin_karp_search(text, pattern) # Returns [0, 12] # Multiple pattern search patterns = ["hello", "world"] matches = rabin_karp_multiple(text, patterns) # Returns {"hello": [0, 12], "world": [6]}
Benefits
- Teaches rolling hash concept
- Useful for real-world string matching problems
- Shows trade-offs between different string matching algorithms
- Demonstrates practical applications of hashing
I'm willing to implement this following the repository's guidelines if approved.