-
Notifications
You must be signed in to change notification settings - Fork 0
refactor: extract visitor duplication in column.mojo (~1,500 lines of repeated code) #680
Open
Description
Summary
bison/column.mojo contains ~60 visitor structs with massive code duplication across the five ColumnData type arms (Int64, Float64, Bool, String, PythonObject). Estimated 1,200–1,500 lines of near-identical code (~19% of the file).
Worst offenders
| Visitor | Arms | Duplicated lines | Lines |
|---|---|---|---|
_FfillVisitor |
5 | ~100 | 645–744 |
_BfillVisitor |
5 | ~130 | 765–894 |
_WhereMaskVisitor |
4 | ~148 | 1668–1810 |
_UniqueVisitor |
4 | ~80 | 2099–2175 |
_CumSum/Prod/Min/MaxVisitor |
3 each | ~240 total | 3660–4050 |
_FillnaVisitor |
5 | ~75 | 551–624 |
_FillScalarVisitor |
4 | ~30 | 498–524 |
Each visitor repeats the same loop-with-null-check logic, differing only in the typed data accessor and default/sentinel value.
Proposed fix
Once the Mojo compiler fixes the typed-downcast deadlock (#642), collapse each visitor family into a single generic helper parameterized on dtype. Until then, consider:
- Extract the loop body into a shared helper that accepts typed data by reference — this is possible today for visitors where the loop structure is truly identical.
- Document the duplication with a comment linking back to this issue so future contributors don't add more arms without recognizing the pattern.
Impact
High — this is the largest source of duplicated code in the project. Every new dtype or operation added requires copying the same pattern into 4–5 more arms. Maintenance burden compounds over time.
Classification
- Code smell: Duplicate Code (Dispensables)
- Refactoring: Extract Method / Replace Conditional with Polymorphism
Related
- Mojo compiler deadlock: rebind typed downcasts on ArcPointer + Variant dispatch in same call graph #642 (compiler typed-downcast deadlock — root cause of the dual-storage architecture)
- tech debt: DataFrame element-wise math methods share identical loop boilerplate #612 (DataFrame element-wise math duplication — same family of issues)