Parser for Python's difflib output.
Built on top of https://github.com/yebrahim/difflibparser/blob/master/difflibparser.py
Key changes from the above library:
- Uses the generator pattern instead of the iterator pattern when iterating over diffs
- Uses @dataclass instead of generic dictionaries to enforce strict typing
- Uses type annotations for strict typing
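As a rough illustration of the first point, the sketch below contrasts the two styles on a plain list of lines. The names `LineIterator` and `iter_lines` are hypothetical and are not taken from either library; they only show why a generator needs less bookkeeping.

```python
from typing import Iterator, List


class LineIterator:
    """Iterator-pattern version: the position is tracked by hand."""

    def __init__(self, lines: List[str]) -> None:
        self._lines = lines
        self._index = 0

    def __iter__(self) -> "LineIterator":
        return self

    def __next__(self) -> str:
        if self._index >= len(self._lines):
            raise StopIteration
        line = self._lines[self._index]
        self._index += 1
        return line


def iter_lines(lines: List[str]) -> Iterator[str]:
    """Generator version: the position is kept implicitly between yields."""
    for line in lines:
        yield line


sample = ["- hello world", "+ hello world!"]
# Both styles produce the same sequence of items.
assert list(LineIterator(sample)) == list(iter_lines(sample))
```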
pip install difflib-parser

    from difflib_parser import difflib_parser

    parser = difflib_parser.DiffParser(["hello world"], ["hello world!"])
    for diff in parser.iter_diffs():
        print(diff)


    from dataclasses import dataclass
    from enum import Enum
    from typing import List

    class DiffCode(Enum):
        SAME = 0
        RIGHT_ONLY = 1
        LEFT_ONLY = 2
        CHANGED = 3

    @dataclass
    class Diff:
        code: DiffCode
        line: str
        left_changes: List[int] | None = None
        right_changes: List[int] | None = None
        newline: str | None = None

A difflib output might look something like this:

    >>> import difflib
    >>> print("\n".join(list(difflib.ndiff(["hello world"], ["hola world"]))))
    - hello world
    ?  ^ ^^

    + hola world
    ?  ^ ^

The specifics of diff interpretation can be found in the documentation.
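As a small aid for reading the `? ` hint lines, the sketch below lists the character positions that a hint line flags. The helper name `hint_positions` is hypothetical, and treating every `^`, `+`, or `-` marker as a flagged position is an assumption made for illustration, not necessarily how this library fills `left_changes` and `right_changes`.

```python
from typing import List


def hint_positions(hint_line: str) -> List[int]:
    """Return the 0-based positions flagged by a "? " hint line.

    Positions are relative to the line's content, so the two-character
    "? " prefix is dropped first.
    """
    markers = hint_line[2:].rstrip("\n")
    return [i for i, ch in enumerate(markers) if ch in "+-^"]


# Hint line that follows "- hello world" in the example above.
print(hint_positions("?  ^ ^^\n"))  # [1, 3, 4], i.e. "e", "l", "o"
```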
There are concretely four types of changes we are interested in:
- No change
- A new line is added
- An existing line is removed
- An existing line is edited
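These cases can be told apart from the two-character prefix that `difflib.ndiff` puts on every line. The sketch below only inspects those prefixes; the helper name `prefixes` and the wording in `PREFIX_MEANING` are illustrative and not part of this library.

```python
import difflib
from typing import List

# Two-character prefixes emitted by difflib.ndiff.
PREFIX_MEANING = {
    "  ": "no change",
    "+ ": "line only in the right-hand input",
    "- ": "line only in the left-hand input",
    "? ": "hint line marking changed characters",
}


def prefixes(left: List[str], right: List[str]) -> List[str]:
    """Return the prefix of every line ndiff produces for the two inputs."""
    return [line[:2] for line in difflib.ndiff(left, right)]


for prefix in prefixes(["a", "b"], ["a", "c", "d"]):
    print(repr(prefix), "->", PREFIX_MEANING[prefix])
```

An edited line does not get a prefix of its own: it shows up as a `- ` line and a `+ ` line with one or two `? ` lines around them.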
Given that the last two cases operate on existing lines, both always start with a line prefixed by "- ". As such, we need to look at what follows that line to tell them apart.
If an existing line is removed, it will not have any follow-up lines.
If an existing line is edited, it will have several follow-up lines that provide details on the values that have been changed.
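One way to sketch this lookahead is shown below. The function `is_edited` is a hypothetical helper, not the library's API. It assumes an edit is recognisable by a `? ` line either directly after the `- ` line or directly after the `+ ` line that follows it.

```python
import difflib
from typing import List


def is_edited(lines: List[str], i: int) -> bool:
    """Return True if the "- " line at index i starts an edit.

    A plain removal has no "? " hint line attached to it.
    """
    if not lines[i].startswith("- "):
        raise ValueError("expected a line starting with '- '")
    following = [line[:2] for line in lines[i + 1 : i + 3]]
    return following[:1] == ["? "] or following == ["+ ", "? "]


removed = list(difflib.ndiff(["hello world", "bye"], ["hello world"]))
edited = list(difflib.ndiff(["hello world"], ["hello world!"]))

print(is_edited(removed, 1))  # "- bye" has no follow-up lines
print(is_edited(edited, 0))   # "- hello world" is followed by "+ " and "? "
```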
From these follow-up lines, we can further classify the changes made to a line:
- Only additions made (e.g. "Hello world" -> "Hello world!")
- Only removals made (e.g. "Hello world" -> "Hllo world")
- Both additions and removals made (e.g. "Hello world" -> "Hola world!")
Each of them has its own sequence of follow-up lines:
-,+,?

    >>> print("\n".join(list(difflib.ndiff(["hello world"], ["hello world!"]))))
    - hello world
    + hello world!
    ?            +

-,?,+

    >>> print("\n".join(list(difflib.ndiff(["hello world"], ["hllo world"]))))
    - hello world
    ?  -

    + hllo world

-,?,+,?

    >>> print("\n".join(list(difflib.ndiff(["hello world"], ["helo world!"]))))
    - hello world
    ?    -

    + helo world!
    ?           +

As such, we have included them as separate patterns to process.
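A minimal sketch of how the three patterns could be matched against `ndiff` output is given below. It is not the library's actual implementation: `PATTERNS` and `iter_edits` are hypothetical names, and the sketch only reports edited lines, skipping everything else.

```python
import difflib
from typing import Iterator, List, Tuple

# Prefix sequences for the three edit patterns described above.
PATTERNS = {
    ("- ", "+ ", "? "): "additions only",
    ("- ", "? ", "+ "): "removals only",
    ("- ", "? ", "+ ", "? "): "additions and removals",
}


def iter_edits(lines: List[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (pattern name, left line, right line) for each edited line."""
    i = 0
    while i < len(lines):
        # Try the longest pattern first so "-,?,+,?" is not cut short
        # and reported as "-,?,+".
        for prefixes in sorted(PATTERNS, key=len, reverse=True):
            window = lines[i : i + len(prefixes)]
            if tuple(line[:2] for line in window) == prefixes:
                left = window[0][2:]
                right = next(l[2:] for l in window if l.startswith("+ "))
                yield PATTERNS[prefixes], left, right
                i += len(prefixes)
                break
        else:
            i += 1  # not the start of an edit pattern, move on


diff_lines = list(difflib.ndiff(["hello world"], ["hello world!"]))
for name, left, right in iter_edits(diff_lines):
    print(f"{name}: {left!r} -> {right!r}")
```

Matching on the prefix sequence keeps each pattern explicit, at the cost of relying on `ndiff` always emitting the hint lines in this order.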