forked from google/diff-match-patch
Commit db1cbba
Python2: Stop breaking surrogate pairs in toDelta()
Resolves google#69 for Python2
Sometimes we can find a common prefix that runs into the middle of a
surrogate pair and we split that pair when building our diff groups.
This is fine as long as we are operating on UTF-16 code units. It
becomes problematic when we start trying to treat those substrings as
valid Unicode (or UTF-8) sequences.
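To illustrate how a common prefix can end in the middle of a surrogate pair, here is a minimal Python 3 sketch. The helper names `utf16_units` and `common_prefix_len` are hypothetical and are not part of diff-match-patch; the library's own prefix routine may differ in detail.

```python
def utf16_units(s):
    """Return the UTF-16 code units of s as a list of integers."""
    data = s.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


def common_prefix_len(a, b):
    """Count the leading items that two sequences share."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


dog = utf16_units("\U0001F436")  # [0xD83D, 0xDC36]
cat = utf16_units("\U0001F431")  # [0xD83D, 0xDC31]

# Both characters share the high surrogate 0xD83D, so the common prefix
# is one code unit long and stops between the two halves of each pair.
split_at = common_prefix_len(dog, cat)
```

Here `split_at` is 1, so a diff built on code units would put the lone high surrogate in the equality and the two low surrogates in the delete and insert groups.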
When we pass these split groups into `toDelta()` we do just that and the
library crashes. In this patch we're post-processing the diff groups
before encoding them to make sure that we un-split the surrogate pairs.
The post-processed diffs should produce the same output when applying
the diffs. The diff string itself will be different but should not change
much: only by a single character at surrogate boundaries.
1 parent dfadc9c commit db1cbba
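The post-processing idea described in the commit message can be sketched as follows. This is not the code from the commit (the diff content is not available here); it is a simplified Python 3 illustration that assumes diffs are `(op, text)` tuples with the usual diff-match-patch op values (-1 delete, 0 equal, 1 insert), and it models UTF-16 code units as lone surrogate code points in a `str`. The function name `rejoin_split_pairs` is hypothetical. It only handles the case where an equality ends in a high surrogate and is directly followed by one delete and one insert that each start with a low surrogate.

```python
DIFF_DELETE, DIFF_INSERT, DIFF_EQUAL = -1, 1, 0


def is_high_surrogate(ch):
    return 0xD800 <= ord(ch) <= 0xDBFF


def is_low_surrogate(ch):
    return 0xDC00 <= ord(ch) <= 0xDFFF


def rejoin_split_pairs(diffs):
    """Return a copy of diffs in which a high surrogate stranded at the end
    of an equality is moved onto the delete/insert pair that follows it."""
    out = [list(d) for d in diffs]
    for i in range(len(out) - 2):
        op, text = out[i]
        if op != DIFF_EQUAL or not text or not is_high_surrogate(text[-1]):
            continue
        a, b = out[i + 1], out[i + 2]
        # Require exactly one delete and one insert, so that both the old
        # and the new text still reconstruct to the same strings.
        if {a[0], b[0]} != {DIFF_DELETE, DIFF_INSERT}:
            continue
        if not (a[1] and b[1]
                and is_low_surrogate(a[1][0]) and is_low_surrogate(b[1][0])):
            continue
        high = text[-1]
        out[i][1] = text[:-1]
        a[1] = high + a[1]
        b[1] = high + b[1]
    # Drop groups that became empty after the move.
    return [(op, text) for op, text in out if text]


split = [(DIFF_EQUAL, "a\ud83d"), (DIFF_DELETE, "\udc36"), (DIFF_INSERT, "\udc31")]
fixed = rejoin_split_pairs(split)
```

After the call, `fixed` holds an equality `"a"` followed by a delete and an insert that each contain a complete surrogate pair. The texts reconstructed from the diffs are unchanged; only the group boundaries move by one code unit, which matches the "single character at surrogate boundaries" difference the message describes.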
2 files changed, +110 -4 lines changed

File 1 (+30 -4; file name and changed-line content not preserved)
| Original file lines | Diff lines | Change |
|---|---|---|
| 28-33 | 28-34 | 1 line added (new line 31) |
| 1135-1140 | 1136-1149 | 8 lines added (new lines 1139-1146) |
| 1148-1162 | 1157-1188 | 21 lines added (new lines 1160, 1162-1178, 1181, 1183, 1185); 4 lines removed (old lines 1154, 1155, 1157, 1159) |

File 2 (+80 -0; file name and changed-line content not preserved)
| Original file lines | Diff lines | Change |
|---|---|---|
| 441-446 | 441-526 | 80 lines added (new lines 444-523) |