This repository was archived by the owner on Aug 5, 2024. It is now read-only.
 
 
Commit 50f1542
Python3: Stop breaking surrogate pairs in toDelta()
Resolves #69 for Python3
Sometimes we can find a common prefix that runs into the middle of a
surrogate pair and we split that pair when building our diff groups.
This is fine as long as we are operating on UTF-16 code units. It
becomes problematic when we start trying to treat those substrings as
valid Unicode (or UTF-8) sequences.
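
To illustrate the problem described above, here is a minimal sketch in Python. It assumes the text is held as a string of UTF-16 code units, and the helper names `to_unicode` and `is_valid` are hypothetical, not part of the library. A common prefix computed per code unit ends on a high surrogate, and that fragment cannot be interpreted as Unicode on its own.

```python
def to_unicode(units):
    """Interpret a str of UTF-16 code units as real Unicode text."""
    return units.encode("utf-16-le", "surrogatepass").decode("utf-16-le")


def is_valid(units):
    """True if the code units form well-formed UTF-16 (no lone surrogates)."""
    try:
        to_unicode(units)
        return True
    except UnicodeDecodeError:
        return False


# Two supplementary-plane characters that share a high surrogate.
old = "\ud83c\udd70"  # U+1F170 as two UTF-16 code units
new = "\ud83c\udd71"  # U+1F171 as two UTF-16 code units

# Common prefix computed one code unit at a time.
prefix_len = 0
while (prefix_len < min(len(old), len(new))
       and old[prefix_len] == new[prefix_len]):
    prefix_len += 1
prefix = old[:prefix_len]

print(is_valid(old))     # the whole pair is well-formed
print(is_valid(prefix))  # the prefix is a lone high surrogate
```

The prefix here is the single code unit `\ud83c`, which is harmless while comparing code units but fails as soon as it is encoded or decoded as text.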
When we pass these split groups into `toDelta()` we do just that and the
library crashes. In this patch we're post-processing the diff groups
before encoding them to make sure that we un-split the surrogate pairs.
The post-processed diffs should produce the same output when applying
the diffs. The diff string itself will be different but should not change
that much - only by a single character at surrogate boundaries.

1 parent db1cbba commit 50f1542
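
The post-processing idea from the commit message can be sketched as follows. This is a simplified illustration, not the code from this commit: it assumes diffs are `(op, text)` tuples with ops `-1`, `0`, `1`, it handles only the case where an equality ends on a high surrogate and is followed by a delete/insert pair that each start with a low surrogate, and the function names are hypothetical.

```python
DIFF_DELETE, DIFF_INSERT, DIFF_EQUAL = -1, 1, 0


def is_high_surrogate(ch):
    return "\ud800" <= ch <= "\udbff"


def is_low_surrogate(ch):
    return "\udc00" <= ch <= "\udfff"


def rejoin_prefix_surrogates(diffs):
    """Move a high surrogate stranded at the end of an equality onto the
    delete/insert pair that follows it. Only this one pattern is handled."""
    out = [list(d) for d in diffs]
    for i in range(len(out) - 2):
        op, text = out[i]
        a, b = out[i + 1], out[i + 2]
        if (op == DIFF_EQUAL and text and is_high_surrogate(text[-1])
                and {a[0], b[0]} == {DIFF_DELETE, DIFF_INSERT}
                and a[1] and b[1]
                and is_low_surrogate(a[1][0])
                and is_low_surrogate(b[1][0])):
            high = text[-1]
            out[i][1] = text[:-1]   # drop the stranded half from the equality
            a[1] = high + a[1]      # give it back to the deleted text
            b[1] = high + b[1]      # and to the inserted text
    # Drop groups that became empty after the move.
    return [(op, text) for op, text in out if text]


def apply_diffs(diffs):
    """Rebuild the (old, new) texts from a diff list."""
    old = "".join(t for op, t in diffs if op != DIFF_INSERT)
    new = "".join(t for op, t in diffs if op != DIFF_DELETE)
    return old, new


def is_valid(units):
    """True if the code units form well-formed UTF-16."""
    try:
        units.encode("utf-16-le", "surrogatepass").decode("utf-16-le")
        return True
    except UnicodeDecodeError:
        return False


split = [(DIFF_EQUAL, "\ud83c"),
         (DIFF_DELETE, "\udd70"),
         (DIFF_INSERT, "\udd71")]
fixed = rejoin_prefix_surrogates(split)

print(apply_diffs(fixed) == apply_diffs(split))  # same old/new texts
print(all(is_valid(t) for _, t in fixed))        # no lone surrogates remain
```

The repaired groups rebuild the same old and new texts as the split ones, while every group is now well-formed. The delete and insert groups each grow by one code unit, which matches the commit message's note that the diff string changes only at surrogate boundaries.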
2 files changed: 82 additions and 7 deletions.

| File | Hunks starting near original line | Lines added | Lines removed |
|---|---|---|---|
| First file | 26, 1147, 1172, 1191, 1201 | 12 | 7 |
| Second file | 18, 444, 455 | 70 | 0 |
