
Commit 1caf028

Optimize raw HTML post-processor (#1510)
Don't precompute placeholder replacements in raw HTML post-processor.

Fixes #1507.

Previously, the raw HTML post-processor would precompute all possible replacements for placeholders in a string, based on the HTML stash. It would then apply a regular expression substitution using these replacements. Finally, if the text had changed, it would recurse and do all of that again. This was inefficient because the placeholders and their replacements were recomputed on every recursion, and because only a few of the replacements would ever be used.

This change moves the recursion into the regular expression substitution, so that:

1. the regular expression does minimal work on the text (as opposed to re-scanning text already scanned in previous frames);
2. more importantly, replacements are no longer computed ahead of time (let alone several times), and are only fetched from the HTML stash as placeholders are found in the text.

The substitution function relies on the ordering of the regular expression groups: we make sure to match `<p>PLACEHOLDER</p>` first, before `PLACEHOLDER`. The presence of a wrapping `p` tag indicates whether or not to wrap the substitution result again (this also depends on whether the substituted HTML is a block-level tag).
1 parent f6cfc5c commit 1caf028
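The lazy, recursive substitution described in the commit message can be sketched in isolation. In this approach, each placeholder match fetches its stash entry on demand, and any nested placeholders are resolved by recursing from inside the callback. This is a minimal sketch under stated assumptions, not Python-Markdown's code. The `@@stash:N@@` placeholder format, the list-based `stash`, and the simplified `is_block_level` check are hypothetical stand-ins for `util.HTML_PLACEHOLDER`, `md.htmlStash`, and `BLOCK_LEVEL_REGEX`. As in the diff, nothing guards against a stash entry that contains its own placeholder.

```python
from __future__ import annotations

import re

# Hypothetical placeholder format; Python-Markdown's real util.HTML_PLACEHOLDER differs.
PLACEHOLDER = "@@stash:%s@@"
BASE = PLACEHOLDER % r"([0-9]+)"
# Wrapped form first: group 1 holds the index for "<p>PH</p>", group 2 for a bare "PH".
PATTERN = re.compile(f"<p>{BASE}</p>|{BASE}")

# Simplified stand-in for BLOCK_LEVEL_REGEX (assumption: only a few tags listed).
BLOCK_LEVEL = re.compile(r"<(div|p|pre|table|ul|ol|h[1-6])\b", re.IGNORECASE)


def is_block_level(html: str) -> bool:
    """Return True if the HTML fragment starts with a (simplified) block-level tag."""
    return BLOCK_LEVEL.match(html) is not None


def restore(text: str, stash: list[str]) -> str:
    """Replace placeholders with stashed HTML in a single regex pass.

    Stash entries are read only when their placeholder is found, and any
    placeholders inside a restored entry are handled by recursing from the
    substitution callback rather than re-scanning the whole text.
    """

    def substitute(m: re.Match[str]) -> str:
        if m.group(1) is not None:
            index, wrapped = int(m.group(1)), True
        else:
            index, wrapped = int(m.group(2)), False
        if index >= len(stash):
            return m.group(0)  # unknown index: leave the placeholder as-is
        html = stash[index]  # fetched lazily, only for placeholders actually present
        if not wrapped or is_block_level(html):
            # Bare placeholder, or block-level HTML that must not sit inside <p>.
            return PATTERN.sub(substitute, html)
        # Inline HTML that was wrapped in <p>: keep the wrapper.
        return PATTERN.sub(substitute, f"<p>{html}</p>")

    return PATTERN.sub(substitute, text)
```

Consider `stash = ["<div>block</div>", "<b>x</b>", "<em>@@stash:1@@</em>"]`. Calling `restore("<p>@@stash:0@@</p> <p>@@stash:1@@</p> @@stash:2@@ @@stash:9@@", stash)` returns `<div>block</div> <p><b>x</b></p> <em><b>x</b></em> @@stash:9@@`. The block-level entry drops its `<p>` wrapper and the inline one keeps it. The nested placeholder is resolved by the recursive call, and the out-of-range index is left untouched.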


2 files changed (+15 additions, -26 deletions)


docs/changelog.md

Lines changed: 1 addition & 0 deletions
@@ -15,6 +15,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 * DRY fix in `abbr` extension by introducing method `create_element` (#1483).
 * Clean up test directory some removing some redundant tests and port
   non-redundant cases to the newer test framework.
+* Improved performance of the raw HTML post-processor (#1510).
 
 ### Fixed
 
markdown/postprocessors.py

Lines changed: 14 additions & 26 deletions
@@ -28,7 +28,6 @@
 
 from __future__ import annotations
 
-from collections import OrderedDict
 from typing import TYPE_CHECKING, Any
 from . import util
 import re
@@ -73,37 +72,26 @@ class RawHtmlPostprocessor(Postprocessor):
 
     def run(self, text: str) -> str:
         """ Iterate over html stash and restore html. """
-        replacements = OrderedDict()
-        for i in range(self.md.htmlStash.html_counter):
-            html = self.stash_to_string(self.md.htmlStash.rawHtmlBlocks[i])
-            if self.isblocklevel(html):
-                replacements["<p>{}</p>".format(
-                    self.md.htmlStash.get_placeholder(i))] = html
-            replacements[self.md.htmlStash.get_placeholder(i)] = html
-
         def substitute_match(m: re.Match[str]) -> str:
-            key = m.group(0)
-
-            if key not in replacements:
-                if key[3:-4] in replacements:
-                    return f'<p>{ replacements[key[3:-4]] }</p>'
-                else:
-                    return key
-
-            return replacements[key]
-
-        if replacements:
+            if key := m.group(1):
+                wrapped = True
+            else:
+                key = m.group(2)
+                wrapped = False
+            if (key := int(key)) >= self.md.htmlStash.html_counter:
+                return m.group(0)
+            html = self.stash_to_string(self.md.htmlStash.rawHtmlBlocks[key])
+            if not wrapped or self.isblocklevel(html):
+                return pattern.sub(substitute_match, html)
+            return pattern.sub(substitute_match, f"<p>{html}</p>")
+
+        if self.md.htmlStash.html_counter:
             base_placeholder = util.HTML_PLACEHOLDER % r'([0-9]+)'
             pattern = re.compile(f'<p>{ base_placeholder }</p>|{ base_placeholder }')
-            processed_text= pattern.sub(substitute_match, text)
+            return pattern.sub(substitute_match, text)
         else:
             return text
 
-        if processed_text == text:
-            return processed_text
-        else:
-            return self.run(processed_text)
-
     def isblocklevel(self, html: str) -> bool:
         """ Check is block of HTML is block-level. """
         m = self.BLOCK_LEVEL_REGEX.match(html)
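The new `substitute_match` distinguishes a `<p>`-wrapped placeholder from a bare one purely by which capturing group is set. Capturing groups are numbered left to right in the pattern, so in `<p>PLACEHOLDER</p>|PLACEHOLDER` group 1 can only be set by the wrapped alternative and group 2 only by the bare one. The sketch below reproduces that pattern shape. It uses a hypothetical `@@stash:N@@` placeholder in place of `util.HTML_PLACEHOLDER`, and the helper `classify` is illustrative, not part of Python-Markdown.

```python
from __future__ import annotations

import re

# Hypothetical placeholder format standing in for util.HTML_PLACEHOLDER.
BASE = "@@stash:%s@@" % r"([0-9]+)"
# Same shape as the diff: wrapped alternative (group 1) before bare one (group 2).
PATTERN = re.compile(f"<p>{BASE}</p>|{BASE}")


def classify(fragment: str) -> list[tuple[int, bool]]:
    """Return (stash index, wrapped?) for each placeholder match, in order."""
    found = []
    for m in PATTERN.finditer(fragment):
        if m.group(1) is not None:
            # Only the "<p>PLACEHOLDER</p>" alternative sets group 1.
            found.append((int(m.group(1)), True))
        else:
            # Otherwise the bare alternative matched and set group 2.
            found.append((int(m.group(2)), False))
    return found


print(classify("<p>@@stash:3@@</p> and @@stash:4@@"))  # [(3, True), (4, False)]
print(classify("<p>text @@stash:5@@</p>"))             # [(5, False)]
```

The second call shows that a `<p>` only counts as a wrapper when it encloses nothing but the placeholder. Otherwise the bare alternative matches, and the surrounding paragraph is left as it is.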