Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

Add improved performance on tons of whitespace #62

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
wooorm merged 5 commits into syntax-tree:main from 42shadow42:main
Jul 4, 2022

Conversation

@42shadow42
Copy link
Contributor

@42shadow42 42shadow42 commented Jun 25, 2022

Initial checklist

  • I read the support docs
  • I read the contributing guide
  • I agree to follow the code of conduct
  • I searched issues and couldn’t find anything (or linked relevant results below)
  • If applicable, I’ve added docs and tests

Description of changes

Altered method of clearing out extraneous whitespace to handle non-matches more effectively. Approaches the problem using a regex with no backtracking to remove trailing whitespace, reverses the string and repeats to remove leading (now trailing) whitespace again. reverses the string to restore the original orientation.

I know it seems a little unorthodox to reverse the string, but because of how regex engines operate a leading whitespace regex like "[ \t]*$" runs inefficiently with lots of non-matching whitespace.

@github-actions github-actions bot added 👋 phase/new Post is being triaged automatically 🤞 phase/open Post is being triaged manually and removed 👋 phase/new Post is being triaged automatically labels Jun 25, 2022
Copy link
Contributor Author

Fixes #61

wooorm added a commit to wooorm/trim-lines that referenced this pull request Jun 25, 2022
Related-to: syntax-tree/mdast-util-to-hast#62.
Co-authored-by: Nathaniel Hunter <42shadow42@gmail.com>
wooorm added a commit to wooorm/trim-lines that referenced this pull request Jun 25, 2022
Related-to: syntax-tree/mdast-util-to-hast#62.
Co-authored-by: Nathaniel Hunter <42shadow42@gmail.com>
Copy link
Member

wooorm commented Jun 26, 2022

Just moving some of the discussion from wooorm/trim-lines@51434f7#commitcomment-76990903 and wooorm/trim-lines@321dc64#commitcomment-76991050 back here.

I remembered that I actually had a tiny utility that does what is also done here, that had the same performance problem, so I decided that it was probably a good idea to fix that utility, and reuse it here.

I indeed thought that it would probably be really slow to revere strings in some cases. And it looked a bit off. That’s why I came up with a regex, splitting on line endings, and then trimming each line.

I tested, with your test, that this new regex-way works exactly as fast as your reverse-string-way. You’re right that it would be good to also test some whitespace around line endings!

wooorm added a commit to wooorm/trim-lines that referenced this pull request Jun 26, 2022
Comment on lines 15 to 26
u(
'text',
String(node.value)
.replace(/^[\t]+/gm, '')
.split('')
.reverse()
.join('')
.replace(/^[\t]+/gm, '')
.split('')
.reverse()
.join('')
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
u(
'text',
String(node.value)
.replace(/^[\t]+/gm, '')
.split('')
.reverse()
.join('')
.replace(/^[\t]+/gm, '')
.split('')
.reverse()
.join('')
)
u('text', trimLines(node.value))

Could you switch this PR to use trim-lines from npm as a dependency?

Copy link
Contributor Author

@42shadow42 42shadow42 Jun 26, 2022
edited
Loading

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think it makes sense to utilize an equivalent library that can be independently maintained, however before I make the change, I submitted a PR to address some edge cases that I identified, in the end the implementation is slower than 10ms on unfriendly input, but not more than 20ms. So it seems reasonable still.

wooorm reacted with thumbs up emoji
test/text.js Outdated
t.end()
})

test('Nodes Efficiency', (t) => {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
test('Nodes Efficiency', (t) => {
test('efficiency', (t) => {

test/text.js Outdated

test('Nodes Efficiency', (t) => {
const timeout = setTimeout(() => {
t.fail('should process lots of whitespace efficiently')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
t.fail('should process lots of whitespace efficiently')
t.fail('should process very fast')

test/text.js Outdated
Comment on lines 71 to 76

t.deepEqual(
toHast(u('text', `1${' '.repeat(70_000)}2`)),
u('text', `1${' '.repeat(70_000)}2`),
'should be efficient on excessive whitespace'
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
t.deepEqual(
toHast(u('text', `1${' '.repeat(70_000)}2`)),
u('text', `1${' '.repeat(70_000)}2`),
'should be efficient on excessive whitespace'
)
const whitespace = ' '.repeat(70_000)
t.deepEqual(
toHast(u('text', '1' + whitespace + '2')),
u('text', '1' + whitespace + '2'),
'should be efficient on excessive whitespace'
)

test/text.js Outdated

t.end()
})

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Below are some ideas on cleaning this code.
Optionally: it can also be dropped, as this test (and more) are now in trim-lines!

Copy link
Contributor Author

@42shadow42 42shadow42 Jun 26, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I'll just drop this test, I'll make the change once the library has the final revision in place. See my PR for details. wooorm/trim-lines#5

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sounds good

Copy link
Contributor Author

@wooorm This one is ready for rereview. It now just delegates the inefficiency to your trim-lines package.

wooorm reacted with hooray emoji

@wooorm wooorm changed the title (削除) Optimize whitespace removal on text nodes (削除ここまで) (追記) Add improved performance on tons of whitespace (追記ここまで) Jul 4, 2022
@wooorm wooorm merged commit 36a7da8 into syntax-tree:main Jul 4, 2022
@wooorm wooorm added 🏁 area/perf This affects performance 💪 phase/solved Post is done labels Jul 4, 2022

This comment has been minimized.

@github-actions github-actions bot removed the 🤞 phase/open Post is being triaged manually label Jul 4, 2022
Copy link
Member

wooorm commented Jul 4, 2022

Thanks, released in 12.1.2!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Reviewers

@wooorm wooorm wooorm approved these changes

Assignees

No one assigned

Labels

🏁 area/perf This affects performance 💪 phase/solved Post is done

Milestone

No milestone

Development

Successfully merging this pull request may close these issues.

AltStyle によって変換されたページ (->オリジナル) /