-
-
Notifications
You must be signed in to change notification settings - Fork 5.8k
Make ArbitraryBase Unicode-aware #1299
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The fix is sensible if Unicode is to be supported but it worsens the time complexity since .at
is linear time in the size of the base string rather than constant time. Consider extracting the Unicode characters into an array as "preprocessing".
The fix is sensible if Unicode is to be supported but it worsens the time complexity since
.at
is linear time in the size of the base string rather than constant time. Consider extracting the Unicode characters into an array as "preprocessing".
I'm not convinced that's true — surely accessing an array index is constant time, just like accessing an object property. There's no reason for at
to be iterating over the array, as arr.at(n)
acts identically to arr[n]
for non-negative integer values of n
. Maybe I should just swap those out for square-bracket accessors as they'll never be negative anyway.
The only linear time parts I added were these:
const baseOneCharacters = [...baseOneCharacterString] const baseTwoCharacters = [...baseTwoCharacterString]
But they only run once per function call, and in any case, the first argument is already iterated prior to being reversed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nevermind, you're correct. Glancing over the changes I missed the spread operators and confused Array.at with String.at, of which I had incorrectly assumed that it was linear time (it is not, it's constant time - it's basically just charAt
+ negative index support).
Is there any particular reason to prefer x.at(y)
over x[y]
here though? y
will always be positive, so I'd prefer standard array indexing here.
Additionally, it would be good if you could convert the "test comments" into proper Jest tests.
Thanks for the fix.
@appgurueu Should I wait for the changes you suggested or good to merge?
Changing baseTwoCharacters.at(remainder)
to baseTwoCharacters[remainder]
uncovered a performance bug — value
was often not an integer, because the division result wasn't being floored. charAt
and at
both silently floor their arguments, which obscured the bug:
'abc'.charAt(1.5) // b ;[...'abc'].at(1.5) // b
However, as a result, the loop was being iterated over many more times than necessary as value < 1
got repeatedly divided until it reached floating-point 0
, with all the extra zero-characters finally being lopped off at the end with .replace(new RegExp(`^${baseTwoZero}+`), '')
. Without that regex replacement, convertArbitraryBase('98', '0123456789', '01234567')
had 358 leading zeros (i.e. 358 unnecessary iterations).
I've fixed that by using floor division and removing the regex replacement.
e3d6fcc
to
9103435
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Excellent work.
https://mathiasbynens.be/notes/javascript-unicode#counting-symbols
Open in Gitpod know more
Describe your change:
Checklist:
Example:
UserProfile.js
is allowed butuserprofile.js
,Userprofile.js
,user-Profile.js
,userProfile.js
are notFixes: #{$ISSUE_NO}
.