Make ArbitraryBase Unicode-aware #1299

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Jump to bottom

Merged

raklaptudirm merged 3 commits into TheAlgorithms:master from lionel-rowe:patch-1

Feb 23, 2023

Merged

Make ArbitraryBase Unicode-aware #1299

raklaptudirm merged 3 commits into TheAlgorithms:master from lionel-rowe:patch-1

Feb 23, 2023

Conversation

lionel-rowe

@lionel-rowe


 Make ArbitraryBase Unicode-aware

cff0f16

https://mathiasbynens.be/notes/javascript-unicode#counting-symbols

@lionel-rowe lionel-rowe requested review from raklaptudirm and appgurueu as code owners

February 22, 2023 03:42

appgurueu

appgurueu reviewed

Feb 22, 2023

View reviewed changes

Copy link

Collaborator

@appgurueu appgurueu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fix is sensible if Unicode is to be supported but it worsens the time complexity since .at is linear time in the size of the base string rather than constant time. Consider extracting the Unicode characters into an array as "preprocessing".

@lionel-rowe

Copy link

Contributor Author

lionel-rowe commented Feb 22, 2023

The fix is sensible if Unicode is to be supported but it worsens the time complexity since .at is linear time in the size of the base string rather than constant time. Consider extracting the Unicode characters into an array as "preprocessing".

I'm not convinced that's true — surely accessing an array index is constant time, just like accessing an object property. There's no reason for at to be iterating over the array, as arr.at(n) acts identically to arr[n] for non-negative integer values of n. Maybe I should just swap those out for square-bracket accessors as they'll never be negative anyway.

The only linear time parts I added were these:

 const baseOneCharacters = [...baseOneCharacterString]
 const baseTwoCharacters = [...baseTwoCharacterString]

But they only run once per function call, and in any case, the first argument is already iterated prior to being reversed.

appgurueu

appgurueu previously approved these changes

Feb 22, 2023

View reviewed changes

Copy link

Collaborator

@appgurueu appgurueu left a comment •

edited

Loading

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nevermind, you're correct. Glancing over the changes I missed the spread operators and confused Array.at with String.at, of which I had incorrectly assumed that it was linear time (it is not, it's constant time - it's basically just charAt + negative index support).

Is there any particular reason to prefer x.at(y) over x[y] here though? y will always be positive, so I'd prefer standard array indexing here.

Additionally, it would be good if you could convert the "test comments" into proper Jest tests.

Thanks for the fix.

@appgurueu appgurueu added the fix label

Feb 22, 2023

@lionel-rowe


 Fix performance bug and add Unicode test

@raklaptudirm

Copy link

Member

raklaptudirm commented Feb 22, 2023

@appgurueu Should I wait for the changes you suggested or good to merge?

@lionel-rowe


 Add BigInt version and push output chars to array

6667e4c

@lionel-rowe lionel-rowe dismissed appgurueu’s stale review via

e3d6fcc

February 23, 2023 04:40

@lionel-rowe

Copy link

Contributor Author

lionel-rowe commented Feb 23, 2023

Changing baseTwoCharacters.at(remainder) to baseTwoCharacters[remainder] uncovered a performance bug — value was often not an integer, because the division result wasn't being floored. charAt and at both silently floor their arguments, which obscured the bug:

'abc'.charAt(1.5) // b
;[...'abc'].at(1.5) // b

However, as a result, the loop was being iterated over many more times than necessary as value < 1 got repeatedly divided until it reached floating-point 0, with all the extra zero-characters finally being lopped off at the end with .replace(new RegExp(`^${baseTwoZero}+`), ''). Without that regex replacement, convertArbitraryBase('98', '0123456789', '01234567') had 358 leading zeros (i.e. 358 unnecessary iterations).

I've fixed that by using floor division and removing the regex replacement.