15
\$\begingroup\$

I'm implementing a variation of the SuperFastHash in VBA for use in Excel (32-bit version, so no LongLong available) to hash strings.

To get around the limitations of signed 32-bit Long values, I'm doing the addition and bit-shifting using Double types, and then converting from Double to Long in a way that truncates it at 31 bits (the maximum positive value -- don't want to deal with two's complement and signs).

I'm getting answers and avoiding overflows so far, but I have a suspicion I'm making some mistakes in translation, since most implementations use all 32 bits of a uint and also deal with individual bytes from an array rather than 16-bit values coming from AscW().

Any suggestions of improvement, especially of the string-handling, bit-shifting, or the final avalanche?

Public Function shr(ByVal Value As Long, ByVal Shift As Byte) As Long
 shr = Value
 If Shift > 0 Then shr = shr \ (2 ^ Shift)
End Function
Public Function shl(ByVal Value As Long, ByVal Shift As Byte) As Long
 If Shift > 0 Then
 shl = LimitDouble(CDbl(Value) * (2& ^ Shift))
 Else
 shl = Value
 End If
End Function
Public Function LimitDouble(ByVal d As Double) As Long
 '' Prevent overflow by lopping off anything beyond 31 bits
 Const MaxNumber As Double = 2 ^ 31
 LimitDouble = CLng(d - (Fix(d / MaxNumber) * MaxNumber))
End Function
Public Function SuperFastHash(ByVal dataToHash As String) As Long
 Dim dataLength As Long
 dataLength = Len(dataToHash)
 If (dataLength = 0) Then
 SuperFastHash = 0
 Exit Function
 End If
 Dim hash As Long
 hash = dataLength
 Dim remainingBytes As Integer
 remainingBytes = dataLength Mod 2
 Dim numberOfLoops As Integer
 numberOfLoops = dataLength \ 2
 Dim currentIndex As Integer
 currentIndex = 0
 Dim tmp As Double
 Do While (numberOfLoops > 0)
 hash = LimitDouble(CDbl(hash) + AscW(Mid$(dataToHash, currentIndex + 1, 1)))
 tmp = shl(AscW(Mid$(dataToHash, currentIndex + 2, 1)), 11) Xor hash
 hash = shl(hash, 16) Xor tmp
 hash = LimitDouble(CDbl(hash) + shr(hash, 11))
 currentIndex = currentIndex + 2
 numberOfLoops = numberOfLoops - 1
 Loop
 If remainingBytes = 1 Then
 hash = LimitDouble(CDbl(hash) + AscW(Mid$(dataToHash, currentIndex + 1, 1)))
 hash = hash Xor shl(hash, 10)
 hash = LimitDouble(CDbl(hash) + shr(hash, 1))
 End If
 '' Final avalanche
 hash = hash Xor shl(hash, 3)
 hash = LimitDouble(CDbl(hash) + shr(hash, 5))
 hash = hash Xor shl(hash, 4)
 hash = LimitDouble(CDbl(hash) + shr(hash, 17))
 hash = hash Xor shl(hash, 25)
 hash = LimitDouble(CDbl(hash) + shr(hash, 6))
 SuperFastHash = hash
End Function
ChrisWue
20.6k4 gold badges42 silver badges107 bronze badges
asked Oct 9, 2012 at 22:13
\$\endgroup\$
2
  • \$\begingroup\$ Have you considered using Decimal data type? From excel help Decimal variables are stored as 96-bit (12-byte) signed integers scaled by a variable power of 10. Decimal can store 32bit or 64 bit unsigned integers \$\endgroup\$ Commented Oct 10, 2012 at 9:44
  • \$\begingroup\$ Doubles perform better than Decimal, and the Decimal type stores numbers in a way that wouldn't work with the AND and XOR operations, so I still have to thunk them back and forth to a 32-bit integer of some sort for the computation. \$\endgroup\$ Commented Oct 10, 2012 at 13:51

3 Answers 3

7
\$\begingroup\$

This is quite possibly the most clever VBA code I have ever seen.

There's not much room for improvement here, except a couple naming nitpicks:

  • If that's what they stand for, functions shr and shl could afford to be called ShiftRight and ShiftLeft.
  • I don't see a reason for shr and shl to not follow the PascalCasing naming convention.
  • Given the almost-flawlessly consistent camelCasing convention for locals & parameters, I don't see a reason for Value and Shift parameters to not stick to it. (except perhaps VBA/VB6's stubornness with casing?)

As for string handling, as per this source AscW is faster than Asc so again, good call:

AscW(Mid$(..)) considerations

Don't copy too many characters with Mid$. AscW only examines the first character anyway. AscW(Mid$(s, x, 1)) is the best call. Note that if you call AscW(Mid$(s, x)) without the third parameter, Mid$ executes slowly when s is a long string.

Example of potentially slow code:

For x = 1 To Len(s)
 If AscW(Mid$(s, x)) = ... Then ... 
Next

The above is better written as:

For x = 1 To Len(s)
 If AscW(Mid$(s, x, 1)) = ... Then ... 
Next
answered Nov 16, 2013 at 6:59
\$\endgroup\$
6
\$\begingroup\$

Nothing major, just some stylistic comments....

  • The Do While loop can be rewritten more clearly as a For i = 0 to dataLength \ 2

  • the variable remainingBytes is unnecessary, and if used should be a boolean not an integer.

  • tmp can be declared inside the loop as it is never used outside it.

Malachi
29k11 gold badges86 silver badges188 bronze badges
answered Nov 17, 2013 at 7:13
\$\endgroup\$
2
  • \$\begingroup\$ I find a better name for a Boolean would be hasRemainingBytes - remainingBytes does sound like an integer to me, although I agree with your point. Also in VBA declaring variables inside a loop affects readability negatively imho - VBA doesn't scope variables at that level. \$\endgroup\$ Commented Nov 17, 2013 at 20:52
  • 1
    \$\begingroup\$ @retailcoder: I'd prefer to just get rid of remaingBytes, but if not changing the name as well would be good. As for tmp and scope, I think VBA gives you a choice of evils, and putting it where it should be if there was a good choice is the best you can do in this case. Fortunately, the value truly is temporary in this case, assigned and then read just once per iteration. Declaring it outside of the loop gives the impression that it will be used outside of the loop, when the truth is it could be, but isn't, no matter where it is declared. \$\endgroup\$ Commented Nov 18, 2013 at 5:56
5
\$\begingroup\$

in your SuperFastHash function you are Dimming and setting variables and then using them in a fashion that looks messy to me even though it is in a logical order.

the set up should be separate from the actual logic of the code to make it more clear and readable.

you should Dim all the variables at the beginning of the function and then follow with the logic so that it the flow is clear and understandable

it should look like this:

Public Function SuperFastHash(ByVal dataToHash As String) As Long
 Dim dataLength As Long
 Dim hash As Long
 Dim remainingBytes As Integer
 Dim numberOfLoops As Integer
 Dim currentIndex As Integer
 Dim tmp As Double
 dataLength = Len(dataToHash)
 If (dataLength = 0) Then
 SuperFastHash = 0
 Exit Function
 End If
 hash = dataLength 
 remainingBytes = dataLength Mod 2
 numberOfLoops = dataLength \ 2
 currentIndex = 0

this looks much clearer about what is going on and is a good coding practice

(as I was writing this I thought this was going to be the only answer, but then retailcoder posted an answer)

answered Nov 16, 2013 at 7:20
\$\endgroup\$
3
  • \$\begingroup\$ +1 for finding something to say on that one - although I didn't mention this point because I felt (with the camelCasing of locals) the programmer came from a C or C# background, where the norm is more to declare as close as possible to usage. But I agree in short VBA/VB6 functions it does look cleaner that way. \$\endgroup\$ Commented Nov 16, 2013 at 7:23
  • \$\begingroup\$ I mostly Code in C# and I disagree with declare as close as possible to usage if it isn't super huge then I declare at the beginning, and never staggered like that. \$\endgroup\$ Commented Nov 16, 2013 at 7:31
  • 1
    \$\begingroup\$ I cut my teeth in BASIC, followed by QB, VBA, etc., but over the last 10 years or so I spend most of my time in C#, where variables are more traditionally declared before first use. I struggle with being consistent here between C#, VBA, JS, T-SQL, and other languages I switch among during the day. Advantage for top-declaration is clarity of what variables are needed. Advantage of JIT declaration is better control over when a variable should be useable. I think you're right in this case -- the function is short enough that top declaration would be cleaner. \$\endgroup\$ Commented Feb 18, 2017 at 1:54

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.