3
\$\begingroup\$

Background:

In collecting data we wanted associate each item of data with individuals who contributed it but without actually identifying them individually. To do this we used MD5(Environ("UserName")) and stored this 'ID' on each of the individual contributors PCs. The unforeseen problem that thwarted our plan was that under some conditions windows changes the case of Environ("UserName"), probably depending on how the user logs in. This incorrectly flagged that the user was unregistered and prevented them from contributing.

Having learned from our mistake we now use lowercase version of the UserName to anonymize respondents: MD5(lCase(Environ("UserName"))).

But we still need to re-validate each user who was flagged as un-registered and rewrite their MD5 IDs. Part of this strategy means checking the current MD5 is in-fact for this user, i.e. systematically change case until a match is found then rewrite the MD5 based on lCase(UserName); this is the crux of the question.

Problem:

I have a need to find all possible combinations of a string between lCase(string) and uCase(string) - without rearranging characters. For example, given the input string of: abc I would expect the code to return: abc, abC, aBc, aBC, Abc, AbC, ABc, & ABC.

The following code does that, but perhaps not as efficiently as it could. I'm working on the following assumptions:

  • the input characters will be limited to:
    • upper and lower case characters, A-z
    • regular numbers, 0-9
    • regular space character, " "
  • the input strings are relatively short, 1 to 20 characters
  • Users typically log into Windows using:
    • FirstMiddleLast,
    • FLast
    • FMLast
    • FirstL, etc.
    • (but other combinations and cases are also possible & likely)

The code basically works by converting an input string to lower case and then into a byte array. This provides each character in the input string with an easy-to-access index.

The next step is to create a binary number with 1 digit for each character in the input string. Then starting from 000 (i.e. for abc) use a loop to keep adding 1 (to a decimal number and converting) to binary until there are no zeros left in the binary number. For each iteration of the loop the "1's" in the binary number represent possible uppercase characters for the original input string.

On my test PC it takes approximately 60 seconds to process a 16 character input string. I'm sure there is room for improvement.

In addition to speed I would like to better structure the code so that it is easier to test and so that when I look at it later I'm not left cursing myself.

The code is as follows:

Option Explicit
Const BASE_2 As Long = 2
Const CHR_1 As String = "1"
Const CHR_0 As String = "0"
Public Sub CasePermutation(ByVal stringPassed As String)
 Dim stringIn As String
 stringIn = LCase$(stringPassed)
 Const SKIP_BLANKS As Long = 2
 Dim stringInArray() As Byte
 stringInArray = stringIn 'Assign Unicode string to bytes.'
 Dim caseToggleArray() As Byte
 Dim exponent As Long
 exponent = Len(stringIn)
 Dim maxDecimal As Long
 maxDecimal = BASE_2 ^ exponent
 Dim decimalNumber As Long
 For decimalNumber = 0 To maxDecimal - 1
 caseToggleArray = DecToBin(decimalNumber, exponent)
 Dim positionInString As Long
 For positionInString = LBound(stringInArray) To UBound(stringInArray) Step SKIP_BLANKS
 Dim tempString As String
 If Chr$(caseToggleArray(positionInString)) = CHR_1 Then
 tempString = tempString & UCase$(Chr$(stringInArray(positionInString)))
 Else
 tempString = tempString & Chr$(stringInArray(positionInString))
 End If
 Next positionInString
 Debug.Print tempString
 ' Reset for next iteration
 tempString = vbNullString
 Next decimalNumber
End Sub
Public Function DecToBin(ByVal decimalPassed As Long, ByVal digitsToCount As Long) As String
 Dim curDec As Long
 curDec = decimalPassed
 Dim strTemp As String
 Dim positionInString As Long
 positionInString = digitsToCount
 Do While positionInString >= 0
 If curDec >= (BASE_2 ^ positionInString) Then
 strTemp = strTemp & CHR_1
 curDec = curDec - (BASE_2 ^ positionInString)
 Else
 If positionInString <> digitsToCount Then
 strTemp = strTemp & CHR_0
 End If
 End If
 positionInString = positionInString - 1
 Loop
 DecToBin = strTemp
End Function
Public Sub test()
 Dim myNow As Date
 myNow = now()
 CasePermutation "a 1"
 Debug.Print now & " " & myNow ' rough but ok for testing with 16 chr input strings
End Sub
asked Aug 31, 2017 at 9:18
\$\endgroup\$
0

2 Answers 2

2
\$\begingroup\$

The main reason for the slowdown is the repeated "Debug.Print" command. Store your results in an array instead, and "print" it to a worksheet. Just by making that change, the time elapsed was reduced to 1.2 seconds on my computer.

There are a few other slowdowns (like using the "Chr$()" function twice every time through the inner loop) but nothing major.

My reworked version processed a 16-character string + printed the results in 0.6 seconds on my computer, approximately twice as fast as the slightly reworked original code. I find it somewhat easier to read (mostly due to avoiding the Byte arrays) but that may just be personal preference. The logic is similar in both.

EDIT: Slightly modified to avoid use of Application.Transpose, which caused errors with>16 characters in string.

Sub casePermutation(ByVal str As String, destWS As Worksheet)
 'Loop through characters in string and record whether they are "flippable"
 '(ie whether they are a letter from a-z)
 ReDim flippable(1 To Len(str), 1 To 2) As Variant
 str = LCase(str)
 Dim numLetters As Long
 Dim i As Long
 For i = 1 To Len(str)
 flippable(i, 1) = Mid(str, i, 1)
 Dim aVal As Long
 aVal = Asc(flippable(i, 1))
 If aVal >= 97 And aVal <= 122 Then
 flippable(i, 2) = True
 numLetters = numLetters + 1
 Else
 flippable(i, 2) = False
 End If
 Next
 'Alert user if character limit has been exceeded
 If numLetters > 20 Then
 MsgBox "Error: Function only supports up to 20 ""flippable"" letters"
 Stop
 Exit Sub
 End If
 'Fill array of permutations
 ReDim resultsArr(1 To 2 ^ numLetters, 1 To 1) As String
 Dim toRepeat As Long
 toRepeat = 1
 For i = 1 To Len(str)
 Dim uBool As Boolean
 Dim repeatCounter As Long
 Dim j As Long
 uBool = False
 repeatCounter = 0
 For j = 1 To UBound(resultsArr, 1)
 If flippable(i, 2) = True Then
 If repeatCounter >= toRepeat Then
 uBool = Not uBool
 repeatCounter = 0
 End If
 If uBool = False Then
 resultsArr(j, 1) = resultsArr(j, 1) & flippable(i, 1)
 Else
 resultsArr(j, 1) = resultsArr(j, 1) & UCase(flippable(i, 1))
 End If
 repeatCounter = repeatCounter + 1
 Else
 resultsArr(j, 1) = resultsArr(j, 1) & flippable(i, 1)
 End If
 Next
 If flippable(i, 2) = True Then
 toRepeat = toRepeat * 2
 End If
 Next
 'Paste results to destination sheet
 With destWS
 .Range(.Cells(1, 1), .Cells(UBound(resultsArr), 1)) = resultsArr
 End With
End Sub
answered Aug 31, 2017 at 20:11
\$\endgroup\$
11
  • \$\begingroup\$ Thanks for that. I'll have a look and get back to you 😎 \$\endgroup\$ Commented Aug 31, 2017 at 20:34
  • \$\begingroup\$ Nice. There is 1 big difference in the logic - your code is processing the permutations concurrently whereas mine is processing the permutations sequentially. This is something that I never anticipated and it is really cool to see how other people approach problems differently; and educational too. Now I see the need to be more specific with stating requirements (and I guess seeking clarification). \$\endgroup\$ Commented Aug 31, 2017 at 23:20
  • \$\begingroup\$ I did notice your code generates Error 13 for strings larger than 16 characters: casePermutation "abcdefghijklmnop", activeworkbook.Sheets(1) on this line .Range(.Cells(1, 1), .Cells(UBound(resultsArr), 1)) = Application.Transpose(resultsArr). I didn't test the impact of non-flippable characters. \$\endgroup\$ Commented Aug 31, 2017 at 23:20
  • \$\begingroup\$ I'll update my question to provide some more clarity as to why I am looking to generate the permutations sequentially - basically it is so I can exit the code if I find a match with text in another location. E.g. If EncodePermutation = someString Then Stop. I think this forces the sequential approach as (unstated in the OP) each permutation is encoded and checked against a string. \$\endgroup\$ Commented Aug 31, 2017 at 23:26
  • 1
    \$\begingroup\$ In case you're interested, see initial guesses below. Many thanks for your help :-D \$\endgroup\$ Commented Sep 2, 2017 at 0:40
1
\$\begingroup\$

About this Answer

Thanks to Daniel McCracken for his thoughts on this. Daniel's answer was a great solution to the original question; hence his is the accepted answer.

This answer suits my needs better but those needs could not have been guessed from the original question. However...

After considering Daniel's proposal I realized that it might be appropriate to make some guesses before launching into what was essentially a Brute Force approach to solving my problem which is essentially to generate all possible Upper/Lower case permutations of a username and compare that to an encrypted string.

Declarations

Option Explicit
Const BASE_2 As Long = 2
Const CHR_1 As String = "1"
Const CHR_0 As String = "0"

Fake MD5

For the purpose of testing I created FakeMD5:

Function fakeMD5(ByVal inputString As String) As String
 Dim i As Long
 Dim c As String
 Dim outputString As String
 For i = 1 To Len(inputString)
 c = Mid(inputString, i, 1)
 If UCase(c) = c Then
 outputString = outputString & CHR_1
 Else
 outputString = outputString & CHR_0
 End If
 Next i
 fakeMD5 = outputString
End Function

FakeMD5 generates a string of '0' and '1' to represent upper and lower case characters in the UserName (e.g. UserName => 10001000).

For some tests Daniel's code blew mine out of the water - I was surprised because he generated string permutations concurrently so all permutations have to be generated before any can be tested. On the other hand my code generates permutations sequentially allowing testing to be done sooner for some permutations.

Initial Tweak to DecToBin

Looking at Daniels code it became obvious that his code was producing strings that would likely match real usernames before my code would. This realization prompted two slight modifications in the method DecToBin (which incidentally mandates a rename):

 If curDec >= (BASE_2 ^ positionInString) Then
 'strTemp = CHR_1 * strTemp
 strTemp = strTemp & CHR_1 ' <= Modified
 curDec = curDec - (BASE_2 ^ positionInString)
 Else
 If positionInString <> digitsToCount Then
 'strTemp = CHR_0 & strTemp
 strTemp = strTemp & CHR_0 ' <= Modified
 End If
 End If

This change meant the binary string was reversed and therefore would match a normal UserName sooner (i.e. string is built like this: 10000000, 01000000, 11000000 instead of 00000001, 00000010, 00000011).

[EDIT: Overhaul of DecToBin] (still needs a name change)

After posting this answer some more thought went into DecToBin... essentially the problem requires adding 1 to a binary number... there is no need to recalculate the binary number so it is declared as Static.

The DecToBin function was replaced with the following code to yield a better than 30% improvement in overall speed:

Public Function DecToBin(ByVal decimalPassed As Long, ByVal digitsToCount As Long) As String
 Static binaryR As String
 Static zeros As String
 Dim iPos As Long
 Dim iZero As Long
 Select Case True
 Case decimalPassed = 1, 0
 binaryR = vbNullString
 For iPos = 1 To digitsToCount
 binaryR = CHR_0 & binaryR
 Next iPos
 zeros = binaryR
 binaryR = CHR_1 & binaryR
 Case Mid$(binaryR, 1, 1) = CHR_0
 Mid$(binaryR, 1, 1) = CHR_1
 Case InStr(binaryR, CHR_0) = 0
 ' this is the end
 For iPos = 1 To digitsToCount
 binaryR = CHR_0 & binaryR
 Next iPos
 Case Else
 iPos = InStr(binaryR, CHR_0)
 Mid$(binaryR, iPos, 1) = CHR_1
 Dim iCount As Long
 For iCount = 1 To iPos
 Mid$(binaryR, iCount, 1) = CHR_0
 Next iCount
 End Select
 DecToBin = binaryR
End Function

Note: An attempt was made to eliminate the For Loop in favor of using Mid$() (code below was proposed rewrite of Dec2Bin with name change) but it turns out the Mid$() was slower than using a loop. The test strings for this were all 1 lowercase character followed but 'n' uppercase characters.

Function binaryAddOne() As String ' Expected to be fast, but was slow NOT Used (see results: 'Old Shift')
 Static zeros As String
 Static binin As String
 Dim iPos As Long
 iPos = InStr(1, binin, CHR_0)
 Select Case True
 Case binin = vbNullString
 zeros = Replace$(Space$(30), " ", CHR_0)
 binin = Replace$(zeros, CHR_0, CHR_1, 1, 1)
 Case CBool(iPos)
 Mid$(binin, iPos, 1) = CHR_1
 Mid$(binin, 1, iPos - 1) = Left$(zeros, iPos - 1)
 End Select
 binaryAddOne = binin
 'Debug.Print binin
End Function

In the image below times are in seconds (calculated (Now(End) - Now(start)) * 10 ^ 5 , "Loop" is the time for the original code (as in OP), and "Shift" is the time for adding +1 using BinaryAddOne() (instead of recalculating the entire binary number). Surprisingly "Old Shift" was the result when using Mid$() to replace zeros instead of looping through the characters to replace zeros.

enter image description here

Opportunity for CasePermutation

Nothing was changed in the method CasePermutation.

There might be some opportunity to do away with the Byte array by using the builtin Mid function to modify strings.

Initial Guesses

The Mid function is used in the newly added InitialGuesses method, code:

Private Function InitialGuesses(ByVal stringIn As String, ByVal encrypted As String) As String
 Dim tempString As String
 Dim i As Long
 tempString = LCase(stringIn)
 Dim testString As String
 ' All UpperCase
 If fakeMD5(UCase(tempString)) = encrypted Then
 InitialGuesses = UCase(tempString)
 GoTo foundResult
 End If
 ' Only letter is Cap
 For i = 1 To Len(stringIn)
 'testString = Left(tempString, i - 1) & UCase(Mid(tempString, i, 1)) & Right(tempString, Len(tempString) - i)
 testString = tempString
 Mid(testString, i, 1) = UCase(Mid(tempString, i, 1))
 If fakeMD5(testString) = encrypted Then
 InitialGuesses = UCase(testString)
 GoTo foundResult
 End If
 Next i
 ' First letter + 1 other letter is Cap
 For i = 2 To Len(tempString)
 testString = tempString
 Mid(testString, 1, 1) = UCase(Mid(tempString, 1, 1))
 Mid(testString, i, 1) = UCase(Mid(testString, i, 1))
 If fakeMD5(testString) = encrypted Then
 InitialGuesses = testString
 GoTo foundResult
 End If
 Next i
 ' First letter + 2 consecutive letters are Cap
 For i = 2 To Len(tempString) - 1
 testString = tempString
 Mid(testString, 1, 1) = UCase(Mid(tempString, 1, 1))
 Mid(testString, i, 2) = UCase(Mid(testString, i, 2))
 If fakeMD5(testString) = encrypted Then
 InitialGuesses = testString
 GoTo foundResult
 End If
 Next i
 ' First letter + 2 letters separated by a space (XxxxxXxX) are Cap
 For i = 2 To Len(tempString) - 2
 testString = tempString
 Mid(testString, 1, 1) = UCase(Mid(tempString, 1, 1))
 Mid(testString, i, 1) = UCase(Mid(testString, i, 1))
 Mid(testString, i + 2, 1) = UCase(Mid(testString, i + 2, 1))
 If fakeMD5(testString) = encrypted Then
 InitialGuesses = testString
 GoTo foundResult
 End If
 Next i
foundResult:
End Function

Initial guesses is designed to quickly test likely combinations of case permutations based on typical UserNames. The last two guesses were included for people with surnames like OBrien and McCracken (or with initials in their names like VerySLearner).

Testing the Code

The whole thing is driven by this test method:

FakeMD5 creates a string that for all intents and purposes serves as our unique ID for a given user - it is generated from myTestString and is used to compare each permutation against.

Compare the speed of finding a match for a very long first name using initial guesses and brute force by commenting out the line: stringFound = InitialGuesses(myTestString, encrypted) (word of warning this could take a while, hold Ctrl+Esc before pressing Ctrl+Break works for me, you could add DoEvents in the main For loop of CasePermutation)

Sub test()
 Dim myNow As Date
 Dim myNow2 As Date
 Const myTestString As String = "Daniel_has_a_longnameMcCracken" ' Change this for testing 
 Dim encrypted As String
 encrypted = fakeMD5(myTestString)
 Debug.Print "Encrypted by fakeMD5: " & encrypted & " (" & Len(encrypted) & ")"
 Dim stringFound As String
 myNow = Now()
 Debug.Print "Attempting initial guesses..."
 stringFound = InitialGuesses(myTestString, encrypted)
 If stringFound <> vbNullString Then
 Debug.Print stringFound
 GoTo foundID
 End If
 Debug.Print "Initial guesses failed, attempting brute force..."
 stringFound = CasePermutation(myTestString, encrypted)
 If stringFound <> vbNullString Then
 Debug.Print stringFound
 GoTo foundID
 End If
foundID:
 Debug.Print "v1 " & (Now - myNow) * 10 ^ 5 ' rough estimate of time
 Exit Sub
End Sub
answered Sep 2, 2017 at 0:31
\$\endgroup\$

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.