Background:
In collecting data we wanted associate each item of data with individuals who contributed it but without actually identifying them individually. To do this we used MD5(Environ("UserName"))
and stored this 'ID' on each of the individual contributors PCs. The unforeseen problem that thwarted our plan was that under some conditions windows changes the case of Environ("UserName")
, probably depending on how the user logs in. This incorrectly flagged that the user was unregistered and prevented them from contributing.
Having learned from our mistake we now use lowercase version of the UserName to anonymize respondents: MD5(lCase(Environ("UserName")))
.
But we still need to re-validate each user who was flagged as un-registered and rewrite their MD5 IDs. Part of this strategy means checking the current MD5 is in-fact for this user, i.e. systematically change case until a match is found then rewrite the MD5 based on lCase(UserName)
; this is the crux of the question.
Problem:
I have a need to find all possible combinations of a string between lCase(string)
and uCase(string)
- without rearranging characters. For example, given the input string of: abc
I would expect the code to return: abc, abC, aBc, aBC, Abc, AbC, ABc, & ABC
.
The following code does that, but perhaps not as efficiently as it could. I'm working on the following assumptions:
- the input characters will be limited to:
- upper and lower case characters, A-z
- regular numbers, 0-9
- regular space character, " "
- the input strings are relatively short, 1 to 20 characters
- Users typically log into Windows using:
- FirstMiddleLast,
- FLast
- FMLast
- FirstL, etc.
- (but other combinations and cases are also possible & likely)
The code basically works by converting an input string to lower case and then into a byte array. This provides each character in the input string with an easy-to-access index.
The next step is to create a binary number with 1 digit for each character in the input string. Then starting from 000
(i.e. for abc
) use a loop to keep adding 1 (to a decimal number and converting) to binary until there are no zeros left in the binary number. For each iteration of the loop the "1's" in the binary number represent possible uppercase characters for the original input string.
On my test PC it takes approximately 60 seconds to process a 16 character input string. I'm sure there is room for improvement.
In addition to speed I would like to better structure the code so that it is easier to test and so that when I look at it later I'm not left cursing myself.
The code is as follows:
Option Explicit
Const BASE_2 As Long = 2
Const CHR_1 As String = "1"
Const CHR_0 As String = "0"
Public Sub CasePermutation(ByVal stringPassed As String)
Dim stringIn As String
stringIn = LCase$(stringPassed)
Const SKIP_BLANKS As Long = 2
Dim stringInArray() As Byte
stringInArray = stringIn 'Assign Unicode string to bytes.'
Dim caseToggleArray() As Byte
Dim exponent As Long
exponent = Len(stringIn)
Dim maxDecimal As Long
maxDecimal = BASE_2 ^ exponent
Dim decimalNumber As Long
For decimalNumber = 0 To maxDecimal - 1
caseToggleArray = DecToBin(decimalNumber, exponent)
Dim positionInString As Long
For positionInString = LBound(stringInArray) To UBound(stringInArray) Step SKIP_BLANKS
Dim tempString As String
If Chr$(caseToggleArray(positionInString)) = CHR_1 Then
tempString = tempString & UCase$(Chr$(stringInArray(positionInString)))
Else
tempString = tempString & Chr$(stringInArray(positionInString))
End If
Next positionInString
Debug.Print tempString
' Reset for next iteration
tempString = vbNullString
Next decimalNumber
End Sub
Public Function DecToBin(ByVal decimalPassed As Long, ByVal digitsToCount As Long) As String
Dim curDec As Long
curDec = decimalPassed
Dim strTemp As String
Dim positionInString As Long
positionInString = digitsToCount
Do While positionInString >= 0
If curDec >= (BASE_2 ^ positionInString) Then
strTemp = strTemp & CHR_1
curDec = curDec - (BASE_2 ^ positionInString)
Else
If positionInString <> digitsToCount Then
strTemp = strTemp & CHR_0
End If
End If
positionInString = positionInString - 1
Loop
DecToBin = strTemp
End Function
Public Sub test()
Dim myNow As Date
myNow = now()
CasePermutation "a 1"
Debug.Print now & " " & myNow ' rough but ok for testing with 16 chr input strings
End Sub
2 Answers 2
The main reason for the slowdown is the repeated "Debug.Print" command. Store your results in an array instead, and "print" it to a worksheet. Just by making that change, the time elapsed was reduced to 1.2 seconds on my computer.
There are a few other slowdowns (like using the "Chr$()" function twice every time through the inner loop) but nothing major.
My reworked version processed a 16-character string + printed the results in 0.6 seconds on my computer, approximately twice as fast as the slightly reworked original code. I find it somewhat easier to read (mostly due to avoiding the Byte arrays) but that may just be personal preference. The logic is similar in both.
EDIT: Slightly modified to avoid use of Application.Transpose, which caused errors with>16 characters in string.
Sub casePermutation(ByVal str As String, destWS As Worksheet)
'Loop through characters in string and record whether they are "flippable"
'(ie whether they are a letter from a-z)
ReDim flippable(1 To Len(str), 1 To 2) As Variant
str = LCase(str)
Dim numLetters As Long
Dim i As Long
For i = 1 To Len(str)
flippable(i, 1) = Mid(str, i, 1)
Dim aVal As Long
aVal = Asc(flippable(i, 1))
If aVal >= 97 And aVal <= 122 Then
flippable(i, 2) = True
numLetters = numLetters + 1
Else
flippable(i, 2) = False
End If
Next
'Alert user if character limit has been exceeded
If numLetters > 20 Then
MsgBox "Error: Function only supports up to 20 ""flippable"" letters"
Stop
Exit Sub
End If
'Fill array of permutations
ReDim resultsArr(1 To 2 ^ numLetters, 1 To 1) As String
Dim toRepeat As Long
toRepeat = 1
For i = 1 To Len(str)
Dim uBool As Boolean
Dim repeatCounter As Long
Dim j As Long
uBool = False
repeatCounter = 0
For j = 1 To UBound(resultsArr, 1)
If flippable(i, 2) = True Then
If repeatCounter >= toRepeat Then
uBool = Not uBool
repeatCounter = 0
End If
If uBool = False Then
resultsArr(j, 1) = resultsArr(j, 1) & flippable(i, 1)
Else
resultsArr(j, 1) = resultsArr(j, 1) & UCase(flippable(i, 1))
End If
repeatCounter = repeatCounter + 1
Else
resultsArr(j, 1) = resultsArr(j, 1) & flippable(i, 1)
End If
Next
If flippable(i, 2) = True Then
toRepeat = toRepeat * 2
End If
Next
'Paste results to destination sheet
With destWS
.Range(.Cells(1, 1), .Cells(UBound(resultsArr), 1)) = resultsArr
End With
End Sub
-
\$\begingroup\$ Thanks for that. I'll have a look and get back to you π \$\endgroup\$SlowLearner– SlowLearner2017εΉ΄08ζ31ζ₯ 20:34:05 +00:00Commented Aug 31, 2017 at 20:34
-
\$\begingroup\$ Nice. There is 1 big difference in the logic - your code is processing the permutations concurrently whereas mine is processing the permutations sequentially. This is something that I never anticipated and it is really cool to see how other people approach problems differently; and educational too. Now I see the need to be more specific with stating requirements (and I guess seeking clarification). \$\endgroup\$SlowLearner– SlowLearner2017εΉ΄08ζ31ζ₯ 23:20:44 +00:00Commented Aug 31, 2017 at 23:20
-
\$\begingroup\$ I did notice your code generates Error 13 for strings larger than 16 characters:
casePermutation "abcdefghijklmnop", activeworkbook.Sheets(1)
on this line.Range(.Cells(1, 1), .Cells(UBound(resultsArr), 1)) = Application.Transpose(resultsArr)
. I didn't test the impact of non-flippable characters. \$\endgroup\$SlowLearner– SlowLearner2017εΉ΄08ζ31ζ₯ 23:20:49 +00:00Commented Aug 31, 2017 at 23:20 -
\$\begingroup\$ I'll update my question to provide some more clarity as to why I am looking to generate the permutations sequentially - basically it is so I can exit the code if I find a match with text in another location. E.g.
If EncodePermutation = someString Then Stop
. I think this forces the sequential approach as (unstated in the OP) each permutation is encoded and checked against a string. \$\endgroup\$SlowLearner– SlowLearner2017εΉ΄08ζ31ζ₯ 23:26:44 +00:00Commented Aug 31, 2017 at 23:26 -
1\$\begingroup\$ In case you're interested, see initial guesses below. Many thanks for your help :-D \$\endgroup\$SlowLearner– SlowLearner2017εΉ΄09ζ02ζ₯ 00:40:00 +00:00Commented Sep 2, 2017 at 0:40
About this Answer
Thanks to Daniel McCracken for his thoughts on this. Daniel's answer was a great solution to the original question; hence his is the accepted answer.
This answer suits my needs better but those needs could not have been guessed from the original question. However...
After considering Daniel's proposal I realized that it might be appropriate to make some guesses before launching into what was essentially a Brute Force approach to solving my problem which is essentially to generate all possible Upper/Lower case permutations of a username and compare that to an encrypted string.
Declarations
Option Explicit
Const BASE_2 As Long = 2
Const CHR_1 As String = "1"
Const CHR_0 As String = "0"
Fake MD5
For the purpose of testing I created FakeMD5:
Function fakeMD5(ByVal inputString As String) As String
Dim i As Long
Dim c As String
Dim outputString As String
For i = 1 To Len(inputString)
c = Mid(inputString, i, 1)
If UCase(c) = c Then
outputString = outputString & CHR_1
Else
outputString = outputString & CHR_0
End If
Next i
fakeMD5 = outputString
End Function
FakeMD5 generates a string of '0' and '1' to represent upper and lower case characters in the UserName (e.g. UserName
=> 10001000
).
For some tests Daniel's code blew mine out of the water - I was surprised because he generated string permutations concurrently so all permutations have to be generated before any can be tested. On the other hand my code generates permutations sequentially allowing testing to be done sooner for some permutations.
Initial Tweak to DecToBin
Looking at Daniels code it became obvious that his code was producing strings that would likely match real usernames before my code would. This realization prompted two slight modifications in the method DecToBin (which incidentally mandates a rename):
If curDec >= (BASE_2 ^ positionInString) Then
'strTemp = CHR_1 * strTemp
strTemp = strTemp & CHR_1 ' <= Modified
curDec = curDec - (BASE_2 ^ positionInString)
Else
If positionInString <> digitsToCount Then
'strTemp = CHR_0 & strTemp
strTemp = strTemp & CHR_0 ' <= Modified
End If
End If
This change meant the binary string was reversed and therefore would match a normal UserName sooner (i.e. string is built like this: 10000000, 01000000, 11000000
instead of 00000001, 00000010, 00000011
).
[EDIT: Overhaul of DecToBin] (still needs a name change)
After posting this answer some more thought went into DecToBin... essentially the problem requires adding 1 to a binary number... there is no need to recalculate the binary number so it is declared as Static
.
The DecToBin function was replaced with the following code to yield a better than 30% improvement in overall speed:
Public Function DecToBin(ByVal decimalPassed As Long, ByVal digitsToCount As Long) As String
Static binaryR As String
Static zeros As String
Dim iPos As Long
Dim iZero As Long
Select Case True
Case decimalPassed = 1, 0
binaryR = vbNullString
For iPos = 1 To digitsToCount
binaryR = CHR_0 & binaryR
Next iPos
zeros = binaryR
binaryR = CHR_1 & binaryR
Case Mid$(binaryR, 1, 1) = CHR_0
Mid$(binaryR, 1, 1) = CHR_1
Case InStr(binaryR, CHR_0) = 0
' this is the end
For iPos = 1 To digitsToCount
binaryR = CHR_0 & binaryR
Next iPos
Case Else
iPos = InStr(binaryR, CHR_0)
Mid$(binaryR, iPos, 1) = CHR_1
Dim iCount As Long
For iCount = 1 To iPos
Mid$(binaryR, iCount, 1) = CHR_0
Next iCount
End Select
DecToBin = binaryR
End Function
Note: An attempt was made to eliminate the For
Loop in favor of using Mid$()
(code below was proposed rewrite of Dec2Bin with name change) but it turns out the Mid$()
was slower than using a loop. The test strings for this were all 1 lowercase character followed but 'n' uppercase characters.
Function binaryAddOne() As String ' Expected to be fast, but was slow NOT Used (see results: 'Old Shift')
Static zeros As String
Static binin As String
Dim iPos As Long
iPos = InStr(1, binin, CHR_0)
Select Case True
Case binin = vbNullString
zeros = Replace$(Space$(30), " ", CHR_0)
binin = Replace$(zeros, CHR_0, CHR_1, 1, 1)
Case CBool(iPos)
Mid$(binin, iPos, 1) = CHR_1
Mid$(binin, 1, iPos - 1) = Left$(zeros, iPos - 1)
End Select
binaryAddOne = binin
'Debug.Print binin
End Function
In the image below times are in seconds (calculated (Now(End) - Now(start)) * 10 ^ 5
, "Loop" is the time for the original code (as in OP), and "Shift" is the time for adding +1
using BinaryAddOne() (instead of recalculating the entire binary number). Surprisingly "Old Shift" was the result when using Mid$()
to replace zeros instead of looping through the characters to replace zeros.
Opportunity for CasePermutation
Nothing was changed in the method CasePermutation.
There might be some opportunity to do away with the Byte
array by using the builtin Mid
function to modify strings.
Initial Guesses
The Mid
function is used in the newly added InitialGuesses
method, code:
Private Function InitialGuesses(ByVal stringIn As String, ByVal encrypted As String) As String
Dim tempString As String
Dim i As Long
tempString = LCase(stringIn)
Dim testString As String
' All UpperCase
If fakeMD5(UCase(tempString)) = encrypted Then
InitialGuesses = UCase(tempString)
GoTo foundResult
End If
' Only letter is Cap
For i = 1 To Len(stringIn)
'testString = Left(tempString, i - 1) & UCase(Mid(tempString, i, 1)) & Right(tempString, Len(tempString) - i)
testString = tempString
Mid(testString, i, 1) = UCase(Mid(tempString, i, 1))
If fakeMD5(testString) = encrypted Then
InitialGuesses = UCase(testString)
GoTo foundResult
End If
Next i
' First letter + 1 other letter is Cap
For i = 2 To Len(tempString)
testString = tempString
Mid(testString, 1, 1) = UCase(Mid(tempString, 1, 1))
Mid(testString, i, 1) = UCase(Mid(testString, i, 1))
If fakeMD5(testString) = encrypted Then
InitialGuesses = testString
GoTo foundResult
End If
Next i
' First letter + 2 consecutive letters are Cap
For i = 2 To Len(tempString) - 1
testString = tempString
Mid(testString, 1, 1) = UCase(Mid(tempString, 1, 1))
Mid(testString, i, 2) = UCase(Mid(testString, i, 2))
If fakeMD5(testString) = encrypted Then
InitialGuesses = testString
GoTo foundResult
End If
Next i
' First letter + 2 letters separated by a space (XxxxxXxX) are Cap
For i = 2 To Len(tempString) - 2
testString = tempString
Mid(testString, 1, 1) = UCase(Mid(tempString, 1, 1))
Mid(testString, i, 1) = UCase(Mid(testString, i, 1))
Mid(testString, i + 2, 1) = UCase(Mid(testString, i + 2, 1))
If fakeMD5(testString) = encrypted Then
InitialGuesses = testString
GoTo foundResult
End If
Next i
foundResult:
End Function
Initial guesses is designed to quickly test likely combinations of case permutations based on typical UserNames. The last two guesses were included for people with surnames like OBrien and McCracken (or with initials in their names like VerySLearner).
Testing the Code
The whole thing is driven by this test method:
FakeMD5 creates a string that for all intents and purposes serves as our unique ID for a given user - it is generated from myTestString
and is used to compare each permutation against.
Compare the speed of finding a match for a very long first name using initial guesses and brute force by commenting out the line: stringFound = InitialGuesses(myTestString, encrypted)
(word of warning this could take a while, hold Ctrl+Esc before pressing Ctrl+Break works for me, you could add DoEvents in the main For loop of CasePermutation)
Sub test()
Dim myNow As Date
Dim myNow2 As Date
Const myTestString As String = "Daniel_has_a_longnameMcCracken" ' Change this for testing
Dim encrypted As String
encrypted = fakeMD5(myTestString)
Debug.Print "Encrypted by fakeMD5: " & encrypted & " (" & Len(encrypted) & ")"
Dim stringFound As String
myNow = Now()
Debug.Print "Attempting initial guesses..."
stringFound = InitialGuesses(myTestString, encrypted)
If stringFound <> vbNullString Then
Debug.Print stringFound
GoTo foundID
End If
Debug.Print "Initial guesses failed, attempting brute force..."
stringFound = CasePermutation(myTestString, encrypted)
If stringFound <> vbNullString Then
Debug.Print stringFound
GoTo foundID
End If
foundID:
Debug.Print "v1 " & (Now - myNow) * 10 ^ 5 ' rough estimate of time
Exit Sub
End Sub