Aim: To have only one of each character on the Return string:
Public Shared Function CheckForDuplicates(ByVal vCharCheck As String) As String
Dim vDeDuplicated As String = ""
Dim i As Integer
For i = 1 To vCharCheck.Length ' Count length of string
'vDeDuplicate is blank to start so will always capture first character
'vDeDuplicate is then checked against each character in the incoming string to see if it contains this character already
'thus deduplicting the string
If vDeDuplicated.Contains(Mid(vCharCheck, i, 1)) Then
'Check the new string to see if it exists, if it does it is not allowed
Else
'The character is not found so we add it
vDeDuplicated += Mid(vCharCheck, i, 1)
End If
Next
Return vDeDuplicated
End Function
3 Answers 3
Of course you can eliminate the loop by using the Distinct method of the string, which won't care if the string is empty or not. Assuming vCharCheck is the input string, it would look something like this:
Return New String(vCharCheck.Distinct.ToArray)
-
2\$\begingroup\$ In a similar vein:
String.Concat(vCharCheck.Distinct)
\$\endgroup\$recursive– recursive2014年02月10日 21:20:36 +00:00Commented Feb 10, 2014 at 21:20
you should change the code around a little bit so that if the string is empty it will exit the code before hitting the loop. and you should use a foreach
instead of a for
and get rid of all the Mid stuff
For Each Character As Char In vCharCheck
If Not vDeDuplicated.Contains(Character) Then
vDeDuplicated += Character
End If
Next
(削除) the loop won't run if the Length of vCharCheck
is less than 1 so you don't really have to check that it contains a first letter.
I am sorry it will run, infinitely I think, because it will start at 1
and loop until it reaches 0
and nothing will happen inside the loop (削除ここまで)
-
3\$\begingroup\$ Loop will not run infinitely ... . Before the statement block runs, Visual Basic compares counter to end. If counter is already larger than the end value (or smaller if step is negative), the For loop ends and control passes to the statement that follows the Next statement. \$\endgroup\$rolfl– rolfl2014年02月07日 17:47:58 +00:00Commented Feb 7, 2014 at 17:47
-
\$\begingroup\$ @rolfl, that makes sense. but shouldn't. VB \$\endgroup\$Malachi– Malachi2014年02月07日 17:50:32 +00:00Commented Feb 7, 2014 at 17:50
-
\$\begingroup\$ I think checking the string length before the
foreach
loop adds complexity. Theforeach
inherently checks for an empty string because if the string is empty, it will not execute. By removing theIf
, you have reduced nesting, made the code flow, and made it less confusing. \$\endgroup\$Jeff Vanzella– Jeff Vanzella2014年02月07日 18:07:32 +00:00Commented Feb 7, 2014 at 18:07 -
\$\begingroup\$ @JeffVanzella the whole point of checking it outside is to keep the stuff inside of it from trying to run if it doesn't have to. \$\endgroup\$Malachi– Malachi2014年02月07日 18:11:45 +00:00Commented Feb 7, 2014 at 18:11
-
1\$\begingroup\$ @JeffVanzella, that makes sense and I agree with you. plus I had the wrong variable in that if statement as well \$\endgroup\$Malachi– Malachi2014年02月07日 18:18:41 +00:00Commented Feb 7, 2014 at 18:18
There are also two minor improvements that could increase the performance of your function - although they do add a little bit of complexity to the final code.
Decrease complexity using a structure that allows constant time character lookup.
Your implementation uses the result string (
vDeDuplicated
) as a placeholder for the unique characters. Therefore, for every character c read from the input string, there is a call tovDeDuplicated.Contains(c)
- which executes a linear search onvDeDuplicated
. That could be avoided if we used a hashtable or an array, indexed by the character code itself. We can now check for repeated characters in constant time:'if the string is encoded in ASCII, we only need 128 elements 'to store every existing character. 'For unicode strings, you can change it to 65536 Dim vDeDuplicated(128) As Boolean Dim result As String = "" For Each Character As Char In vCharCheck 'using the character numeric code as an index - constant time lookup If Not vDeDuplicated(AscW(Character)) Then result += Character vDeDuplicated(AscW(Character)) = True End If Next Return result
Use a StringBuilder to generate your result.
Since strings are immutable in VB.NET, using the string concatenation operator (+) will always generate new strings, leading to a significant performance penalty. As a rule of thumb, you should always try to use StringBuilder when the number of concatenations you need to perform is not known at compile time. These are some nice articles on that matter:
-
1\$\begingroup\$ I agree with your second point that we should be using a StriungBuilder, but the first point looks like we are adding complexity that isn't needed here. the other two answers cover this situation much better than adding a new variable array into the mix, it just doesn't seem logical to me. \$\endgroup\$Malachi– Malachi2014年02月11日 16:52:01 +00:00Commented Feb 11, 2014 at 16:52
-
1\$\begingroup\$ Yes, I agree! At least from the point of view of code quality/maintainability - and that's why I added the remark that "they do add a little bit of complexity to the final code.". However, one cannot ignore the fact that we are decreasing complexity from O(n^2) to O(n) by adding a couple of lines (this is actually a pretty well known technique for implementing character lookup efficiently). The OP was not concerned with performance/complexity, but nonetheless I think this is an aspect that could/should be addressed in code review. Accepted answer is definitely much simpler, though. \$\endgroup\$rla4– rla42014年02月11日 17:18:10 +00:00Commented Feb 11, 2014 at 17:18
-
\$\begingroup\$ I will give it the UpVote, I was on the fence and you pushed me off of it onto the upvote side \$\endgroup\$Malachi– Malachi2014年02月11日 17:27:32 +00:00Commented Feb 11, 2014 at 17:27
vCharCheck
determined/populated? where do you assign a value tovDeDuplicated
? \$\endgroup\$