The below points are in no particular order:
Extraction into Methods
- The utility and the test should be broken up into methods, i.e., you should have a function
String removeDuplicates(String)
, containing the logic for removing duplicates.
Manual Boxing
- Unnecessary manual boxing to
Character
, just usechar
and letjavac
take care of it on its own (JDK1.5+ supports autoboxing of primitives).
Why LinkedHashSet
?
A
HashSet
can be used just fine here - the order of the resulting deduplicatedString
is determined by the order of insertion of characters into theStringBuilder
, not theHashSet
. (Believe me, I checked. Here you can also: http://ideone.com/K9Ku2p)Note that the
add
method ofSet
s return aboolean
,true
if the set did not already contain the element and it has been added successfully, orfalse
if the element was already present in theSet
. Exploiting this makes the call toSet.contains(...)
redundant, see the example code.
Generics
- Use generics. Don't use raw collections - they can violate type safety. In your case, you might not realise the immediate benefit of doing so, but it is a good practice when scaling to larger programs. Here, using generics is as simple as changing
LinkedHashSet knownChars = new LinkedHashSet();
toLinkedHashSet<Character> knownChars = new LinkedHashSet<>();
(JDK 1.7+ to get the diamond type inference, otherwise it has to beLinkedHashSet<Character> knownChars = new LinkedHashSet<Character>();
, JDK 1.5+)
Space-time tradeoffs
To minimize the number of reallocations of the underlying buffers of
StringBuilder
orHashSet
, initialize them with a default capacity of the largest possible size they could have, which is the length of the inputString
. Use the constructors which have anint capacity
parameter. See the example code for details.To avoid a
gotcha
involving the load factor (a parameter which decides how full aHashSet
should be before it is resized) of theHashSet
when initializing theHashSet
with capacity in point 6 (theHashset
may be prematurely resized), also set the load factor to1.0f
, using thenew HashSet(int capacity, float loadFactor)
constructor overload.
Miscellaneous
Type to interfaces, e.g., use
Set<Character> knownChars = new LinkedHashSet<>();
instead ofLinkedHashSet<Character> knownChars = new LinkedHashSet<>();
. This makes your code in general more resilient to refactoring, you can use a differentSet
implementation at any time by changing one word instead of 2.Qualify your method parameters with
final
if you are not going to reassign them in any way - granted,String
being immutable makes this redundant, in the sense that any reassignments done toinput
inremoveDuplicates
will not affecttest
inmain
, but it's a good practice anyway.Better output messages - see the example code for an example.
Better variable naming - it's already quite good, but try to use full words. See the example code.
Example Code (Ideone):
import java.util.Set;
import java.util.HashSet;
// Store in a file `StringUtilities.java`
public class StringUtilities
{
public static void main(String[] args)
{
String test = "Banana";
System.out.println("Test string \"" + test + "\" with duplicates removed is: \"" + removeDuplicates(test) + "\"");
}
public static String removeDuplicates(final String input) {
Set<Character> knownCharacters = new HashSet<>(input.length(), 1.0f);
StringBuilder noDuplicates = new StringBuilder(input.length());
for(char character : input.toCharArray()){
if(knownCharacters.add(character)){
noDuplicates.append(character);
}
}
return noDuplicates.toString();
}
}
- 2.3k
- 10
- 22