I have retackled this problem using the help I received from here: Remove duplications from a Java String but this time using a LinkedHashSet
since previosuly I was using a HashSet
but the answer was out of order.
From this implementation my run time should be \$O(n)\$ correct? Does anyone see any room where I can improve my code or some mistakes?
public static void main(String[] args){
String test = "Banana";
LinkedHashSet knownChars = new LinkedHashSet();
StringBuilder noDups = new StringBuilder();
for(Character c : test.toCharArray()){
if(!knownChars.contains(c)){
knownChars.add(c);
noDups.append(c);
}
}
System.out.println("No duplicate string is: " + noDups);
}
-
\$\begingroup\$ This approach works, but why are you using a LinkedHashSet? Just use a HashSet and it should work fine as well. \$\endgroup\$ljeabmreosn– ljeabmreosn2017年07月01日 06:58:26 +00:00Commented Jul 1, 2017 at 6:58
-
\$\begingroup\$ I was previously using a HashSet but, the results would be printed out of order. For example, given the string , "Banana" the result would be "aBn" still correct but not in order. \$\endgroup\$TheLearner– TheLearner2017年07月01日 07:15:17 +00:00Commented Jul 1, 2017 at 7:15
-
\$\begingroup\$ You can just say O(n), no need to say BigO everytime. \$\endgroup\$yuri– yuri2017年07月01日 07:43:26 +00:00Commented Jul 1, 2017 at 7:43
-
1\$\begingroup\$ You should tell that your code is based on this answer to your previous question. \$\endgroup\$Martin R– Martin R2017年07月01日 15:56:45 +00:00Commented Jul 1, 2017 at 15:56
1 Answer 1
The below points are in no particular order:
Extraction into Methods
- The utility and the test should be broken up into methods, i.e., you should have a function
String removeDuplicates(String)
, containing the logic for removing duplicates.
Manual Boxing
- Unnecessary manual boxing to
Character
, just usechar
and letjavac
take care of it on its own (JDK1.5+ supports autoboxing of primitives).
Why LinkedHashSet
?
A
HashSet
can be used just fine here - the order of the resulting deduplicatedString
is determined by the order of insertion of characters into theStringBuilder
, not theHashSet
. (Believe me, I checked. Here you can also: http://ideone.com/K9Ku2p)Note that the
add
method ofSet
s return aboolean
,true
if the set did not already contain the element and it has been added successfully, orfalse
if the element was already present in theSet
. Exploiting this makes the call toSet.contains(...)
redundant, see the example code.
Generics
- Use generics. Don't use raw collections - they can violate type safety. In your case, you might not realise the immediate benefit of doing so, but it is a good practice when scaling to larger programs. Here, using generics is as simple as changing
LinkedHashSet knownChars = new LinkedHashSet();
toLinkedHashSet<Character> knownChars = new LinkedHashSet<>();
(JDK 1.7+ to get the diamond type inference, otherwise it has to beLinkedHashSet<Character> knownChars = new LinkedHashSet<Character>();
, JDK 1.5+)
Space-time tradeoffs
To minimize the number of reallocations of the underlying buffers of
StringBuilder
orHashSet
, initialize them with a default capacity of the largest possible size they could have, which is the length of the inputString
. Use the constructors which have anint capacity
parameter. See the example code for details.To avoid a
gotcha
involving the load factor (a parameter which decides how full aHashSet
should be before it is resized) of theHashSet
when initializing theHashSet
with capacity in point 6 (theHashset
may be prematurely resized), also set the load factor to1.0f
, using thenew HashSet(int capacity, float loadFactor)
constructor overload.
Miscellaneous
Type to interfaces, e.g., use
Set<Character> knownChars = new LinkedHashSet<>();
instead ofLinkedHashSet<Character> knownChars = new LinkedHashSet<>();
. This makes your code in general more resilient to refactoring, you can use a differentSet
implementation at any time by changing one word instead of 2.Qualify your method parameters with
final
if you are not going to reassign them in any way - granted,String
being immutable makes this redundant, in the sense that any reassignments done toinput
inremoveDuplicates
will not affecttest
inmain
, but it's a good practice anyway.Better output messages - see the example code for an example.
Better variable naming - it's already quite good, but try to use full words. See the example code.
Example Code (Ideone):
import java.util.Set;
import java.util.HashSet;
// Store in a file `StringUtilities.java`
public class StringUtilities
{
public static void main(String[] args)
{
String test = "Banana";
System.out.println("Test string \"" + test + "\" with duplicates removed is: \"" + removeDuplicates(test) + "\"");
}
public static String removeDuplicates(final String input) {
Set<Character> knownCharacters = new HashSet<>(input.length(), 1.0f);
StringBuilder noDuplicates = new StringBuilder(input.length());
for(char character : input.toCharArray()){
if(knownCharacters.add(character)){
noDuplicates.append(character);
}
}
return noDuplicates.toString();
}
}
-
\$\begingroup\$ @mdfst13, Thanks for catching that terminology issue there between functions and methods, guess I've been doing too much Scala recently. I normally consider static methods not accessing state to be functions in Java, but I agree that using methods everywhere is more consistent. Also, by that redundancy comment I meant that since String is immutable, the caller's copy of whatever was passed to
input
wouldn't be altered either way - I'll edit to make it clearer. \$\endgroup\$Tamoghna Chowdhury– Tamoghna Chowdhury2017年07月01日 11:38:12 +00:00Commented Jul 1, 2017 at 11:38 -
\$\begingroup\$ The maximal capacity of the set should be larger (length * 4 / 3) if resizing should be avoided as the parameter represents the internal array size, not the resize threshold. The
if (!knownCharacters.contains(character))
is redundant,if (!knownCharacters.add(character)) noDuplicates.append(character);
would be sufficient. \$\endgroup\$Nevay– Nevay2017年07月01日 13:19:36 +00:00Commented Jul 1, 2017 at 13:19 -
\$\begingroup\$ @Nevay, about the 2nd point, I thought so too, but for whatever reason that way returns wrong results - check it yourself (
"ana"
with justadd
,"Ban"
withadd
andcontains
). \$\endgroup\$Tamoghna Chowdhury– Tamoghna Chowdhury2017年07月01日 14:34:05 +00:00Commented Jul 1, 2017 at 14:34 -
\$\begingroup\$ Sorry, typo in my last comment, it should be
if (knownCharacters.add(character)) ...
. \$\endgroup\$Nevay– Nevay2017年07月01日 14:42:23 +00:00Commented Jul 1, 2017 at 14:42 -
\$\begingroup\$ @TamoghnaChowdhury thank you for you insightful response. I did not notice how much I could improve this code. One of the eye openers was the generics and specifying the size of the stringbuilder. \$\endgroup\$TheLearner– TheLearner2017年07月01日 23:14:14 +00:00Commented Jul 1, 2017 at 23:14