Return to Revisions

10 of 10

Commonmark migration

edited Jun 10, 2020 at 13:24

Community Bot

edited Jun 10, 2020 at 13:24

Community Bot

The below points are in no particular order:

Extraction into Methods

The utility and the test should be broken up into methods, i.e., you should have a function String removeDuplicates(String), containing the logic for removing duplicates.

Manual Boxing

Unnecessary manual boxing to Character, just use char and let javac take care of it on its own (JDK1.5+ supports autoboxing of primitives).

Why `LinkedHashSet`?

A HashSet can be used just fine here - the order of the resulting deduplicated String is determined by the order of insertion of characters into the StringBuilder, not the HashSet. (Believe me, I checked. Here you can also: http://ideone.com/K9Ku2p)
Note that the add method of Sets return a boolean, true if the set did not already contain the element and it has been added successfully, or false if the element was already present in the Set. Exploiting this makes the call to Set.contains(...) redundant, see the example code.

Generics

Use generics. Don't use raw collections - they can violate type safety. In your case, you might not realise the immediate benefit of doing so, but it is a good practice when scaling to larger programs. Here, using generics is as simple as changing LinkedHashSet knownChars = new LinkedHashSet(); to LinkedHashSet<Character> knownChars = new LinkedHashSet<>(); (JDK 1.7+ to get the diamond type inference, otherwise it has to be LinkedHashSet<Character> knownChars = new LinkedHashSet<Character>();, JDK 1.5+)

Space-time tradeoffs

To minimize the number of reallocations of the underlying buffers of StringBuilder or HashSet, initialize them with a default capacity of the largest possible size they could have, which is the length of the input String. Use the constructors which have an int capacity parameter. See the example code for details.
To avoid a gotcha involving the load factor (a parameter which decides how full a HashSet should be before it is resized) of the HashSet when initializing the HashSet with capacity in point 6 (the Hashset may be prematurely resized), also set the load factor to 1.0f, using the new HashSet(int capacity, float loadFactor) constructor overload.

Miscellaneous

Type to interfaces, e.g., use Set<Character> knownChars = new LinkedHashSet<>(); instead of LinkedHashSet<Character> knownChars = new LinkedHashSet<>();. This makes your code in general more resilient to refactoring, you can use a different Set implementation at any time by changing one word instead of 2.
Qualify your method parameters with final if you are not going to reassign them in any way - granted, String being immutable makes this redundant, in the sense that any reassignments done to input in removeDuplicates will not affect test in main, but it's a good practice anyway.
Better output messages - see the example code for an example.
Better variable naming - it's already quite good, but try to use full words. See the example code.

Example Code (Ideone):

import java.util.Set;
import java.util.HashSet;
// Store in a file `StringUtilities.java`
public class StringUtilities
{
 public static void main(String[] args)
 {
 String test = "Banana";
 System.out.println("Test string \"" + test + "\" with duplicates removed is: \"" + removeDuplicates(test) + "\"");
 }
 
 public static String removeDuplicates(final String input) {
 Set<Character> knownCharacters = new HashSet<>(input.length(), 1.0f);
 StringBuilder noDuplicates = new StringBuilder(input.length());
 for(char character : input.toCharArray()){
 if(knownCharacters.add(character)){
 noDuplicates.append(character);
 }
 }
 return noDuplicates.toString();
 }
}

answered Jul 1, 2017 at 9:02

Tamoghna Chowdhury

2.3k
10
22

default

Return to Revisions

Extraction into Methods

Manual Boxing

Why LinkedHashSet?

Generics

Space-time tradeoffs

Miscellaneous

Example Code (Ideone):

Why `LinkedHashSet`?