I would like a review of:
- Correctness, especially around edge/corner cases
- Efficiency
- Readability and/or style
- Appropriate use of Java idioms and data structures
In particular:
- Have I missed typical idioms/patterns useful to solving the problem?
- Could I have reused existing/popular framework/library code?
Less interesting to me:
- Feedback on the test harness logic in
main()
(I could have used JUnit)
A glance at program output from online execution
I have rewritten the code below, including feedback, as Rev2 @ Find common "characters" in 2 given strings (rev2), follow the link to look at the updated version
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.Set;
import java.util.function.Function;
import java.util.stream.Collectors;
class CommonCharacters {
@SuppressWarnings("boxing")
private static String commonCharactersOf(String string1, String string2) {
// Requirement
//
// Always return lowercase versions of common characters. e.g.:
//
// OK: (a, a) -> a
// OK: (a, A) -> a
// OK: (A, A) -> a
// No: (A, A) -> A
// No: (aA, aA) -> aA
//
// Requirement
//
// Return common characters joined in a String, preserving the order in
// which they appeared in the longest argument, or in the first argument if
// the arguments are of the same length.
//
// Requirement
//
// Handle "characters" (i.e. code points) outside the Basic Multilingual
// Plane (BMP), including characters from Supplementary Planes.
// There should be no `char' or `Character' based "false positives". i.e.:
//
// String string1 = "\uD835\uDC00", string2 = "\uD835\uDC01";
//
// string1 and string2 share no characters in the intended acceptation of
// "character".
String shortestArgument, longestArgument;
if (string1.length() < string2.length()) {
shortestArgument = string1;
longestArgument = string2;
} else {
shortestArgument = string2;
longestArgument = string1;
}
// @formatter:off
Set<Integer> codePointsSeen =
shortestArgument.codePoints()
.boxed()
.map(Character::toLowerCase)
.collect(Collectors.toSet());
List<Integer> codePointsInCommon = new ArrayList<>();
for (Iterator<Integer> iterator = longestArgument.codePoints()
.distinct()
.iterator();
iterator.hasNext() &&
codePointsInCommon.size() < codePointsSeen.size();) {
// @formatter:on
Integer codePoint = iterator.next();
int lowerCaseCodePoint = Character.toLowerCase(codePoint);
if (codePointsSeen.contains(lowerCaseCodePoint)) {
codePointsInCommon.add(lowerCaseCodePoint);
}
}
StringBuilder stringBuilder = new StringBuilder();
for (Integer codePoint : codePointsInCommon) {
stringBuilder.appendCodePoint(codePoint);
}
return stringBuilder.toString();
}
@SuppressWarnings("boxing")
public static void main(String[] args) {
// @formatter:off
String[][] testArgumentsAndExpectedValues = {
{ "" , "" , "" },
{ "a" , "" , "" },
{ "" , "a" , "" },
{ "aa" , "" , "" },
{ "" , "aa" , "" },
{ "a" , "a" , "a" },
{ "aa" , "b" , "" },
{ "b" , "aa" , "" },
{ "ab" , "ba" , "ab" },
{ "aba" , "ab" , "ab" },
{ "aba" , "ba" , "ab" },
{ "aba" , "aab" , "ab" },
{ "a" , "A" , "a" },
{ "A" , "a" , "a" },
{ "A" , "A" , "a" },
{ "ab" , "AB" , "ab" },
{ "AB" , "ab" , "ab" },
{ "aB" , "Ab" , "ab" },
{ "aB" , "Ba" , "ab" },
{ "aB" , "Ba" , "ab" },
{ "a" , "\uD835\uDC1A" , "" },
{ "\uD835\uDC1A" , "\uD835\uDC1A" , "\uD835\uDC1A" },
{ "\uD835\uDC00" , "\uD835\uDC00" , "\uD835\uDC00" },
{ "\uD835\uDC1A" , "\uD835\uDC00" , "" },
{ "\uD835\uDC00" , "\uD835\uDC01" , "" },
{ "\uD801\uDC2B" , "\uD801\uDC2B" , "\uD801\uDC2B" },
{ "\uD801\uDC03" , "\uD801\uDC03" , "\uD801\uDC2B" },
{ "\uD801\uDC2B" , "\uD801\uDC03" , "\uD801\uDC2B" },
{ "\uD83D\uDE80" , "\uD83D\uDE80" , "\uD83D\uDE80" },
{ "a" , "aaaaaaaaaaaaaaaaa" , "a" },
// The last test should still work, and work fast, with a second
// argument string starting with "a" and ending _many_ characters later
// The last test values doe not test it, but illustrate the scenario
};
int maximumTestArgumentLength =
Arrays.stream(testArgumentsAndExpectedValues)
.flatMap(testValues -> Arrays.stream(testValues)
.limit(2))
.mapToInt(String::length)
.max()
.getAsInt();
// @formatter:on
int maximumQuotedTestArgumentLength = maximumTestArgumentLength + 2;
for (int i = 0; i < testArgumentsAndExpectedValues.length; i++) {
String[] currentTestValues = testArgumentsAndExpectedValues[i];
String string1 = currentTestValues[0];
String string2 = currentTestValues[1];
String expectedResult = currentTestValues[2];
String actualResult = commonCharactersOf(string1, string2);
boolean testSuccessful = expectedResult.equals(actualResult);
if (testSuccessful) {
// continue; // TODO: uncomment to filter out successful tests
}
Function<String, String> quoteString = s -> '"' + s + '"';
// @formatter:off
String outputFormat = "%2d) "
+ "%5s: "
+ "(%-" + maximumQuotedTestArgumentLength + "s"
+ " , "
+ "%-" + maximumQuotedTestArgumentLength + "s)"
+ " -> "
+ "%s "
+ "(%s)"
+ "%n";
System.out.printf(outputFormat,
i,
testSuccessful ? "Success" : "Failure",
quoteString.apply(string1),
quoteString.apply(string2),
quoteString.apply(actualResult),
quoteString.apply(expectedResult));
// @formatter:on
}
}
}
-
\$\begingroup\$ What do you mean by "common characters"? Sequence(s) of characters? Duplicates are allowed? \$\endgroup\$toto2– toto22015年10月03日 20:10:04 +00:00Commented Oct 3, 2015 at 20:10
-
\$\begingroup\$ No duplicates. (Look at my sample test cases...) \$\endgroup\$Maroloccio– Maroloccio2015年10月03日 20:12:01 +00:00Commented Oct 3, 2015 at 20:12
-
1\$\begingroup\$ Your cases are quite limited. "aabccc" and "aaccc" gives "aaccc" or "ccc"? \$\endgroup\$toto2– toto22015年10月03日 21:13:55 +00:00Commented Oct 3, 2015 at 21:13
-
\$\begingroup\$ "aabccc", "aaccc" -> "ac" \$\endgroup\$Maroloccio– Maroloccio2015年10月03日 21:50:16 +00:00Commented Oct 3, 2015 at 21:50
1 Answer 1
I like that you use Streams. But I think you should use even more of their functionality:
You could have used
map
to get the lower case forlongestArgument
. Or maybe it would have been simpler to just uselongestArgument.toLowerCase()
, and same forshortestArgument
.You use
distinct()
to remove duplicate, but you could have gotten rid of the main for-loop entirely by also usingfilter(...)
where the filter uses a predicate built from the values inshortestArgument
.For the final output, you can use
codePointStream.toArray()
to create anint[]
and then useString(codePointArray, 0, codePointArray.length)
. You could instead useintStream.collect(...)
to perform a reduction which would fill and return aStringBuilder
, but it would end up quite ugly.
You spend many lines defining the shortest/longest strings, but I don't see how that is relevant. You could just assign any string to any of those and the algorithm would still work.
I might have written the program a bit differently. I would define a function that takes a string and returns the lower case version with no duplicate, in the same order, as an IntStream
. I would apply that function at the start on both input strings. And then apply filter
on one of them, with the predicate built from the values of the other.