Find common "characters" in 2 given strings (rev1)

Question 1

I would like a review of:

Correctness, especially around edge/corner cases
Efficiency
Readability and/or style
Appropriate use of Java idioms and data structures

In particular:

Have I missed typical idioms/patterns useful to solving the problem?
Could I have reused existing/popular framework/library code?

Less interesting to me:

Feedback on the test harness logic in main() (I could have used JUnit)

A glance at program output from online execution

I have rewritten the code below, including feedback, as Rev2 @ Find common "characters" in 2 given strings (rev2), follow the link to look at the updated version

import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.Set;
import java.util.function.Function;
import java.util.stream.Collectors;
class CommonCharacters {
 @SuppressWarnings("boxing")
 private static String commonCharactersOf(String string1, String string2) {
 // Requirement
 //
 // Always return lowercase versions of common characters. e.g.:
 //
 // OK: (a, a) -> a
 // OK: (a, A) -> a
 // OK: (A, A) -> a
 // No: (A, A) -> A
 // No: (aA, aA) -> aA
 //
 // Requirement
 //
 // Return common characters joined in a String, preserving the order in
 // which they appeared in the longest argument, or in the first argument if
 // the arguments are of the same length.
 //
 // Requirement
 //
 // Handle "characters" (i.e. code points) outside the Basic Multilingual
 // Plane (BMP), including characters from Supplementary Planes.
 // There should be no `char' or `Character' based "false positives". i.e.:
 //
 // String string1 = "\uD835\uDC00", string2 = "\uD835\uDC01";
 //
 // string1 and string2 share no characters in the intended acceptation of
 // "character".
 String shortestArgument, longestArgument;
 if (string1.length() < string2.length()) {
 shortestArgument = string1;
 longestArgument = string2;
 } else {
 shortestArgument = string2;
 longestArgument = string1;
 }
 // @formatter:off
 Set<Integer> codePointsSeen =
 shortestArgument.codePoints()
 .boxed()
 .map(Character::toLowerCase)
 .collect(Collectors.toSet());
 List<Integer> codePointsInCommon = new ArrayList<>();
 for (Iterator<Integer> iterator = longestArgument.codePoints()
 .distinct()
 .iterator();
 iterator.hasNext() &&
 codePointsInCommon.size() < codePointsSeen.size();) {
 // @formatter:on
 Integer codePoint = iterator.next();
 int lowerCaseCodePoint = Character.toLowerCase(codePoint);
 if (codePointsSeen.contains(lowerCaseCodePoint)) {
 codePointsInCommon.add(lowerCaseCodePoint);
 }
 }
 StringBuilder stringBuilder = new StringBuilder();
 for (Integer codePoint : codePointsInCommon) {
 stringBuilder.appendCodePoint(codePoint);
 }
 return stringBuilder.toString();
 }
 @SuppressWarnings("boxing")
 public static void main(String[] args) {
 // @formatter:off
 String[][] testArgumentsAndExpectedValues = {
 { "" , "" , "" },
 { "a" , "" , "" },
 { "" , "a" , "" },
 { "aa" , "" , "" },
 { "" , "aa" , "" },
 { "a" , "a" , "a" },
 { "aa" , "b" , "" },
 { "b" , "aa" , "" },
 { "ab" , "ba" , "ab" },
 { "aba" , "ab" , "ab" },
 { "aba" , "ba" , "ab" },
 { "aba" , "aab" , "ab" },
 { "a" , "A" , "a" },
 { "A" , "a" , "a" },
 { "A" , "A" , "a" },
 { "ab" , "AB" , "ab" },
 { "AB" , "ab" , "ab" },
 { "aB" , "Ab" , "ab" },
 { "aB" , "Ba" , "ab" },
 { "aB" , "Ba" , "ab" },
 { "a" , "\uD835\uDC1A" , "" },
 { "\uD835\uDC1A" , "\uD835\uDC1A" , "\uD835\uDC1A" },
 { "\uD835\uDC00" , "\uD835\uDC00" , "\uD835\uDC00" },
 { "\uD835\uDC1A" , "\uD835\uDC00" , "" },
 { "\uD835\uDC00" , "\uD835\uDC01" , "" },
 { "\uD801\uDC2B" , "\uD801\uDC2B" , "\uD801\uDC2B" },
 { "\uD801\uDC03" , "\uD801\uDC03" , "\uD801\uDC2B" },
 { "\uD801\uDC2B" , "\uD801\uDC03" , "\uD801\uDC2B" },
 { "\uD83D\uDE80" , "\uD83D\uDE80" , "\uD83D\uDE80" },
 { "a" , "aaaaaaaaaaaaaaaaa" , "a" },
 // The last test should still work, and work fast, with a second
 // argument string starting with "a" and ending _many_ characters later
 // The last test values doe not test it, but illustrate the scenario
 };
 int maximumTestArgumentLength =
 Arrays.stream(testArgumentsAndExpectedValues)
 .flatMap(testValues -> Arrays.stream(testValues)
 .limit(2))
 .mapToInt(String::length)
 .max()
 .getAsInt();
 // @formatter:on
 int maximumQuotedTestArgumentLength = maximumTestArgumentLength + 2;
 for (int i = 0; i < testArgumentsAndExpectedValues.length; i++) {
 String[] currentTestValues = testArgumentsAndExpectedValues[i];
 String string1 = currentTestValues[0];
 String string2 = currentTestValues[1];
 String expectedResult = currentTestValues[2];
 String actualResult = commonCharactersOf(string1, string2);
 boolean testSuccessful = expectedResult.equals(actualResult);
 if (testSuccessful) {
 // continue; // TODO: uncomment to filter out successful tests
 }
 Function<String, String> quoteString = s -> '"' + s + '"';
 // @formatter:off
 String outputFormat = "%2d) "
 + "%5s: "
 + "(%-" + maximumQuotedTestArgumentLength + "s"
 + " , "
 + "%-" + maximumQuotedTestArgumentLength + "s)"
 + " -> "
 + "%s "
 + "(%s)"
 + "%n";
 System.out.printf(outputFormat,
 i,
 testSuccessful ? "Success" : "Failure",
 quoteString.apply(string1),
 quoteString.apply(string2),
 quoteString.apply(actualResult),
 quoteString.apply(expectedResult));
 // @formatter:on
 }
 }
}

Question 2

What do you mean by "common characters"? Sequence(s) of characters? Duplicates are allowed?

Question 3

No duplicates. (Look at my sample test cases...)

Question 4

Your cases are quite limited. "aabccc" and "aaccc" gives "aaccc" or "ccc"?

Question 5

"aabccc", "aaccc" -> "ac"

Question 6

I like that you use Streams. But I think you should use even more of their functionality:

You could have used map to get the lower case for longestArgument. Or maybe it would have been simpler to just use longestArgument.toLowerCase(), and same for shortestArgument.
You use distinct() to remove duplicate, but you could have gotten rid of the main for-loop entirely by also using filter(...) where the filter uses a predicate built from the values in shortestArgument.
For the final output, you can use codePointStream.toArray() to create an int[] and then use String(codePointArray, 0, codePointArray.length). You could instead use intStream.collect(...) to perform a reduction which would fill and return a StringBuilder, but it would end up quite ugly.

You spend many lines defining the shortest/longest strings, but I don't see how that is relevant. You could just assign any string to any of those and the algorithm would still work.

I might have written the program a bit differently. I would define a function that takes a string and returns the lower case version with no duplicate, in the same order, as an IntStream. I would apply that function at the start on both input strings. And then apply filter on one of them, with the predicate built from the values of the other.

toto2 toto2 5,45017 silver badges21 bronze badges · Accepted Answer · 2015-10-03 23:59:06Z

I like that you use Streams. But I think you should use even more of their functionality:

You could have used map to get the lower case for longestArgument. Or maybe it would have been simpler to just use longestArgument.toLowerCase(), and same for shortestArgument.
You use distinct() to remove duplicate, but you could have gotten rid of the main for-loop entirely by also using filter(...) where the filter uses a predicate built from the values in shortestArgument.
For the final output, you can use codePointStream.toArray() to create an int[] and then use String(codePointArray, 0, codePointArray.length). You could instead use intStream.collect(...) to perform a reduction which would fill and return a StringBuilder, but it would end up quite ugly.

You spend many lines defining the shortest/longest strings, but I don't see how that is relevant. You could just assign any string to any of those and the algorithm would still work.

I might have written the program a bit differently. I would define a function that takes a string and returns the lower case version with no duplicate, in the same order, as an IntStream. I would apply that function at the start on both input strings. And then apply filter on one of them, with the predicate built from the values of the other.

Stack Exchange Network

Find common "characters" in 2 given strings (rev1)

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Hot Network Questions

Find common "characters" in 2 given strings (rev1)

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Related

Hot Network Questions