Find all line numbers of prefix duplicates with streams

Question 1

All line indices of lines from a text that begin with the same line prefix should be found. The prefixes and the corresponding line numbers should be returned. Streams and lambdas should be used as techniques if possible.

The format of a line is always as follows:

anyText_anyText_number_number

where anyText_anyText is the prefix, and one line has at least 7 chars.

For example:

a_b_0_1
c_d_5_8
c_d_9_3
b_d_1_12

should return {c_d=[2, 3]}.

The prefixes (here c_d) could be any text (inclusive additional underscores, but without newlines...).

This is my try:

 public static void main(final String[] args) {
 String s = """
 a_b_0_1
 c_d_5_8
 c_d_9_3
 b_d_1_12""";
 Map<String, List<Integer>> duplicates = s.lines()
 .collect(Collectors.groupingBy(l -> l.substring(0, IntStream.range(0, l.length())
 .map(i -> l.length() - i - 1)
 .filter(i -> l.charAt(i) == '_')
 .skip(1)
 .findFirst()
 .orElse(0))))
 .entrySet()
 .stream()
 .filter(e -> e.getValue()
 .size() > 1)
 .collect(Collectors.toMap(Map.Entry::getKey, e -> e.getValue()
 .stream()
 .map(e1 -> {
 LineNumberReader lnr = new LineNumberReader(new StringReader(s));
 return lnr.lines()
 .filter(l -> l.equals(e1))
 .map(l -> lnr.getLineNumber())
 .findFirst()
 .orElse(0);
 })
 .toList()));
 System.out.println(duplicates); // print all duplicates line numbers
 }

Could this code be simplified?

Question 2

Can a prefix be as little as one character?

Question 3

Hello @Reinderien, no, one prefix has at least 3 characters, for example a_b, always followed by an underscore char.

Question 4

First off, the description of your code is quite unclear and leaves things out. In your comment you mention "at least 3 characters followed by an underscore", however that is not what your code does. I'm basing my review on what the code does, not the description.

I think your biggest problem is fixating on using Stream, because Java Stream lacks several features that would be helpful in this scenario, for example a zipWithIndex method and tuples in order to handle more than a single value at a time.

The part that extracts the prefix is especially convoluted. At the very least it should be extracted to its own method. If I understand it correctly, it considers everything before the last two underscores the prefix.

The most straight forward way to this would be the split the string at the underscores, remove the last two items and rejoin the items with underscores. Unfortunately Standard Java lacks methods the remove the last n elements of a Stream or a List. I don't think this can be be done sensibly with Stream. I would do something like this:

private static String prefix(String text) {
 List<String> sections = Arrays.asList(text.split("_"));
 return String.join("_", sections.subList(0, sections.size() - 2));
}

One big problem is that you only take a String with line breaks as an input, due to the convoluted way to find the line numbers with LineNumberReader in the end. An input in form of a List would be more appropriate. For this - and for reviews like this one - it would be helpful to put the central code into a well defined method.

As I suggested above, with a tuple (or alternatively a custom class, or in my case a record) this would be much easier:

public static void main(final String[] args) {
 String s = """
 a_b_0_1
 c_d_5_8
 c_d_9_3
 b_d_1_12""";
 System.out.println(findDuplicatePrefixes(s.lines().toList()));
}
private static record Line(int lineNumber, String text) {}
private static Map<String, List<Integer>> findDuplicatePrefixes(List<String> lines) {
 return IntStream.range(0, lines.size()) 
 .mapToObj(i -> new Line(i + 1, lines.get(i))) 
 .collect(groupingBy( 
 l -> prefix(l.text()), 
 mapping(Line::lineNumber, toList()) 
 )).entrySet() 
 .stream() 
 .filter(e -> e.getValue().size() > 1) 
 .collect(toMap(Entry::getKey, Entry::getValue));
}

Question 5

What information is missing from the question?

Question 6

@TobiasGrothe "Duplicate lines" should have been "duplicate prefixes". As mentioned what exactly a prefix defined as. It probably also would have made sense to explicitly mention that single entries were to be removed.

Question 7

An edit suggestion would be nice to make the task clearer...

Question 8

Ok, I've edited the question.

Question 9

Streams and lambdas should be used as techniques if possible.

But why? Why use hammer when screw driver is so much better in bomb defusing? Look at the stream based solutions and tell me the honest words you uttered when you were told that you were given the responsibility of maintaining them.

I know the society in its later stages of capitalism does not put much weight on human suffering, but I do. I think we should consider solutions that literally hurt people as impossible. So, here is how one would implement the problem with good old loops and conditionals.

final Map<String, List<Integer>> lineNumbers = new HashMap<>();
final Pattern pattern = Pattern.compile("(.*)_\\d+_\\d+");
final String[] lines = """
 a_b_0_1
 c_d_5_8
 c_d_9_3
 b_d_1_12""".split("\n");
// Find line numbers where each prefix occurs.
for (int i = 0; i < lines.length; i++) {
 final Matcher matcher = pattern.matcher(lines[i]);
 if (matcher.matches()) {
 final String prefix = matcher.group(1);
 final List<Integer> numbers = lineNumbers
 .computeIfAbsent(prefix, p -> new ArrayList<>());
 // Line numbers are indexed from 1.
 numbers.add(i + 1);
 }
}
// Remove line numbers that only occur once.
lineNumbers.values().removeIf(numbers -> numbers.size() == 1);
System.out.println(lineNumbers);

Question 10

Thanks, I also think, streams are here not the best choices. But I have a credo in my head that says to use streams wherever possible... Maybe just because it's the new, hip thing. ;)

RoToRa RoToRa 11.6k1 gold badge24 silver badges51 bronze badges · Accepted Answer · 2023-12-22 14:51:34Z

First off, the description of your code is quite unclear and leaves things out. In your comment you mention "at least 3 characters followed by an underscore", however that is not what your code does. I'm basing my review on what the code does, not the description.

I think your biggest problem is fixating on using Stream, because Java Stream lacks several features that would be helpful in this scenario, for example a zipWithIndex method and tuples in order to handle more than a single value at a time.

The part that extracts the prefix is especially convoluted. At the very least it should be extracted to its own method. If I understand it correctly, it considers everything before the last two underscores the prefix.

The most straight forward way to this would be the split the string at the underscores, remove the last two items and rejoin the items with underscores. Unfortunately Standard Java lacks methods the remove the last n elements of a Stream or a List. I don't think this can be be done sensibly with Stream. I would do something like this:

private static String prefix(String text) {
 List<String> sections = Arrays.asList(text.split("_"));
 return String.join("_", sections.subList(0, sections.size() - 2));
}

One big problem is that you only take a String with line breaks as an input, due to the convoluted way to find the line numbers with LineNumberReader in the end. An input in form of a List would be more appropriate. For this - and for reviews like this one - it would be helpful to put the central code into a well defined method.

As I suggested above, with a tuple (or alternatively a custom class, or in my case a record) this would be much easier:

public static void main(final String[] args) {
 String s = """
 a_b_0_1
 c_d_5_8
 c_d_9_3
 b_d_1_12""";
 System.out.println(findDuplicatePrefixes(s.lines().toList()));
}
private static record Line(int lineNumber, String text) {}
private static Map<String, List<Integer>> findDuplicatePrefixes(List<String> lines) {
 return IntStream.range(0, lines.size()) 
 .mapToObj(i -> new Line(i + 1, lines.get(i))) 
 .collect(groupingBy( 
 l -> prefix(l.text()), 
 mapping(Line::lineNumber, toList()) 
 )).entrySet() 
 .stream() 
 .filter(e -> e.getValue().size() > 1) 
 .collect(toMap(Entry::getKey, Entry::getValue));
}

@TobiasGrothe "Duplicate lines" should have been "duplicate prefixes". As mentioned what exactly a prefix defined as. It probably also would have made sense to explicitly mention that single entries were to be removed.
An edit suggestion would be nice to make the task clearer...

Stack Exchange Network

Find all line numbers of prefix duplicates with streams

2 Answers 2

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Find all line numbers of prefix duplicates with streams

2 Answers 2

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions