All line indices of lines from a text that begin with the same line prefix should be found. The prefixes and the corresponding line numbers should be returned. Streams and lambdas should be used as techniques if possible.
The format of a line is always as follows:
anyText_anyText_number_number
where anyText_anyText
is the prefix, and one line has at least 7 chars.
For example:
a_b_0_1
c_d_5_8
c_d_9_3
b_d_1_12
should return {c_d=[2, 3]}
.
The prefixes (here c_d
) could be any text (inclusive additional underscores, but without newlines...).
This is my try:
public static void main(final String[] args) {
String s = """
a_b_0_1
c_d_5_8
c_d_9_3
b_d_1_12""";
Map<String, List<Integer>> duplicates = s.lines()
.collect(Collectors.groupingBy(l -> l.substring(0, IntStream.range(0, l.length())
.map(i -> l.length() - i - 1)
.filter(i -> l.charAt(i) == '_')
.skip(1)
.findFirst()
.orElse(0))))
.entrySet()
.stream()
.filter(e -> e.getValue()
.size() > 1)
.collect(Collectors.toMap(Map.Entry::getKey, e -> e.getValue()
.stream()
.map(e1 -> {
LineNumberReader lnr = new LineNumberReader(new StringReader(s));
return lnr.lines()
.filter(l -> l.equals(e1))
.map(l -> lnr.getLineNumber())
.findFirst()
.orElse(0);
})
.toList()));
System.out.println(duplicates); // print all duplicates line numbers
}
Could this code be simplified?
2 Answers 2
First off, the description of your code is quite unclear and leaves things out. In your comment you mention "at least 3 characters followed by an underscore", however that is not what your code does. I'm basing my review on what the code does, not the description.
I think your biggest problem is fixating on using Stream
, because Java Stream
lacks several features that would be helpful in this scenario, for example a zipWithIndex
method and tuples in order to handle more than a single value at a time.
The part that extracts the prefix is especially convoluted. At the very least it should be extracted to its own method. If I understand it correctly, it considers everything before the last two underscores the prefix.
The most straight forward way to this would be the split the string at the underscores, remove the last two items and rejoin the items with underscores. Unfortunately Standard Java lacks methods the remove the last n
elements of a Stream
or a List
. I don't think this can be be done sensibly with Stream
. I would do something like this:
private static String prefix(String text) {
List<String> sections = Arrays.asList(text.split("_"));
return String.join("_", sections.subList(0, sections.size() - 2));
}
One big problem is that you only take a String
with line breaks as an input, due to the convoluted way to find the line numbers with LineNumberReader
in the end. An input in form of a List
would be more appropriate. For this - and for reviews like this one - it would be helpful to put the central code into a well defined method.
As I suggested above, with a tuple (or alternatively a custom class, or in my case a record
) this would be much easier:
public static void main(final String[] args) {
String s = """
a_b_0_1
c_d_5_8
c_d_9_3
b_d_1_12""";
System.out.println(findDuplicatePrefixes(s.lines().toList()));
}
private static record Line(int lineNumber, String text) {}
private static Map<String, List<Integer>> findDuplicatePrefixes(List<String> lines) {
return IntStream.range(0, lines.size())
.mapToObj(i -> new Line(i + 1, lines.get(i)))
.collect(groupingBy(
l -> prefix(l.text()),
mapping(Line::lineNumber, toList())
)).entrySet()
.stream()
.filter(e -> e.getValue().size() > 1)
.collect(toMap(Entry::getKey, Entry::getValue));
}
-
\$\begingroup\$ What information is missing from the question? \$\endgroup\$Tobias Grothe– Tobias Grothe2023年12月22日 20:28:59 +00:00Commented Dec 22, 2023 at 20:28
-
\$\begingroup\$ @TobiasGrothe "Duplicate lines" should have been "duplicate prefixes". As mentioned what exactly a prefix defined as. It probably also would have made sense to explicitly mention that single entries were to be removed. \$\endgroup\$RoToRa– RoToRa2023年12月23日 09:14:26 +00:00Commented Dec 23, 2023 at 9:14
-
\$\begingroup\$ An edit suggestion would be nice to make the task clearer... \$\endgroup\$Tobias Grothe– Tobias Grothe2023年12月23日 14:38:08 +00:00Commented Dec 23, 2023 at 14:38
-
\$\begingroup\$ Ok, I've edited the question. \$\endgroup\$Tobias Grothe– Tobias Grothe2023年12月23日 14:57:00 +00:00Commented Dec 23, 2023 at 14:57
Streams and lambdas should be used as techniques if possible.
But why? Why use hammer when screw driver is so much better in bomb defusing? Look at the stream based solutions and tell me the honest words you uttered when you were told that you were given the responsibility of maintaining them.
I know the society in its later stages of capitalism does not put much weight on human suffering, but I do. I think we should consider solutions that literally hurt people as impossible. So, here is how one would implement the problem with good old loops and conditionals.
final Map<String, List<Integer>> lineNumbers = new HashMap<>();
final Pattern pattern = Pattern.compile("(.*)_\\d+_\\d+");
final String[] lines = """
a_b_0_1
c_d_5_8
c_d_9_3
b_d_1_12""".split("\n");
// Find line numbers where each prefix occurs.
for (int i = 0; i < lines.length; i++) {
final Matcher matcher = pattern.matcher(lines[i]);
if (matcher.matches()) {
final String prefix = matcher.group(1);
final List<Integer> numbers = lineNumbers
.computeIfAbsent(prefix, p -> new ArrayList<>());
// Line numbers are indexed from 1.
numbers.add(i + 1);
}
}
// Remove line numbers that only occur once.
lineNumbers.values().removeIf(numbers -> numbers.size() == 1);
System.out.println(lineNumbers);
-
\$\begingroup\$ Thanks, I also think, streams are here not the best choices. But I have a credo in my head that says to use streams wherever possible... Maybe just because it's the new, hip thing. ;) \$\endgroup\$Tobias Grothe– Tobias Grothe2024年01月09日 11:13:26 +00:00Commented Jan 9, 2024 at 11:13
a_b
, always followed by an underscore char. \$\endgroup\$