Please take a look at my code for (simply) formatting/wrapping a text. Is there something to improve?
Here are the properties the method should have: (updated)
- Every space character (i.e.
" "
or"\n"
, etc.) that occurs at least once should be replaced by exactly one space character. - A line of text should not be longer than
maxLen
characters. - Long words at the end of a line should be inserted into the next line ((soft) word wrap).
- The
'\n'
character should be used as the line separator. - Lines should not be filled up entirely with spaces.
package text;
public class Util {
public static String formatText(String string, final int maxLen, final boolean softWordWrap) {
if (string == null) {
throw new IllegalArgumentException("string is null");
}
if (maxLen <= 0) {
throw new IllegalArgumentException("maxLen must be > 0");
}
final StringBuilder result = new StringBuilder();
final char ls = '\n';
final String text = string.replaceAll("\\s+", " ");
final String[] words = text.split(" ");
int wordIndex = 0;
int lineIndex = 0;
while (wordIndex < words.length) {
final String word = words[wordIndex++];
final int before = result.length();
if (lineIndex + word.length() <= maxLen) {
// word will fit
result.append(word);
} else if (softWordWrap && word.length() <= maxLen) {
// do not split word
--wordIndex;
int len = maxLen - lineIndex;
lineIndex += len;
} else {
// split word (even if softWordWrap == true)
int splitIndex = maxLen - lineIndex;
String newWord1 = word.substring(0, splitIndex);
String newWord2 = word.substring(splitIndex);
result.append(newWord1);
--wordIndex;
words[wordIndex] = newWord2;
}
if (wordIndex >= words.length) {
break;
}
lineIndex += result.length() - before;
if (lineIndex < maxLen) {
result.append(' ');
++lineIndex;
}
if (lineIndex >= maxLen) {
result.append(ls);
lineIndex = 0;
}
}
return result.toString();
}
}
2 Answers 2
Advice I - Package your classes well
I suggest you write:
package com.tobias.grothe.text.util;
Advice II - Make Utils.java
non-constructible
I would hide the constructor of Utils
since it makes no sense to construct it:
private Util() {
}
Advice III - Can add final
to String string
You have:
public static String formatText(String string, final int maxLen, final boolean softWordWrap) {
Why not:
public static String formatText(final String string, final int maxLen, final boolean softWordWrap) {
^^^^^
Advice IV You have:
final StringBuilder result = new StringBuilder();
I would prefer sb
insteaqd of result
. If you want to be verbose, I suggest stringBuilder
.
Advice V - How about trimming the input string
?
You have:
final String text = string.replaceAll("\\s+", " ");
Could you consider rather:
final String text = string.trim().replaceAll("\\s+", " ");
^^^^^^
Otherwise, your have some rather solid piece of code, well done!
Summa summarum
All in all, I had this in mind:
package com.tobias.grothe.text.util;
import java.util.Objects;
public final class Util {
private static final char NL = '\n';
private Util() {
}
public static String formatText(final String string,
final int maximumLineLength,
final boolean softWordWrap) {
Objects.requireNonNull(string, "The input string is null.");
if (maximumLineLength <= 0) {
throw new IllegalArgumentException(
String.format(
"maximumLengthLength(%d) is too small. " +
"Must be at least 1.",
maximumLineLength));
}
final StringBuilder sb = new StringBuilder();
final String text = string.trim().replaceAll("\\s+", " ");
final String[] words = text.split(" ");
int wordIndex = 0;
int lineIndex = 0;
while (wordIndex < words.length) {
final String word = words[wordIndex++];
final int before = sb.length();
if (lineIndex + word.length() <= maximumLineLength) {
// word will fit
sb.append(word);
} else if (softWordWrap && word.length() <= maximumLineLength) {
// do not split word
--wordIndex;
int len = maximumLineLength - lineIndex;
lineIndex += len;
} else {
// split word (even if softWordWrap == true)
int splitIndex = maximumLineLength - lineIndex;
String newWord1 = word.substring(0, splitIndex);
String newWord2 = word.substring(splitIndex);
sb.append(newWord1);
--wordIndex;
words[wordIndex] = newWord2;
}
if (wordIndex >= words.length) {
break;
}
lineIndex += sb.length() - before;
if (lineIndex < maximumLineLength) {
sb.append(' ');
++lineIndex;
}
if (lineIndex >= maximumLineLength) {
sb.append(NL);
lineIndex = 0;
}
}
return sb.toString();
}
public static void main(final String[] args) {
System.out.println(
formatText(
" fdsfds ",
10,
false));
System.out.println();
System.out.println(
formatText(
"He enjoyed programming while having tea.",
10,
true));
}
}
The requirement
Every space character (
'\n'
) [...] should be replaced by exactly one space character
is strange and non-intuitive. Perhaps it suits your business use case, but in my imagination a general solution would preserve newlines so that you can write and format paragraphs. Your current solution does not allow for paragraphs at all. In my demonstration I keep the original behaviour, though.
Your code has another quirk - it leaves trailing spaces at the end of lines, and probably shouldn't.
formatText
being a single method is not very Java-idiomatic, and especially lumping a collection of utility methods into a Util
class is a common anti-pattern. Spin up a simple, non-mutating class with methods.
You use a StringBuilder
(good, roughly); but then you still have to juggle substrings and indices. In my opinion, a more flexible and easier-to-understand method would use regexes and a scanner. This has a number of benefits, including that your input doesn't actually need to all be in memory at once.
Add unit tests - they are critical for a utility like this.
Don't assume that '\n'
is the correct newline. Java understands how to use OS-independent newlines.
Increase flexibility by allowing the user to access a line stream instead of forcing a single string. Among other applications, this could be used in a user interface where lines need to be individually rendered in an iterative manner.
Suggested
Formatter.java
import java.util.Scanner;
import java.util.Spliterator;
import java.util.function.Consumer;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;
public class Formatter {
public final int maxLen;
public final boolean softWordWrap;
public final String linesep;
public Formatter(int maxLen, boolean softWordWrap) {
this(maxLen, softWordWrap, System.lineSeparator());
}
public Formatter(int maxLen, boolean softWordWrap, String linesep) {
if (maxLen <= 0) {
throw new IllegalArgumentException("maxLen must be > 0");
}
this.maxLen = maxLen;
this.softWordWrap = softWordWrap;
this.linesep = linesep;
}
private abstract class LineSplitter implements Spliterator<String> {
private final Scanner scanner;
private int linesRemain;
private final Pattern toLimit;
protected LineSplitter(Scanner scanner, int linesEst, Pattern toLimit) {
this.scanner = scanner;
this.toLimit = toLimit;
linesRemain = linesEst;
}
@Override
public long estimateSize() { return linesRemain; }
@Override
public int characteristics() { return ORDERED | NONNULL; }
@Override
public Spliterator<String> trySplit() { return null; }
@Override
public boolean tryAdvance(Consumer<? super String> action) {
scanner.skip("\\s*");
String read = scanner.findWithinHorizon(toLimit, maxLen);
if (read == null)
return false;
action.accept(read);
linesRemain = Integer.max(1, linesRemain - 1);
return true;
}
}
private class SoftLineSplitter extends LineSplitter {
public SoftLineSplitter(Scanner scanner, int linesEst) {
super(
scanner, linesEst,
Pattern.compile("""
(
\\S{%d} # Word that is so long it needs to be hard-wrapped
)|(
.{0,%d} # Any characters, greedy
\\S # Last character is non-space
(?=\\s|$) # Terminating space or end, zero-width lookahead
)
""".formatted(maxLen, maxLen - 1), Pattern.COMMENTS)
);
}
}
private class HardLineSplitter extends LineSplitter {
public HardLineSplitter(Scanner scanner, int linesEst) {
super(
scanner, linesEst,
// Simpler pattern: any character greedy up to last possible non-space
Pattern.compile(
".{0,%d}\\S".formatted(maxLen - 1)
)
);
}
}
public Stream<String> getLines(String string) {
if (string == null) {
throw new IllegalArgumentException("string is null");
}
int linesEst = string.length() / maxLen;
Scanner scanner = new Scanner(string);
Spliterator<String> splitter;
if (softWordWrap)
splitter = new SoftLineSplitter(scanner, linesEst);
else
splitter = new HardLineSplitter(scanner, linesEst);
return StreamSupport.stream(splitter, false);
}
public String format(String string) {
return getLines(string)
.collect(Collectors.joining(linesep));
}
}
FormatterTests.java
import org.junit.jupiter.api.Test;
import static org.junit.jupiter.api.Assertions.assertEquals;
public class FormatterTests {
// The tests cannot assume the value for environmental newline, so this is specified explicitly as '\n'.
@Test
void with_space_hard() {
assertEquals(
"on\ne\ntw\no",
new Formatter(2, false, "\n").format("one two")
);
assertEquals(
"one\ntwo",
new Formatter(3, false, "\n").format("one two")
);
assertEquals(
"one\ntwo",
new Formatter(4, false, "\n").format("one two")
);
assertEquals(
"one t\nwo",
new Formatter(5, false, "\n").format("one two")
);
assertEquals(
"one tw\no",
new Formatter(6, false, "\n").format("one two")
);
assertEquals(
"one two",
new Formatter(7, false, "\n").format("one two")
);
}
@Test
void with_space_soft() {
assertEquals(
"on\ne\ntw\no",
new Formatter(2, true, "\n").format("one two")
);
assertEquals(
"one\ntwo",
new Formatter(3, true, "\n").format("one two")
);
assertEquals(
"one\ntwo",
new Formatter(4, true, "\n").format("one two")
);
assertEquals(
"one\ntwo",
new Formatter(5, true, "\n").format("one two")
);
assertEquals(
"one\ntwo",
new Formatter(6, true, "\n").format("one two")
);
assertEquals(
"one two",
new Formatter(7, true, "\n").format("one two")
);
}
@Test
void with_underscore_hard() {
assertEquals(
"on\ne_\ntw\no",
new Formatter(2, false, "\n").format("one_two")
);
assertEquals(
"one\n_tw\no",
new Formatter(3, false, "\n").format("one_two")
);
assertEquals(
"one_\ntwo",
new Formatter(4, false, "\n").format("one_two")
);
assertEquals(
"one_t\nwo",
new Formatter(5, false, "\n").format("one_two")
);
assertEquals(
"one_tw\no",
new Formatter(6, false, "\n").format("one_two")
);
assertEquals(
"one_two",
new Formatter(7, false, "\n").format("one_two")
);
}
@Test
void with_underscore_soft() {
assertEquals(
"on\ne_\ntw\no",
new Formatter(2, true, "\n").format("one_two")
);
assertEquals(
"one\n_tw\no",
new Formatter(3, true, "\n").format("one_two")
);
assertEquals(
"one_\ntwo",
new Formatter(4, true, "\n").format("one_two")
);
assertEquals(
"one_t\nwo",
new Formatter(5, true, "\n").format("one_two")
);
assertEquals(
"one_tw\no",
new Formatter(6, true, "\n").format("one_two")
);
assertEquals(
"one_two",
new Formatter(7, true, "\n").format("one_two")
);
}
@Test
void spaceFold() {
assertEquals(
"one\ntwo",
new Formatter(5, true, "\n").format("one two")
);
assertEquals(
"one\ntwo",
new Formatter(5, true, "\n").format("one two")
);
assertEquals(
"one\ntwo",
new Formatter(5, true, "\n").format("one \n\t two")
);
}
}
-
\$\begingroup\$ The feature "to use Scanner and process the input line by line (in parallel), and not to use StringBuilder" is none of my requirements. I have to reformat small pieces of text, perhaps not more then 300 characters at all. But it's an interesting suggestion for an additional feature. And in my special use case I only must use
\n
as line sep. \$\endgroup\$Tobias Grothe– Tobias Grothe2024年04月03日 09:19:20 +00:00Commented Apr 3, 2024 at 9:19 -
1\$\begingroup\$ @TobiasGrothe This does not run in parallel. It's possible in theory to parallelise it, but this would likely be more complication than it's worth. Notice the
false
argument toStreamSupport.stream()
. \$\endgroup\$Reinderien– Reinderien2024年04月03日 12:20:12 +00:00Commented Apr 3, 2024 at 12:20