I needed to port some Coldfusion code to Java that does Title Capitalization on Strings and came up with this solution. I would love it if anyone could give me some pointers as to how to make this more efficient.
import org.apache.commons.text.WordUtils;
import java.util.HashSet;
import java.util.Set;
public class CapFirstTitle {
public static String CapFirstTitle() {
String inputText = " from a String to test Title Capitalization CASE hyphenated-word word from IBM od id XX if ";
Set<String> whiteList = new HashSet<String>();
whiteList.add("a");
whiteList.add("above");
whiteList.add("after");
whiteList.add("ain't");
whiteList.add("among");
whiteList.add("an");
whiteList.add("and");
whiteList.add("as");
whiteList.add("at");
whiteList.add("below");
whiteList.add("but");
whiteList.add("by");
whiteList.add("can't");
whiteList.add("don't");
whiteList.add("for");
whiteList.add("from");
whiteList.add("if");
whiteList.add("in");
whiteList.add("into");
whiteList.add("it's");
whiteList.add("nor");
whiteList.add("of");
whiteList.add("off");
whiteList.add("on");
whiteList.add("onto");
whiteList.add("or");
whiteList.add("over");
whiteList.add("since");
whiteList.add("the");
whiteList.add("to");
whiteList.add("under");
whiteList.add("until");
whiteList.add("up");
whiteList.add("with");
whiteList.add("won't");
Set<String> alwaysCapitalize = new HashSet<String>();
alwaysCapitalize.add("II");
alwaysCapitalize.add("III");
alwaysCapitalize.add("IV");
alwaysCapitalize.add("V");
alwaysCapitalize.add("VI");
alwaysCapitalize.add("VII");
alwaysCapitalize.add("VIII");
alwaysCapitalize.add("IX");
alwaysCapitalize.add("X");
alwaysCapitalize.add("XI");
alwaysCapitalize.add("XII");
alwaysCapitalize.add("XIII");
alwaysCapitalize.add("XIV");
alwaysCapitalize.add("XV");
alwaysCapitalize.add("XVI");
alwaysCapitalize.add("XVII");
alwaysCapitalize.add("XVIII");
alwaysCapitalize.add("XIX");
alwaysCapitalize.add("XX");
alwaysCapitalize.add("XXI");
alwaysCapitalize.add("OD");
alwaysCapitalize.add("ID");
alwaysCapitalize.add("PH");
alwaysCapitalize.add("XH");
alwaysCapitalize.add("UV");
alwaysCapitalize.add("DOM");
alwaysCapitalize.add("GS");
alwaysCapitalize.add("ii");
alwaysCapitalize.add("iii");
alwaysCapitalize.add("iv");
alwaysCapitalize.add("v");
alwaysCapitalize.add("vi");
alwaysCapitalize.add("vii");
alwaysCapitalize.add("viii");
alwaysCapitalize.add("ix");
alwaysCapitalize.add("x");
alwaysCapitalize.add("xi");
alwaysCapitalize.add("xii");
alwaysCapitalize.add("xiii");
alwaysCapitalize.add("xiv");
alwaysCapitalize.add("xv");
alwaysCapitalize.add("xvi");
alwaysCapitalize.add("xvii");
alwaysCapitalize.add("xviii");
alwaysCapitalize.add("xix");
alwaysCapitalize.add("xx");
alwaysCapitalize.add("xxi");
alwaysCapitalize.add("od");
alwaysCapitalize.add("id");
alwaysCapitalize.add("ph");
alwaysCapitalize.add("xh");
alwaysCapitalize.add("uv");
alwaysCapitalize.add("dom");
alwaysCapitalize.add("gs");
StringBuilder capitalizedString = new StringBuilder();
if (inputText.contains(" ")) {
inputText = inputText.toLowerCase().trim();
String[] parts = inputText.split(" ");
for (int i = 0; i < parts.length; i++) {
if (i == 0 | i == parts.length - 1) {
capitalizedString.append(WordUtils.capitalize(parts[i], ' ', '-')).append(" ");
} else {
if (!whiteList.contains(parts[i])) {
if (!alwaysCapitalize.contains(parts[i])) {
capitalizedString.append(WordUtils.capitalize(parts[i], ' ', '-')).append(" ");
} else {
capitalizedString.append(parts[i].toUpperCase()).append(" ");
}
} else {
capitalizedString.append(parts[i]).append(" ");
}
}
}
}
return capitalizedString.toString();
}
}
-
1\$\begingroup\$ What precisely should the code do? Because I wonder for example why you convert the input text completely to lower case as a first step. \$\endgroup\$Koekje– Koekje2018年05月01日 22:52:37 +00:00Commented May 1, 2018 at 22:52
-
\$\begingroup\$ It is supposed to clean up strings that may be in the form of "This IS a SAmPle STRING" first. I didn't check if Apache Commons WordUtils already handles that particular case. Then it checks if we are on the first word or the last word of the string and if so don't check against the whitelist of words that should stay lowercase. Otherwise check against the whitelist and capitalize all words not on the whitelist, then go on to check against words that need to stay capitalized fully. \$\endgroup\$ozfive– ozfive2018年05月01日 23:04:50 +00:00Commented May 1, 2018 at 23:04
-
\$\begingroup\$ Looks like it doesn't \$\endgroup\$ozfive– ozfive2018年05月01日 23:09:38 +00:00Commented May 1, 2018 at 23:09
1 Answer 1
CapFirstTitle()
is a method name that disregards java coding conventions for method names. Use standardlowerCamelCase
.whitelist
andalwaysCapitalize
should beprivate static final
members. You can initialize these in a static initializer block like this:private static final Set<String> whitelist = new HashSet<>(); static { whitelist.addAll(Arrays.asList("a", "above", "after", ...); }
The binary or is a little uncommon in conditions. You won't really get much benefit out of it, especially since the boolean or is short-circuiting. Use
||
over|
here.There is a lot of vertical whitespace here. I personally find that hard to follow and I'd delete most of that space if I wrote this code myself. YMMV. Since you're extremely consistent about it, I don't see any need to change it.
You're calling
toLowerCase
without specifying the Locale. While this may work in most cases, you should be aware of edge-cases involving languages like Turkish. If you have any way to be locale-aware here, you should strive to be. That can save you a lot of headaches down the line :)This code always appends a
" "
at the end of the String. You could avoid this by first overwriting theparts
array with the capitalized parts and subsequently usingArrays.stream(parts).collect(Collectors.joining(" "));
Since Java 8, String exposes the method
join
(thanks to Koekje for pointing that out in the coments), you can use asString.join(" ", parts);
The last point also contains an option for a deeper rewrite. Consider the following:
String[] capitalizedParts = Arrays.stream(input.toLowerCase().trim().split(" "))
.map(CapFirstTitle::capSingleWord)
.toArray(String[]::new);
capitalizedParts[0] = WordUtils.capitalize(capitalizedParts[0], ' ', '-');
int lastIdx = capitalizedParts.length - 1;
capitalizedParts[lastIdx] = WordUtils.capitalize(capitalizedParts[lastIdx], ' ', '-');
return String.join(" ", capitalizedParts);
This is a solution that hides the details of whitelist
and alwaysCapitalize
inside a method and clears the enforced capitalization of the first and last word.
It avoids the (correct, but mental overhead inducing) use of StringBuilder and separates the capitalization logic from the reassembly of the String. You can even go so far and reduce multiple spaces in the same process by filtering out empty array elements in the first Stream.
-
\$\begingroup\$ This is incredible. I usually have to work on an island and this is going to help me become a better programmer! I really appreciate all of the points made. I will take them to heart. Thank you! \$\endgroup\$ozfive– ozfive2018年05月02日 00:19:19 +00:00Commented May 2, 2018 at 0:19
-
\$\begingroup\$ The binary or | was a typo \$\endgroup\$ozfive– ozfive2018年05月02日 00:28:05 +00:00Commented May 2, 2018 at 0:28
-
\$\begingroup\$ Very good answer! You can also use the
join
method ofString
, see docs.oracle.com/javase/8/docs/api/java/lang/… \$\endgroup\$Koekje– Koekje2018年05月02日 11:40:18 +00:00Commented May 2, 2018 at 11:40 -
\$\begingroup\$ @Koekje I forgot about that method, it seems ... Thanks for reminding me :) \$\endgroup\$Vogel612– Vogel6122018年05月02日 11:45:39 +00:00Commented May 2, 2018 at 11:45