4
\$\begingroup\$

I needed to port some Coldfusion code to Java that does Title Capitalization on Strings and came up with this solution. I would love it if anyone could give me some pointers as to how to make this more efficient.

import org.apache.commons.text.WordUtils;
import java.util.HashSet;
import java.util.Set;
public class CapFirstTitle {
 public static String CapFirstTitle() {
 String inputText = " from a String to test Title Capitalization CASE hyphenated-word word from IBM od id XX if ";
 Set<String> whiteList = new HashSet<String>();
 whiteList.add("a");
 whiteList.add("above");
 whiteList.add("after");
 whiteList.add("ain't");
 whiteList.add("among");
 whiteList.add("an");
 whiteList.add("and");
 whiteList.add("as");
 whiteList.add("at");
 whiteList.add("below");
 whiteList.add("but");
 whiteList.add("by");
 whiteList.add("can't");
 whiteList.add("don't");
 whiteList.add("for");
 whiteList.add("from");
 whiteList.add("if");
 whiteList.add("in");
 whiteList.add("into");
 whiteList.add("it's");
 whiteList.add("nor");
 whiteList.add("of");
 whiteList.add("off");
 whiteList.add("on");
 whiteList.add("onto");
 whiteList.add("or");
 whiteList.add("over");
 whiteList.add("since");
 whiteList.add("the");
 whiteList.add("to");
 whiteList.add("under");
 whiteList.add("until");
 whiteList.add("up");
 whiteList.add("with");
 whiteList.add("won't");
 Set<String> alwaysCapitalize = new HashSet<String>();
 alwaysCapitalize.add("II");
 alwaysCapitalize.add("III");
 alwaysCapitalize.add("IV");
 alwaysCapitalize.add("V");
 alwaysCapitalize.add("VI");
 alwaysCapitalize.add("VII");
 alwaysCapitalize.add("VIII");
 alwaysCapitalize.add("IX");
 alwaysCapitalize.add("X");
 alwaysCapitalize.add("XI");
 alwaysCapitalize.add("XII");
 alwaysCapitalize.add("XIII");
 alwaysCapitalize.add("XIV");
 alwaysCapitalize.add("XV");
 alwaysCapitalize.add("XVI");
 alwaysCapitalize.add("XVII");
 alwaysCapitalize.add("XVIII");
 alwaysCapitalize.add("XIX");
 alwaysCapitalize.add("XX");
 alwaysCapitalize.add("XXI");
 alwaysCapitalize.add("OD");
 alwaysCapitalize.add("ID");
 alwaysCapitalize.add("PH");
 alwaysCapitalize.add("XH");
 alwaysCapitalize.add("UV");
 alwaysCapitalize.add("DOM");
 alwaysCapitalize.add("GS");
 alwaysCapitalize.add("ii");
 alwaysCapitalize.add("iii");
 alwaysCapitalize.add("iv");
 alwaysCapitalize.add("v");
 alwaysCapitalize.add("vi");
 alwaysCapitalize.add("vii");
 alwaysCapitalize.add("viii");
 alwaysCapitalize.add("ix");
 alwaysCapitalize.add("x");
 alwaysCapitalize.add("xi");
 alwaysCapitalize.add("xii");
 alwaysCapitalize.add("xiii");
 alwaysCapitalize.add("xiv");
 alwaysCapitalize.add("xv");
 alwaysCapitalize.add("xvi");
 alwaysCapitalize.add("xvii");
 alwaysCapitalize.add("xviii");
 alwaysCapitalize.add("xix");
 alwaysCapitalize.add("xx");
 alwaysCapitalize.add("xxi");
 alwaysCapitalize.add("od");
 alwaysCapitalize.add("id");
 alwaysCapitalize.add("ph");
 alwaysCapitalize.add("xh");
 alwaysCapitalize.add("uv");
 alwaysCapitalize.add("dom");
 alwaysCapitalize.add("gs");
 StringBuilder capitalizedString = new StringBuilder();
 if (inputText.contains(" ")) {
 inputText = inputText.toLowerCase().trim();
 String[] parts = inputText.split(" ");
 for (int i = 0; i < parts.length; i++) {
 if (i == 0 | i == parts.length - 1) {
 capitalizedString.append(WordUtils.capitalize(parts[i], ' ', '-')).append(" ");
 } else {
 if (!whiteList.contains(parts[i])) {
 if (!alwaysCapitalize.contains(parts[i])) {
 capitalizedString.append(WordUtils.capitalize(parts[i], ' ', '-')).append(" ");
 } else {
 capitalizedString.append(parts[i].toUpperCase()).append(" ");
 }
 } else {
 capitalizedString.append(parts[i]).append(" ");
 }
 }
 }
 }
 return capitalizedString.toString();
 }
}
Jamal
35.2k13 gold badges134 silver badges238 bronze badges
asked May 1, 2018 at 22:42
\$\endgroup\$
3
  • 1
    \$\begingroup\$ What precisely should the code do? Because I wonder for example why you convert the input text completely to lower case as a first step. \$\endgroup\$ Commented May 1, 2018 at 22:52
  • \$\begingroup\$ It is supposed to clean up strings that may be in the form of "This IS a SAmPle STRING" first. I didn't check if Apache Commons WordUtils already handles that particular case. Then it checks if we are on the first word or the last word of the string and if so don't check against the whitelist of words that should stay lowercase. Otherwise check against the whitelist and capitalize all words not on the whitelist, then go on to check against words that need to stay capitalized fully. \$\endgroup\$ Commented May 1, 2018 at 23:04
  • \$\begingroup\$ Looks like it doesn't \$\endgroup\$ Commented May 1, 2018 at 23:09

1 Answer 1

4
\$\begingroup\$
  • CapFirstTitle() is a method name that disregards java coding conventions for method names. Use standard lowerCamelCase.

  • whitelist and alwaysCapitalize should be private static final members. You can initialize these in a static initializer block like this:

    private static final Set<String> whitelist = new HashSet<>();
    static {
     whitelist.addAll(Arrays.asList("a", "above", "after", ...);
    }
    
  • The binary or is a little uncommon in conditions. You won't really get much benefit out of it, especially since the boolean or is short-circuiting. Use || over | here.

  • There is a lot of vertical whitespace here. I personally find that hard to follow and I'd delete most of that space if I wrote this code myself. YMMV. Since you're extremely consistent about it, I don't see any need to change it.

  • You're calling toLowerCase without specifying the Locale. While this may work in most cases, you should be aware of edge-cases involving languages like Turkish. If you have any way to be locale-aware here, you should strive to be. That can save you a lot of headaches down the line :)

  • This code always appends a " " at the end of the String. You could avoid this by first overwriting the parts array with the capitalized parts and subsequently using Arrays.stream(parts).collect(Collectors.joining(" "));

    Since Java 8, String exposes the method join (thanks to Koekje for pointing that out in the coments), you can use as String.join(" ", parts);

The last point also contains an option for a deeper rewrite. Consider the following:

String[] capitalizedParts = Arrays.stream(input.toLowerCase().trim().split(" "))
 .map(CapFirstTitle::capSingleWord)
 .toArray(String[]::new);
capitalizedParts[0] = WordUtils.capitalize(capitalizedParts[0], ' ', '-');
int lastIdx = capitalizedParts.length - 1;
capitalizedParts[lastIdx] = WordUtils.capitalize(capitalizedParts[lastIdx], ' ', '-');
return String.join(" ", capitalizedParts);

This is a solution that hides the details of whitelist and alwaysCapitalize inside a method and clears the enforced capitalization of the first and last word.

It avoids the (correct, but mental overhead inducing) use of StringBuilder and separates the capitalization logic from the reassembly of the String. You can even go so far and reduce multiple spaces in the same process by filtering out empty array elements in the first Stream.

answered May 1, 2018 at 23:28
\$\endgroup\$
4
  • \$\begingroup\$ This is incredible. I usually have to work on an island and this is going to help me become a better programmer! I really appreciate all of the points made. I will take them to heart. Thank you! \$\endgroup\$ Commented May 2, 2018 at 0:19
  • \$\begingroup\$ The binary or | was a typo \$\endgroup\$ Commented May 2, 2018 at 0:28
  • \$\begingroup\$ Very good answer! You can also use the join method of String, see docs.oracle.com/javase/8/docs/api/java/lang/… \$\endgroup\$ Commented May 2, 2018 at 11:40
  • \$\begingroup\$ @Koekje I forgot about that method, it seems ... Thanks for reminding me :) \$\endgroup\$ Commented May 2, 2018 at 11:45

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.