Title capitalization of strings in Java

Question 1

I needed to port some Coldfusion code to Java that does Title Capitalization on Strings and came up with this solution. I would love it if anyone could give me some pointers as to how to make this more efficient.

import org.apache.commons.text.WordUtils;
import java.util.HashSet;
import java.util.Set;
public class CapFirstTitle {
 public static String CapFirstTitle() {
 String inputText = " from a String to test Title Capitalization CASE hyphenated-word word from IBM od id XX if ";
 Set<String> whiteList = new HashSet<String>();
 whiteList.add("a");
 whiteList.add("above");
 whiteList.add("after");
 whiteList.add("ain't");
 whiteList.add("among");
 whiteList.add("an");
 whiteList.add("and");
 whiteList.add("as");
 whiteList.add("at");
 whiteList.add("below");
 whiteList.add("but");
 whiteList.add("by");
 whiteList.add("can't");
 whiteList.add("don't");
 whiteList.add("for");
 whiteList.add("from");
 whiteList.add("if");
 whiteList.add("in");
 whiteList.add("into");
 whiteList.add("it's");
 whiteList.add("nor");
 whiteList.add("of");
 whiteList.add("off");
 whiteList.add("on");
 whiteList.add("onto");
 whiteList.add("or");
 whiteList.add("over");
 whiteList.add("since");
 whiteList.add("the");
 whiteList.add("to");
 whiteList.add("under");
 whiteList.add("until");
 whiteList.add("up");
 whiteList.add("with");
 whiteList.add("won't");
 Set<String> alwaysCapitalize = new HashSet<String>();
 alwaysCapitalize.add("II");
 alwaysCapitalize.add("III");
 alwaysCapitalize.add("IV");
 alwaysCapitalize.add("V");
 alwaysCapitalize.add("VI");
 alwaysCapitalize.add("VII");
 alwaysCapitalize.add("VIII");
 alwaysCapitalize.add("IX");
 alwaysCapitalize.add("X");
 alwaysCapitalize.add("XI");
 alwaysCapitalize.add("XII");
 alwaysCapitalize.add("XIII");
 alwaysCapitalize.add("XIV");
 alwaysCapitalize.add("XV");
 alwaysCapitalize.add("XVI");
 alwaysCapitalize.add("XVII");
 alwaysCapitalize.add("XVIII");
 alwaysCapitalize.add("XIX");
 alwaysCapitalize.add("XX");
 alwaysCapitalize.add("XXI");
 alwaysCapitalize.add("OD");
 alwaysCapitalize.add("ID");
 alwaysCapitalize.add("PH");
 alwaysCapitalize.add("XH");
 alwaysCapitalize.add("UV");
 alwaysCapitalize.add("DOM");
 alwaysCapitalize.add("GS");
 alwaysCapitalize.add("ii");
 alwaysCapitalize.add("iii");
 alwaysCapitalize.add("iv");
 alwaysCapitalize.add("v");
 alwaysCapitalize.add("vi");
 alwaysCapitalize.add("vii");
 alwaysCapitalize.add("viii");
 alwaysCapitalize.add("ix");
 alwaysCapitalize.add("x");
 alwaysCapitalize.add("xi");
 alwaysCapitalize.add("xii");
 alwaysCapitalize.add("xiii");
 alwaysCapitalize.add("xiv");
 alwaysCapitalize.add("xv");
 alwaysCapitalize.add("xvi");
 alwaysCapitalize.add("xvii");
 alwaysCapitalize.add("xviii");
 alwaysCapitalize.add("xix");
 alwaysCapitalize.add("xx");
 alwaysCapitalize.add("xxi");
 alwaysCapitalize.add("od");
 alwaysCapitalize.add("id");
 alwaysCapitalize.add("ph");
 alwaysCapitalize.add("xh");
 alwaysCapitalize.add("uv");
 alwaysCapitalize.add("dom");
 alwaysCapitalize.add("gs");
 StringBuilder capitalizedString = new StringBuilder();
 if (inputText.contains(" ")) {
 inputText = inputText.toLowerCase().trim();
 String[] parts = inputText.split(" ");
 for (int i = 0; i < parts.length; i++) {
 if (i == 0 | i == parts.length - 1) {
 capitalizedString.append(WordUtils.capitalize(parts[i], ' ', '-')).append(" ");
 } else {
 if (!whiteList.contains(parts[i])) {
 if (!alwaysCapitalize.contains(parts[i])) {
 capitalizedString.append(WordUtils.capitalize(parts[i], ' ', '-')).append(" ");
 } else {
 capitalizedString.append(parts[i].toUpperCase()).append(" ");
 }
 } else {
 capitalizedString.append(parts[i]).append(" ");
 }
 }
 }
 }
 return capitalizedString.toString();
 }
}

Question 2

What precisely should the code do? Because I wonder for example why you convert the input text completely to lower case as a first step.

Question 3

It is supposed to clean up strings that may be in the form of "This IS a SAmPle STRING" first. I didn't check if Apache Commons WordUtils already handles that particular case. Then it checks if we are on the first word or the last word of the string and if so don't check against the whitelist of words that should stay lowercase. Otherwise check against the whitelist and capitalize all words not on the whitelist, then go on to check against words that need to stay capitalized fully.

Question 4

Looks like it doesn't

Question 5

CapFirstTitle() is a method name that disregards java coding conventions for method names. Use standard lowerCamelCase.

whitelist and alwaysCapitalize should be private static final members. You can initialize these in a static initializer block like this:

private static final Set<String> whitelist = new HashSet<>();
static {
 whitelist.addAll(Arrays.asList("a", "above", "after", ...);
}

The binary or is a little uncommon in conditions. You won't really get much benefit out of it, especially since the boolean or is short-circuiting. Use || over | here.
There is a lot of vertical whitespace here. I personally find that hard to follow and I'd delete most of that space if I wrote this code myself. YMMV. Since you're extremely consistent about it, I don't see any need to change it.
You're calling toLowerCase without specifying the Locale. While this may work in most cases, you should be aware of edge-cases involving languages like Turkish. If you have any way to be locale-aware here, you should strive to be. That can save you a lot of headaches down the line :)
This code always appends a " " at the end of the String. You could avoid this by first overwriting the parts array with the capitalized parts and subsequently using Arrays.stream(parts).collect(Collectors.joining(" "));

Since Java 8, String exposes the method join (thanks to Koekje for pointing that out in the coments), you can use as String.join(" ", parts);

The last point also contains an option for a deeper rewrite. Consider the following:

String[] capitalizedParts = Arrays.stream(input.toLowerCase().trim().split(" "))
 .map(CapFirstTitle::capSingleWord)
 .toArray(String[]::new);
capitalizedParts[0] = WordUtils.capitalize(capitalizedParts[0], ' ', '-');
int lastIdx = capitalizedParts.length - 1;
capitalizedParts[lastIdx] = WordUtils.capitalize(capitalizedParts[lastIdx], ' ', '-');
return String.join(" ", capitalizedParts);

This is a solution that hides the details of whitelist and alwaysCapitalize inside a method and clears the enforced capitalization of the first and last word.

It avoids the (correct, but mental overhead inducing) use of StringBuilder and separates the capitalization logic from the reassembly of the String. You can even go so far and reduce multiple spaces in the same process by filtering out empty array elements in the first Stream.

Question 6

This is incredible. I usually have to work on an island and this is going to help me become a better programmer! I really appreciate all of the points made. I will take them to heart. Thank you!

Question 7

The binary or | was a typo

Question 8

Very good answer! You can also use the join method of String, see docs.oracle.com/javase/8/docs/api/java/lang/…

Question 9

@Koekje I forgot about that method, it seems ... Thanks for reminding me :)

Vogel612 Vogel612 25.5k7 gold badges59 silver badges141 bronze badges · Accepted Answer · 2018-05-01 23:28:52Z

CapFirstTitle() is a method name that disregards java coding conventions for method names. Use standard lowerCamelCase.

whitelist and alwaysCapitalize should be private static final members. You can initialize these in a static initializer block like this:

private static final Set<String> whitelist = new HashSet<>();
static {
 whitelist.addAll(Arrays.asList("a", "above", "after", ...);
}

The binary or is a little uncommon in conditions. You won't really get much benefit out of it, especially since the boolean or is short-circuiting. Use || over | here.
There is a lot of vertical whitespace here. I personally find that hard to follow and I'd delete most of that space if I wrote this code myself. YMMV. Since you're extremely consistent about it, I don't see any need to change it.
You're calling toLowerCase without specifying the Locale. While this may work in most cases, you should be aware of edge-cases involving languages like Turkish. If you have any way to be locale-aware here, you should strive to be. That can save you a lot of headaches down the line :)
This code always appends a " " at the end of the String. You could avoid this by first overwriting the parts array with the capitalized parts and subsequently using Arrays.stream(parts).collect(Collectors.joining(" "));

Since Java 8, String exposes the method join (thanks to Koekje for pointing that out in the coments), you can use as String.join(" ", parts);

The last point also contains an option for a deeper rewrite. Consider the following:

String[] capitalizedParts = Arrays.stream(input.toLowerCase().trim().split(" "))
 .map(CapFirstTitle::capSingleWord)
 .toArray(String[]::new);
capitalizedParts[0] = WordUtils.capitalize(capitalizedParts[0], ' ', '-');
int lastIdx = capitalizedParts.length - 1;
capitalizedParts[lastIdx] = WordUtils.capitalize(capitalizedParts[lastIdx], ' ', '-');
return String.join(" ", capitalizedParts);

This is a solution that hides the details of whitelist and alwaysCapitalize inside a method and clears the enforced capitalization of the first and last word.

It avoids the (correct, but mental overhead inducing) use of StringBuilder and separates the capitalization logic from the reassembly of the String. You can even go so far and reduce multiple spaces in the same process by filtering out empty array elements in the first Stream.

This is incredible. I usually have to work on an island and this is going to help me become a better programmer! I really appreciate all of the points made. I will take them to heart. Thank you!
Very good answer! You can also use the join method of String, see docs.oracle.com/javase/8/docs/api/java/lang/…
@Koekje I forgot about that method, it seems ... Thanks for reminding me :)

Stack Exchange Network

Title capitalization of strings in Java

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Title capitalization of strings in Java

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions