I'm trying to create a robust function that receives a string -- typically a paragraph or two -- and the max number of characters in the output. One condition is to make sure that we return full words and never cut one in half.
Here's the code I have and I feel it needs to be improved. This is normal ES2015/ES6 with no additional libraries.
export const shortenTextToSpecifiedNumberOfCharacters = (input, numberOfCharacters) => {
if (!input) return "";
else if (input.length <= numberOfCharacters) return input;
for (var i = numberOfCharacters; i > 0; i--) {
if (input[i] == ' ') {
return input.slice(0, i) + ' ...';
}
}
}
1 Answer 1
You code can end up falling out of the loop and implicitly returning undefined
if there are no spaces within the first numberOfCharacters
characters in the input string. You might want to add an explicit return
statement at the end of the function to handle that case.
Also, in your loop you only test for spaces, not for newlines, tabs or other whitespace characters. You could simply add tests for those into your if
statement, like this:
if (input[i] == ' ' || input[i] == '\t' || input[i] == '\r' || input[i] == '\n') {
but it would be shorter and cleaner to use either indexOf()
:
if (" \t\r\n".indexOf(input[i]) >= 0) {
or a regexp test:
if (/\s/.test(input[i])) {
You may also wish to ensure that input[i-1]
is not a whitespace character. You could do it like this:
const spaces = " \t\r\n";
for (var i = numberOfCharacters; i > 1; i--) {
if (spaces.indexOf(input[i]) >= 0 && spaces.indexOf(input[i-1]) < 0) {
return input.slice(0, i) + ' ...';
}
}
or, using a regexp, like this:
for (var i = numberOfCharacters; i > 1; i--) {
if (/\S\s/.test(input.slice(i-1, i+1))) {
return input.slice(0, i) + ' ...';
}
}
Also note that the output of your function may be up to 4 characters (the length of the " ..."
suffix) longer than the specified length limit. If that's not what you want, you may wish to change your for
loop to:
for (var i = numberOfCharacters - 4; i > 1; i--) {
-
\$\begingroup\$ While this code fixes the bug(s) in the original, it's really hard to maintain it in a long run. \$\endgroup\$Igor Soloydenko– Igor Soloydenko2017年10月09日 20:16:06 +00:00Commented Oct 9, 2017 at 20:16
-
\$\begingroup\$ @IgorSoloydenko: You're probably right, and I'm not convinced that the single-regexp solution would necessarily have better performance either, since it needs to dynamically reconstruct a new RegExp object on every call. I've removed that part from my answer. \$\endgroup\$Ilmari Karonen– Ilmari Karonen2017年10月09日 20:43:41 +00:00Commented Oct 9, 2017 at 20:43
-
\$\begingroup\$ I really like the updated version's readability improvement. Up voted \$\endgroup\$Igor Soloydenko– Igor Soloydenko2017年10月09日 20:46:15 +00:00Commented Oct 9, 2017 at 20:46
-
\$\begingroup\$ Thank you for your detailed response. If I'm following the answer and the comments correctly, the final version of the function would look like the one I posted in the UPDATE section of original post. I'm not using
RegEx
for better performance -- as per comments. \$\endgroup\$Sam– Sam2017年10月10日 20:14:38 +00:00Commented Oct 10, 2017 at 20:14 -
\$\begingroup\$ I'm seeing two interesting behavior here. This code -- without Regex -- seems to be counting the line breaks as well. Secondly, it's not adding the ellipsis at the end. I'm thinking it may be a better idea to use RegEx to take the text without any line breaks, then count the number of characters. \$\endgroup\$Sam– Sam2017年10月13日 04:44:46 +00:00Commented Oct 13, 2017 at 4:44
return
after afor
loop, andinput[i] == ' '
may not be achieved. that will result in returning anundefined
. b) reachinginput[i] == ' '
may still be to early. here's a case:shorten("lorem ipsum lalala blah a", 10)
will result in `"lorem ipsum lalala blah" instead of "lorem ipsum lalala"... \$\endgroup\$i
starts counting down from 10, and so the output you'll get will be"lorem ..."
. \$\endgroup\$