Function that returns words in string not exceeding max number of characters

Question 1

I'm trying to create a robust function that receives a string -- typically a paragraph or two -- and the max number of characters in the output. One condition is to make sure that we return full words and never cut one in half.

Here's the code I have and I feel it needs to be improved. This is normal ES2015/ES6 with no additional libraries.

export const shortenTextToSpecifiedNumberOfCharacters = (input, numberOfCharacters) => {
 if (!input) return "";
 else if (input.length <= numberOfCharacters) return input;
 for (var i = numberOfCharacters; i > 0; i--) {
 if (input[i] == ' ') {
 return input.slice(0, i) + ' ...';
 }
 }
}

Question 2

Is this vanilla JavaScript/ES6 or are you using Node.js or other libraries?

Question 3

Just vanilla ES2015

Question 4

Great! Hope you get some good reviews.

Question 5

This code is probably not working properly. a) there's no return after a for loop, and input[i] == ' ' may not be achieved. that will result in returning an undefined. b) reaching input[i] == ' ' may still be to early. here's a case: shorten("lorem ipsum lalala blah a", 10) will result in `"lorem ipsum lalala blah" instead of "lorem ipsum lalala"...

Question 6

@IgorSoloydenko: Your case (b) cannot happen, since with those arguments i starts counting down from 10, and so the output you'll get will be "lorem ...".

Question 7

You code can end up falling out of the loop and implicitly returning undefined if there are no spaces within the first numberOfCharacters characters in the input string. You might want to add an explicit return statement at the end of the function to handle that case.

Also, in your loop you only test for spaces, not for newlines, tabs or other whitespace characters. You could simply add tests for those into your if statement, like this:

if (input[i] == ' ' || input[i] == '\t' || input[i] == '\r' || input[i] == '\n') {

but it would be shorter and cleaner to use either indexOf():

if (" \t\r\n".indexOf(input[i]) >= 0) {

or a regexp test:

if (/\s/.test(input[i])) {

You may also wish to ensure that input[i-1] is not a whitespace character. You could do it like this:

const spaces = " \t\r\n";
for (var i = numberOfCharacters; i > 1; i--) {
 if (spaces.indexOf(input[i]) >= 0 && spaces.indexOf(input[i-1]) < 0) {
 return input.slice(0, i) + ' ...';
 }
}

or, using a regexp, like this:

for (var i = numberOfCharacters; i > 1; i--) {
 if (/\S\s/.test(input.slice(i-1, i+1))) {
 return input.slice(0, i) + ' ...';
 }
}

Also note that the output of your function may be up to 4 characters (the length of the " ..." suffix) longer than the specified length limit. If that's not what you want, you may wish to change your for loop to:

for (var i = numberOfCharacters - 4; i > 1; i--) {

Question 8

While this code fixes the bug(s) in the original, it's really hard to maintain it in a long run.

Question 9

@IgorSoloydenko: You're probably right, and I'm not convinced that the single-regexp solution would necessarily have better performance either, since it needs to dynamically reconstruct a new RegExp object on every call. I've removed that part from my answer.

Question 10

I really like the updated version's readability improvement. Up voted

Question 11

Thank you for your detailed response. If I'm following the answer and the comments correctly, the final version of the function would look like the one I posted in the UPDATE section of original post. I'm not using RegEx for better performance -- as per comments.

Question 12

I'm seeing two interesting behavior here. This code -- without Regex -- seems to be counting the line breaks as well. Secondly, it's not adding the ellipsis at the end. I'm thinking it may be a better idea to use RegEx to take the text without any line breaks, then count the number of characters.

Ilmari Karonen Ilmari Karonen 3,20917 silver badges19 bronze badges · Accepted Answer · 2017-10-09 20:12:57Z

You code can end up falling out of the loop and implicitly returning undefined if there are no spaces within the first numberOfCharacters characters in the input string. You might want to add an explicit return statement at the end of the function to handle that case.

Also, in your loop you only test for spaces, not for newlines, tabs or other whitespace characters. You could simply add tests for those into your if statement, like this:

if (input[i] == ' ' || input[i] == '\t' || input[i] == '\r' || input[i] == '\n') {

but it would be shorter and cleaner to use either indexOf():

if (" \t\r\n".indexOf(input[i]) >= 0) {

or a regexp test:

if (/\s/.test(input[i])) {

You may also wish to ensure that input[i-1] is not a whitespace character. You could do it like this:

const spaces = " \t\r\n";
for (var i = numberOfCharacters; i > 1; i--) {
 if (spaces.indexOf(input[i]) >= 0 && spaces.indexOf(input[i-1]) < 0) {
 return input.slice(0, i) + ' ...';
 }
}

or, using a regexp, like this:

for (var i = numberOfCharacters; i > 1; i--) {
 if (/\S\s/.test(input.slice(i-1, i+1))) {
 return input.slice(0, i) + ' ...';
 }
}

Also note that the output of your function may be up to 4 characters (the length of the " ..." suffix) longer than the specified length limit. If that's not what you want, you may wish to change your for loop to:

for (var i = numberOfCharacters - 4; i > 1; i--) {

While this code fixes the bug(s) in the original, it's really hard to maintain it in a long run.
@IgorSoloydenko: You're probably right, and I'm not convinced that the single-regexp solution would necessarily have better performance either, since it needs to dynamically reconstruct a new RegExp object on every call. I've removed that part from my answer.
I really like the updated version's readability improvement. Up voted
Thank you for your detailed response. If I'm following the answer and the comments correctly, the final version of the function would look like the one I posted in the UPDATE section of original post. I'm not using RegEx for better performance -- as per comments.
I'm seeing two interesting behavior here. This code -- without Regex -- seems to be counting the line breaks as well. Secondly, it's not adding the ellipsis at the end. I'm thinking it may be a better idea to use RegEx to take the text without any line breaks, then count the number of characters.

Stack Exchange Network

Function that returns words in string not exceeding max number of characters

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Function that returns words in string not exceeding max number of characters

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions