0
\$\begingroup\$

I have a web application, as part of it I regularly have to send out "overviews" of records in a database, which contain shortened excerpts of a description-text of that record.

What the code is supposed to do:

  1. Take an HTML string
  2. Remove the HTML brackets so that only text remains
  3. cut it down to SHORT_DESCRIPTION_WORD_COUNT words
  4. if there are more than SHORT_DESCRIPTION_WORD_COUNT words, add "..." at the end

My current implementation looks like this:

import std/[strutils, strformat, htmlparser, xmltree, enumerate]
const SHORT_DESCRIPTION_WORD_COUNT: int = 40
proc truncate*(text: string): string =
 let cleanedString = text.parseHtml().innerText
 let splitString = cleanedString.split(" ")
 
 if splitString.len() <= SHORT_DESCRIPTION_WORD_COUNT:
 result = cleanedString
 else:
 result.add(splitString[0..SHORT_DESCRIPTION_WORD_COUNT-1].join(" "))
 result.add("...")

Looking at some valgrind data, I determined that this proc takes up a sizeable chunk of time and would benefit from being speed-performance optimized a bit.

I'm mostly wondering how to do that though. I believe the intensive steps here are the cleaning up of the HTML and then splitting the string, I'm not sure how to avoid those.

asked Jul 2, 2022 at 15:23
\$\endgroup\$

1 Answer 1

1
\$\begingroup\$

After consulting with Yardanico in nim's discord for a bit, there is 1 significant improvement that can be made here and a couple minor ones:

Main Improvement: Use the split iterator

I use the split proc and assign the seq of strings into a variable to iterate over. That is wasteful, when you can use the split iterator instead.

Since you still need to track which word you're at to know when you hit SHORT_DESCRIPTION_WORD_COUNT, you can use std/enumerate, which is sugar over writing var i=0 and manually incrementing that within the loop.

proc truncate_breakfor_enumerate*(text: string): string =
 let cleanedString = text.parseHtml().innerText
 for i, str in enumerate(cleanedString.split(" ")):
 result.add(" " & str)
 if i >= SHORT_DESCRIPTION_WORD_COUNT:
 result.add("...")
 break 

Minor Improvement: Pre-allocate the string

This matters mostly for scenarios in which you add longer strings. Having the string pre-allocated to roughly the correct size can reduce the number of memory allocations needed overall. That is, because if the string outgrows its size as you add substrings to it, it will need to be copied elsewhere. You can do this with newStringOfCap.

This will only even start to matter if you start having a fairly sizeable amount of text (>5000 chars) and even then the improvements are somewhat marginal.

proc truncate_breakformemoryalloc*(text: string): string =
 let cleanedString = text.parseHtml().innerText
 result = newStringOfCap(400) # <-- The important bit
 for i, str in enumerate(cleanedString.split(" ")):
 result.add(" " & str)
 if i >= SHORT_DESCRIPTION_WORD_COUNT:
 result.add("...")
 break 

Running these over a benchmark with one of normal text and another more artificial one with regularly words above 50 characters, these are my results:

name ......................................... min time avg time std dv runs
truncate classic ............................. 0.046 ms 0.053 ms ±0.004 x1000
truncate_breakfor_enumerate................... 0.021 ms 0.023 ms ±0.001 x1000
truncate_breakfor_enumerate_memoryalloc ...... 0.021 ms 0.025 ms ±0.004 x1000
truncate classic long ........................ 0.050 ms 0.082 ms ±0.019 x1000
truncate_breakfor_enumerate long ............. 0.036 ms 0.040 ms ±0.007 x1000
truncate_breakfor_enumerate_memoryalloc long ..0.035 ms 0.038 ms ±0.004 x1000

Both of these versions should do perfectly fine for any purposes.

answered Jul 2, 2022 at 15:23
\$\endgroup\$

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.