\$\begingroup\$

I recently ported a blog of mine from Python to Go (to improve speed and performance) and while all is great so far, I'd like some help optimising the Markdown function to improve the general performance, maintenance and readability of the function.

I have this function because I write my blog articles in Markdown (.md) and then use Go (previously Python) to convert the raw Markdown to HTML for output. This saves me from having to write ridiculous amounts of HTML, which can be tedious to say the least.

The Markdown function takes one argument (raw) which is a string that contains the raw Markdown (obtained using ioutil.ReadFile).

It then splits the Markdown by \n (removing the empty lines) and converts:

  • Bold and italic text (***,**,*)
  • Strikethrough text (~~blah blah blah~~)
  • Underscored text (__blah blah blah__)
  • Links ([Example Link](https://example.com))
  • Blockquotes (> sample quote by an important person)
  • Inline code (`abcccc`)
  • Headings (h1-h6)

Some of the supported features aren't exactly standard, but the function works and outputs the expected result without any errors. As a new Go programmer (this is my first "real" Go project), I'd like to know whether my code could be optimised for better performance, maintainability and readability.

Here are a few questions I have regarding optimisation:

  • Would it make a difference to performance if I reduced the amount of imports?
  • Would it improve readability if I put the regexp.MustCompile functions into variables above the Markdown function?
  • Would it improve performance if I used individual regexes to convert Markdown headings instead of using for i := 6; i >= 1; i-- {...}?
  • If not, is there a way to convert i (an integer) to a string without using strconv.Itoa(i) (to help reduce the amount of imports)?

Here is my code:

package parse

import (
    "regexp"
    "strconv"
    "strings"
)

func Markdown(raw string) string {
    // split on newlines; FieldsFunc drops the empty lines
    lines := strings.FieldsFunc(raw, func(c rune) bool {
        return c == '\n'
    })
    for i, line := range lines {
        // wrap bold and italic text in "<b>" and "<i>" elements
        line = regexp.MustCompile(`\*\*\*(.*?)\*\*\*`).ReplaceAllString(line, `<b><i>$1</i></b>`)
        line = regexp.MustCompile(`\*\*(.*?)\*\*`).ReplaceAllString(line, `<b>$1</b>`)
        line = regexp.MustCompile(`\*(.*?)\*`).ReplaceAllString(line, `<i>$1</i>`)
        // wrap strikethrough text in "<s>" tags
        line = regexp.MustCompile(`\~\~(.*?)\~\~`).ReplaceAllString(line, `<s>$1</s>`)
        // wrap underscored text in "<u>" tags
        line = regexp.MustCompile(`__(.*?)__`).ReplaceAllString(line, `<u>$1</u>`)
        // convert links to anchor tags
        line = regexp.MustCompile(`\[(.*?)\]\((.*?)\)[^\)]`).ReplaceAllString(line, `<a href="$2">$1</a>`)
        // escape and wrap blockquotes in "<blockquote>" tags
        line = regexp.MustCompile(`^\>(\s|)`).ReplaceAllString(line, `&gt;`)
        line = regexp.MustCompile(`\&gt\;(.*?)$`).ReplaceAllString(line, `<blockquote>$1</blockquote>`)
        // wrap the content of backticks inside of "<code>" tags
        line = regexp.MustCompile("`(.*?)`").ReplaceAllString(line, `<code>$1</code>`)
        // convert headings
        for i := 6; i >= 1; i-- {
            size, md_header := strconv.Itoa(i), strings.Repeat("#", i)
            line = regexp.MustCompile(`^` + md_header + `(\s|)(.*?)$`).ReplaceAllString(line, `<h` + size + `>$2</h` + size + `>`)
        }
        // update the line
        lines[i] = line
    }
    // return the joined lines
    return strings.Join(lines, "\n")
}
200_success
asked Dec 27, 2018 at 12:56
\$\endgroup\$

1 Answer

\$\begingroup\$

Performance

Regex

regexp.MustCompile() is very expensive! Do not call it inside a loop!

Instead, define your regexes as package-level variables so that each one is compiled only once:

var (
    boldItalicReg = regexp.MustCompile(`\*\*\*(.*?)\*\*\*`)
    boldReg       = regexp.MustCompile(`\*\*(.*?)\*\*`)
    ...
)
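
As a minimal, self-contained sketch of that idea: the two patterns below are the ones from the question, compiled once at package initialisation, while the helper name emphasize and the sample input are only assumptions made for illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

// Compiled once when the package is initialised and reused for every line.
var (
	boldItalicReg = regexp.MustCompile(`\*\*\*(.*?)\*\*\*`)
	boldReg       = regexp.MustCompile(`\*\*(.*?)\*\*`)
)

// emphasize is a hypothetical helper. It applies the longest marker first,
// in the same order as the original function.
func emphasize(line string) string {
	line = boldItalicReg.ReplaceAllString(line, `<b><i>$1</i></b>`)
	return boldReg.ReplaceAllString(line, `<b>$1</b>`)
}

func main() {
	fmt.Println(emphasize("**bold** and ***both***"))
}
```

Because the compiled *regexp.Regexp values are safe for concurrent use, sharing them as package-level variables should also be fine if the function is later called from several goroutines.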

Headers

If a line is a header, it will start with a #. We can check for this before calling ReplaceAllString() six times. All we need to do is trim the line and then check whether it starts with #:

line = strings.TrimSpace(line)
if strings.HasPrefix(line, "#") {
    // convert headings
    ...
}

We could go further and unroll the loop to avoid unnecessary allocations:

// count only the leading '#' characters, so a '#' later in the line is ignored
count := len(line) - len(strings.TrimLeft(line, "#"))
switch count {
case 1:
    line = h1Reg.ReplaceAllString(line, `<h1>$2</h1>`)
case 2:
    ...
}
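
Once the heading level is known, the regex is arguably not needed at all, and neither is strconv, since the level is a single digit. The sketch below is one possible way to do it, not the code that was benchmarked; headingLevel and heading are hypothetical names, and unlike the original pattern it trims all surrounding whitespace from the title and leaves lines with more than six leading # characters unchanged.

```go
package main

import (
	"fmt"
	"strings"
)

// headingLevel returns the number of leading '#' characters (0 if none).
func headingLevel(line string) int {
	return len(line) - len(strings.TrimLeft(line, "#"))
}

// heading converts "## Title" into "<h2>Title</h2>".
// Lines whose level is outside 1..6 are returned unchanged (an assumption).
func heading(line string) string {
	level := headingLevel(line)
	if level < 1 || level > 6 {
		return line
	}
	text := strings.TrimSpace(line[level:])
	// The level is a single digit, so it can be built without strconv.
	digit := string(rune('0' + level))
	return "<h" + digit + ">" + text + "</h" + digit + ">"
}

func main() {
	fmt.Println(heading("## Title"))
	fmt.Println(heading("# C# tips"))
}
```

Counting only the leading characters matters for titles such as "# C# tips", where counting every # in the line would report level 2.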

Use a scanner

The idiomatic way to read a file line by line in Go is to use a bufio.Scanner. It takes an io.Reader as its parameter, so you can pass your Markdown file directly instead of converting it into a string first:

func NewMarkdown(input io.Reader) string {
    scanner := bufio.NewScanner(input)
    for scanner.Scan() {
        line := scanner.Text()
        ...
    }
}

Use []byte instead of string

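In Go, a string is a read-only slice of bytes. scanner.Text() allocates a new string for every line, whereas scanner.Bytes() returns a slice into the scanner's internal buffer without allocating, so use []byte instead of string when you can. Keep in mind that the slice returned by scanner.Bytes() may be overwritten by the next call to Scan():

line := scanner.Bytes()
line = boldItalicReg.ReplaceAll(line, []byte(`<b><i>$1</i></b>`))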
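
One further tweak that may be worth trying is to build the replacement templates once as well, since a []byte(...) conversion inside the loop may allocate on every iteration. This is only a sketch: boldRepl and boldLines are hypothetical names, and whether it helps measurably should be checked with a benchmark.

```go
package main

import (
	"bufio"
	"fmt"
	"regexp"
	"strings"
)

var (
	boldReg = regexp.MustCompile(`\*\*(.*?)\*\*`)
	// boldRepl is built once instead of on every loop iteration.
	boldRepl = []byte(`<b>$1</b>`)
)

// boldLines converts each line and keeps the results.
func boldLines(input string) [][]byte {
	var out [][]byte
	scanner := bufio.NewScanner(strings.NewReader(input))
	for scanner.Scan() {
		// Per the regexp documentation, ReplaceAll returns a copy of its
		// input, so the result can be kept after the next call to Scan.
		out = append(out, boldReg.ReplaceAll(scanner.Bytes(), boldRepl))
	}
	return out
}

func main() {
	for _, line := range boldLines("plain\n**bold**") {
		fmt.Println(string(line))
	}
}
```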

Write result to a bytes.Buffer

Instead of strings.Join(), we can write each line to a buffer in order to further reduce the number of allocations:

buf := bytes.NewBuffer(nil)
scanner := bufio.NewScanner(input)
for scanner.Scan() {
    line := scanner.Bytes()
    ...
    buf.Write(line)
    buf.WriteByte('\n')
}
return buf.String()
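
Since the function returns a string, a strings.Builder is a possible alternative to bytes.Buffer: its String() method returns the accumulated bytes without copying them, whereas bytes.Buffer.String() makes a copy. The sketch below only shows the accumulation pattern; joinLines is a hypothetical stand-in for the conversion loop, and the Grow call is merely a size hint.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// joinLines copies every input line into a strings.Builder, terminating
// each one with a newline, and returns the result.
func joinLines(input string) string {
	var sb strings.Builder
	sb.Grow(len(input)) // hint: the output is roughly as large as the input
	scanner := bufio.NewScanner(strings.NewReader(input))
	for scanner.Scan() {
		sb.Write(scanner.Bytes()) // Write copies the bytes into the builder
		sb.WriteByte('\n')
	}
	return sb.String()
}

func main() {
	fmt.Print(joinLines("first\nsecond"))
}
```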

Final code:

package parse

import (
    "bufio"
    "bytes"
    "io"
    "regexp"
)

var (
    boldItalicReg = regexp.MustCompile(`\*\*\*(.*?)\*\*\*`)
    boldReg       = regexp.MustCompile(`\*\*(.*?)\*\*`)
    italicReg     = regexp.MustCompile(`\*(.*?)\*`)
    strikeReg     = regexp.MustCompile(`\~\~(.*?)\~\~`)
    underscoreReg = regexp.MustCompile(`__(.*?)__`)
    anchorReg     = regexp.MustCompile(`\[(.*?)\]\((.*?)\)[^\)]`)
    escapeReg     = regexp.MustCompile(`^\>(\s|)`)
    blockquoteReg = regexp.MustCompile(`\&gt\;(.*?)$`)
    backtickReg   = regexp.MustCompile("`(.*?)`")
    h1Reg         = regexp.MustCompile(`^#(\s|)(.*?)$`)
    h2Reg         = regexp.MustCompile(`^##(\s|)(.*?)$`)
    h3Reg         = regexp.MustCompile(`^###(\s|)(.*?)$`)
    h4Reg         = regexp.MustCompile(`^####(\s|)(.*?)$`)
    h5Reg         = regexp.MustCompile(`^#####(\s|)(.*?)$`)
    h6Reg         = regexp.MustCompile(`^######(\s|)(.*?)$`)
)

func NewMarkdown(input io.Reader) string {
    buf := bytes.NewBuffer(nil)
    scanner := bufio.NewScanner(input)
    for scanner.Scan() {
        line := bytes.TrimSpace(scanner.Bytes())
        if len(line) == 0 {
            buf.WriteByte('\n')
            continue
        }
        // wrap bold and italic text in "<b>" and "<i>" elements
        line = boldItalicReg.ReplaceAll(line, []byte(`<b><i>$1</i></b>`))
        line = boldReg.ReplaceAll(line, []byte(`<b>$1</b>`))
        line = italicReg.ReplaceAll(line, []byte(`<i>$1</i>`))
        // wrap strikethrough text in "<s>" tags
        line = strikeReg.ReplaceAll(line, []byte(`<s>$1</s>`))
        // wrap underscored text in "<u>" tags
        line = underscoreReg.ReplaceAll(line, []byte(`<u>$1</u>`))
        // convert links to anchor tags
        line = anchorReg.ReplaceAll(line, []byte(`<a href="$2">$1</a>`))
        // escape and wrap blockquotes in "<blockquote>" tags
        line = escapeReg.ReplaceAll(line, []byte(`&gt;`))
        line = blockquoteReg.ReplaceAll(line, []byte(`<blockquote>$1</blockquote>`))
        // wrap the content of backticks inside of "<code>" tags
        line = backtickReg.ReplaceAll(line, []byte(`<code>$1</code>`))
        // convert headings
        if line[0] == '#' {
            // count only the leading '#' characters
            count := len(line) - len(bytes.TrimLeft(line, "#"))
            switch count {
            case 1:
                line = h1Reg.ReplaceAll(line, []byte(`<h1>$2</h1>`))
            case 2:
                line = h2Reg.ReplaceAll(line, []byte(`<h2>$2</h2>`))
            case 3:
                line = h3Reg.ReplaceAll(line, []byte(`<h3>$2</h3>`))
            case 4:
                line = h4Reg.ReplaceAll(line, []byte(`<h4>$2</h4>`))
            case 5:
                line = h5Reg.ReplaceAll(line, []byte(`<h5>$2</h5>`))
            case 6:
                line = h6Reg.ReplaceAll(line, []byte(`<h6>$2</h6>`))
            }
        }
        buf.Write(line)
        buf.WriteByte('\n')
    }
    return buf.String()
}

Benchmarks

I used the following code for the benchmarks, on a 20 kB Markdown file:

func BenchmarkMarkdown(b *testing.B) {
    md, err := ioutil.ReadFile("README.md")
    if err != nil {
        b.Fatal(err) // stop here: there is nothing to benchmark without the file
    }
    raw := string(md)
    b.ResetTimer()
    for n := 0; n < b.N; n++ {
        _ = Markdown(raw)
    }
}

func BenchmarkMarkdownNew(b *testing.B) {
    for n := 0; n < b.N; n++ {
        file, err := os.Open("README.md")
        if err != nil {
            b.Fatal(err)
        }
        _ = NewMarkdown(file)
        file.Close()
    }
}

Results:

> go test -bench=. -benchmem
goos: linux
goarch: amd64
BenchmarkMarkdown-4       10    104990431 ns/op   364617427 B/op   493813 allocs/op
BenchmarkMarkdownNew-4  1000      1464745 ns/op      379376 B/op    11085 allocs/op

benchstat diff:

name        old time/op    new time/op    delta
Markdown-4     105ms ± 0%       1ms ± 0%   ~     (p=1.000 n=1+1)

name        old alloc/op   new alloc/op   delta
Markdown-4     365MB ± 0%       0MB ± 0%   ~     (p=1.000 n=1+1)

name        old allocs/op  new allocs/op  delta
Markdown-4      494k ± 0%       11k ± 0%   ~     (p=1.000 n=1+1)
answered Dec 28, 2018 at 10:48
\$\endgroup\$