Syntax-Highlighting from a Markdown source in Go using Chroma

In a previous post I described how to create syntax-highlighted HTML from a markdown source using Go. The code for the post can be found here.

However, I recently wrote a few posts about Kotlin and plan to write about Rust and maybe C in the future, and the library I have used for syntax highlighting so far supports only a small number of languages.

Luckily, there is now a Go library based on Pygments, the fantastic Python syntax-highlighting library. It is called chroma, and this post shows an example of how to create syntax-highlighted HTML from a markdown source using it.

Chroma is quite powerful. It provides a plethora of different languages and styles to format the code in just the way you want it. It’s also straightforward to use, so working with it has been a pleasure.

For the markdown-to-html conversion we will again use blackfriday.

So let’s get started!

Implementation

The basic structure of the implementation stays the same. We load a markdown file and convert it to HTML. Then we will search for the parts containing code and replace them with the highlighted code, which will be generated using chroma.

The template we will render the code to is the following:

<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en-us" lang="en-us">
    <head>
        <title>Syntax Highlighting from Markdown with Chroma</title>
        {{.Style}}
    </head>
    <body>
        {{.Content}}
    </body>
</html>

Content is where the actual markdown converted to HTML will be rendered and Style is where we will add the CSS generated by chroma for the style we want to use in this example.

But before we get to that, we need to do some steps first, like loading the markdown file:

func main() {
    // load markdown file
    mdFile, err := ioutil.ReadFile("./example.md")
    if err != nil {
        log.Fatal(err)
    }

Parsing the template file:

t, err := template.ParseFiles("./template.html")
if err != nil {
    log.Fatal(err)
}

And converting the markdown to HTML:

// convert markdown to html
htmlSrc := blackfriday.MarkdownCommon(mdFile)

Now we encounter the first change in the implementation related to chroma. With the old syntax highlighting, I manually copied in the CSS for the style I wanted. Chroma can generate this CSS dynamically:

// write css
hlbuf := bytes.Buffer{}
hlw := bufio.NewWriter(&hlbuf)
formatter := html.New(html.WithClasses())
if err := formatter.WriteCSS(hlw, styles.MonokaiLight); err != nil {
    log.Fatal(err)
}
hlw.Flush()

This snippet creates the CSS for MonokaiLight, the style we want to use in this example. The content of hlbuf will be written to the Style variable in the template later on.
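As a side note, a bytes.Buffer already satisfies io.Writer, so the bufio wrapper and the Flush call are optional here. The following is a minimal sketch using only the standard library. The writeCSS function and the CSS rule it emits are hypothetical placeholders standing in for formatter.WriteCSS and its real output.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
)

// writeCSS is a hypothetical stand-in for formatter.WriteCSS: all it needs
// is an io.Writer to write the stylesheet to. The rule is a placeholder.
func writeCSS(w io.Writer) error {
	_, err := io.WriteString(w, ".chroma .k { color: #00a8c8 }\n")
	return err
}

// cssString collects the generated CSS directly in a bytes.Buffer,
// without an intermediate bufio.Writer.
func cssString() (string, error) {
	var buf bytes.Buffer
	if err := writeCSS(&buf); err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	css, err := cssString()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Print(css)
}
```

Passing &buf straight to the writer function keeps the snippet shorter and removes the risk of forgetting to flush.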

Alright, so now we get to the actual syntax highlighting part. The idea is to find code-parts of the form:

<pre><code class="language-go">
    ...some Code...
</code></pre>

Once we find such a code-part using the goquery library, we parse out the language used from the class and try to syntax-highlight the code inside the <code> tags, replacing the old content with the new, highlighted content.

We create a replaceCodeParts function, which takes the converted HTML and returns a string containing the HTML with highlighted code parts.

First, we read in the converted HTML and create a goquery document from it, which we can use to search for code-parts:

func replaceCodeParts(htmlSrc []byte) (string, error) {
    byteReader := bytes.NewReader(htmlSrc)
    doc, err := goquery.NewDocumentFromReader(byteReader)
    if err != nil {
        return "", err
    }

Then we use a goquery Selector to find the code-parts we are interested in:

    // find code-parts via selector and replace them with highlighted versions
    doc.Find("code[class*=\"language-\"]").Each(func(i int, s *goquery.Selection) {
        ...
    })

Now comes the actual highlighting code. First, we parse the language to use and select the correct lexer. Keep in mind that I omitted any error-handling code here, but almost all of the following steps can fail and need to be handled accordingly. The code example on GitHub has proper error handling included.

class, _ := s.Attr("class")
lang := strings.TrimPrefix(class, "language-")
lexer := lexers.Get(lang)
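One detail worth guarding against: the selector matches any class attribute that contains "language-", while TrimPrefix only gives the right answer when the attribute is exactly "language-xyz". A minimal sketch of a more defensive extraction follows. The helper name langFromClass is hypothetical and not part of the original code.

```go
package main

import (
	"fmt"
	"strings"
)

// langFromClass returns the language named by the first "language-" class
// in a class attribute, or "" if there is none.
func langFromClass(class string) string {
	for _, c := range strings.Fields(class) {
		if strings.HasPrefix(c, "language-") {
			return strings.TrimPrefix(c, "language-")
		}
	}
	return ""
}

func main() {
	fmt.Println(langFromClass("language-go"))             // go
	fmt.Println(langFromClass("highlight language-rust")) // rust
	fmt.Println(langFromClass("plain") == "")             // true
}
```

An empty result, or a language chroma does not know, still needs handling: lexers.Get returns nil in that case, and the chroma README falls back to lexers.Fallback when that happens.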

Now we have the correct lexer, which is necessary so that our code is tokenized correctly. Next, we grab the code from the Selection and tokenize it:

oldCode := s.Text()
iterator, _ := lexer.Tokenise(nil, oldCode)
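It matters that the code is read with Text() rather than as raw HTML. The markdown converter escapes characters such as < and & inside code blocks, and the lexer needs the plain source text. The effect is comparable to the standard library’s html.UnescapeString, shown in this small sketch. This is an illustration of the decoding, not what goquery calls internally, and decodeCode is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"html"
)

// decodeCode turns entity-escaped code, as it appears in the converted HTML,
// back into plain source text.
func decodeCode(escaped string) string {
	return html.UnescapeString(escaped)
}

func main() {
	// prints: if a < b && ok { ch <- 1 }
	fmt.Println(decodeCode("if a &lt; b &amp;&amp; ok { ch &lt;- 1 }"))
}
```

Feeding the escaped form to the lexer would tokenize the entities themselves and produce wrong highlighting.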

Now, all that’s left is to instantiate a formatter - in our case, we want to output HTML, but chroma provides other options as well. The formatter is the part of chroma that actually generates the highlighted output from the token stream produced by the lexer.

formatter := html.New(html.WithClasses())
b := bytes.Buffer{}
buf := bufio.NewWriter(&b)
formatter.Format(buf, styles.MonokaiLight, iterator) // same style as the generated CSS
buf.Flush()
s.SetHtml(b.String())

The above snippet creates the HTML formatter with the WithClasses option, which means that we don’t want inline CSS, but rather want to use classes. This also means that we need to include the CSS somewhere (which we already did at the beginning of this example). Then we format the code and write it to our buffer.

Once that is done, the content of the buffer is written to the Selection, thus replacing the previous content with our new, syntax-highlighted code.

After replacing the code, what’s left is to create a new HTML document to return it to the caller:

    out, err := doc.Html()
    if err != nil {
        return "", err
    }
    return out, nil
}
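One thing to be aware of: goquery parses its input as a full document, so the string returned by doc.Html() typically comes wrapped in html, head and body elements, which then end up nested inside the template’s own body. Browsers tolerate this, but it can be avoided. The sketch below strips the wrapper with plain string operations. The bodyInner helper is hypothetical and assumes a bare body tag without attributes.

```go
package main

import (
	"fmt"
	"strings"
)

// bodyInner returns what lies between <body> and </body> in doc, or doc
// unchanged when no such pair is found. It assumes <body> has no attributes.
func bodyInner(doc string) string {
	const openTag, closeTag = "<body>", "</body>"
	start := strings.Index(doc, openTag)
	end := strings.LastIndex(doc, closeTag)
	if start < 0 || end < start {
		return doc
	}
	return doc[start+len(openTag) : end]
}

func main() {
	wrapped := "<html><head></head><body><p>hi</p></body></html>"
	fmt.Println(bodyInner(wrapped)) // <p>hi</p>
}
```

With goquery itself, doc.Find("body").Html() should give the same inner markup without any string handling.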

Ok, all we have to do now is to call the function and create the output HTML in the main function:

// replace code-parts with syntax-highlighted parts
replaced, err := replaceCodeParts(htmlSrc)
if err != nil {
    log.Fatal(err)
}
// write html output
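if err := t.Execute(os.Stdout, struct {
    Content template.HTML
    Style   template.HTML
}{
    Content: template.HTML(replaced),
    // template.HTML, not template.CSS: the value is a complete <style> element
    // placed in HTML context, where a template.CSS value would be escaped.
    Style: template.HTML("<style>" + hlbuf.String() + "</style>"),
}); err != nil {
    log.Fatal(err)
}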

Nothing fancy happening here - we call the function with the HTML input and execute our template with the CSS created above and our new HTML.
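The types of the template values matter. In the HTML text context used by this template, html/template inserts only template.HTML values verbatim. Plain strings, and also other typed strings such as template.CSS, are HTML-escaped, so a style element has to be passed as template.HTML to survive. The following sketch shows the difference with a shortened, hypothetical template. The helper names are assumptions for illustration.

```go
package main

import (
	"fmt"
	"html/template"
	"strings"
)

// page is a shortened stand-in for the template file used in this post.
const page = `<head>{{.Style}}</head><body>{{.Content}}</body>`

// render executes the page template with data and returns the output.
func render(data interface{}) (string, error) {
	t, err := template.New("page").Parse(page)
	if err != nil {
		return "", err
	}
	var sb strings.Builder
	err = t.Execute(&sb, data)
	return sb.String(), err
}

// renderTrusted passes both values as template.HTML, so they are kept as-is.
func renderTrusted(style, content string) (string, error) {
	return render(struct{ Style, Content template.HTML }{
		template.HTML(style), template.HTML(content),
	})
}

// renderPlain passes both values as plain strings, so they are escaped.
func renderPlain(style, content string) (string, error) {
	return render(struct{ Style, Content string }{style, content})
}

func main() {
	trusted, _ := renderTrusted("<style>p{}</style>", "<p>hi</p>")
	fmt.Println(trusted) // <head><style>p{}</style></head><body><p>hi</p></body>

	plain, _ := renderPlain("<style>p{}</style>", "<p>hi</p>")
	fmt.Println(plain) // <head>&lt;style&gt;p{}&lt;/style&gt;</head><body>&lt;p&gt;hi&lt;/p&gt;</body>
}
```

Since template.HTML bypasses escaping, it should only wrap markup that the program generated itself, as is the case here.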

That’s it. You can find the full code here.

Conclusion

The chroma library is fantastic. Back when I created the first implementation of this, I also contemplated just biting the bullet and using Pygments, accepting the Python dependency for my blog generator, but decided against it despite the limitations of the old implementation.

I’m very happy there is now a native Go option to do full-featured syntax-highlighting and if you’re reading this post, you already see the chroma version of the blog’s syntax-highlighting in action. :)

Resources