Writing a Static Blog Generator in Go

A static-site-generator is a tool which, given some input (e.g. markdown) generates a fully static website using HTML, CSS and JavaScript.

Why is this cool? Well, for one it’s a lot easier to host a static site and it’s also (usually) quite a bit faster and resource-friendly. Static sites aren’t the best choice for all use-cases, but for mostly non-interactive websites such as blogs they are great.

In this post, I will describe the static blog generator I wrote in Go, which powers this blog.

Motivation

You might be familiar with static-site-generators like the great Hugo, which has just about all the features one could hope for in regards to static site generation.

So why would I write another tool just like it with fewer capabilities? The reason was twofold.

One reason was that I wanted to dive deeper into Go and a command-line-based static-site-generator seemed like a great playground to hone my skills.

The second reason was simply that I had never done it before. I’ve done my fair share of web-development, but I never created a static-site-generator.

This made it intriguing because I theoretically had all the prerequisites and skills to build such a tool with my background in web-development, but I had never tried doing it.

I was hooked, implemented it within about 2 weeks and had a great time doing it. I used my blog-generator for creating this blog and it worked great so far. :)

Concept

Early on, I decided to write my blog posts in markdown and keep them in a GitHub Repo. The posts are structured in folders, which represent the url of the blog post.

For metadata, such as publication date, tags, title and subtitle I decided on keeping a meta.yml file with each post.md file with the following format:

title: Playing Around with BoltDB 
short: "Looking for a simple key/value store for your Go applications? Look no further!"
date: 20.04.2017
tags:
    - golang
    - go
    - boltdb
    - bolt

This allowed me to separate the content from the metadata, but still keep everything in the same place, where I’d find it later.

The GitHub Repo was my data source. The next step was to think about features and I came up with this List:

Quite a healthy feature set. What was very important for me from the start, was to keep everything simple, fast and clean - without any third-party trackers or ads, which compromise privacy and slow everything down.

Based on these ideas, I started making a rough plan of the architecture and started coding.

Architectural Overview

The application is simple enough. The high-level elements are:

The CLI in this case is very simple, as I didn’t add any features in terms of configurability. It basically just fetches data from the DataSource and runs the Generators on it.

The DataSource interface looks like this:

type DataSource interface {
    Fetch(from, to string) ([]string, error)
}

The Generator interface looks like this:

type Generator interface {
    Generate() error
}

Pretty simple. Each Generator also receives a configuration struct, which contains all the necessary data for generation.

There are 7 Generators at the time of writing this post:

Where the SiteGenerator is the meta-generator, which calls all other generators and outputs the whole static website.

The generation is based on HTML templates using Go’s html/template package.

Implementation Details

In this section I will just cover a few selected parts I think might be interesting, such as the git DataSource and the different Generators.

DataSource

First up, we need some data to generate our blog from. This data, as mentioned above has the form of a git repository. The following Fetch function captures most of what the DataSource implementation does:

func (ds *GitDataSource) Fetch(from, to string) ([]string, error) {
    fmt.Printf("Fetching data from %s into %s...\n", from, to)
    if err := createFolderIfNotExist(to); err != nil {
        return nil, err
    }
    if err := clearFolder(to); err != nil {
        return nil, err
    }
    if err := cloneRepo(to, from); err != nil {
        return nil, err
    }
    dirs, err := getContentFolders(to)
    if err != nil {
        return nil, err
    }
    fmt.Print("Fetching complete.\n")
    return dirs, nil
}

Fetch is called with two parameters from, which is a repository URL and to, which is the destination folder. The function creates and clears the destination folder, clones the repository using os/exec plus a git command and finally reads the folder, returning a list of paths for all the files within the repository.

As mentioned above, the repository contains only folders, which represent the different blog posts. The array with these folder paths is then passed to the generators, which can then do their thing for each of the blog posts within the repository.

Kicking it all off

After the Fetch comes the Generate phase. When the blog-generator is executed, the following code is executed on the highest level:

ds := datasource.New()
dirs, err := ds.Fetch(RepoURL, TmpFolder)
if err != nil {
    log.Fatal(err)
}
g := generator.New(&generator.SiteConfig{
    Sources:     dirs,
    Destination: DestFolder,
})
err = g.Generate()
if err != nil {
    log.Fatal(err)
}

The generator.New function creates a new SiteGenerator which is basically a generator, which calls other generators. It’s passed a destination folder and the directories for the blog posts within the repository.

As every Generator implementing the interface mentioned above, the SiteGenerator has a Generate method, which returns an error. The Generate method of the SiteGenerator prepares the destination folder, reads in templates, prepares data structures for the blog posts, registers the other generators and concurrently runs them.

The SiteGenerator also registers some settings for the blog like the URL, Language, Date Format etc. These settings are simply global constants, which is certainly not the prettiest solution or the most scalable, but it’s simple and that was the highest goal here.

Posts

The most important concept on a blog are - surprise, surprise - blog posts! In the context of this blog-generator, they are represented by the following data-structure:

type Post struct {
    Name      string
    HTML      []byte
    Meta      *Meta
    ImagesDir string
    Images    []string
}

These posts are created by iterating over the folders in the repository, reading the meta.yml file, converting the post.md file to HTML and by adding images, if there are any.

Quite a bit of work, but once we have the posts represented as a data structure, the generation of posts is quite simple and looks like this:

func (g *PostGenerator) Generate() error {
    post := g.Config.Post
    destination := g.Config.Destination
    t := g.Config.Template
    staticPath := fmt.Sprintf("%s%s", destination, post.Name)
    if err := os.Mkdir(staticPath, os.ModePerm); err != nil {
      return fmt.Errorf("error creating directory at %s: %v", staticPath, err)
    }
    if post.ImagesDir != "" {
      if err := copyImagesDir(post.ImagesDir, staticPath); err != nil {
          return err
      }
    }
    if err := writeIndexHTML(staticPath, post.Meta.Title, template.HTML(string(post.HTML)), t); err != nil {
      return err
    }
    return nil
}

First, we create a directory for the post, then we copy the images in there and finally create the post’s index.html file using templating. The PostGenerator also implements syntax-highlighting, which I described in this post.

Listing Creation

When a user comes to the landing page of the blog, she sees the latest posts with information like the reading time of the article and a short description. For this feature and for the archive, I implemented the ListingGenerator, which takes the following config:

type ListingConfig struct {
    Posts                  []*Post
    Template               *template.Template
    Destination, PageTitle string
}

The Generate method of this generator iterates over the post, assembles their metadata and creates short blocks based on the given template. Then these blocks are appended and written to the index template.

I liked medium’s feature to approximate the time to read an article, so I implemented my own version of it, based on the assumption that an average human reads about 200 words per minute. Images also count towards the overall reading time with a constant 12 seconds added for each img tag in the post. This will obviously not scale for arbitrary content, but should be a fine approximation for my usual articles:

func calculateTimeToRead(input string) string {
    // an average human reads about 200 wpm
    var secondsPerWord = 60.0 / 200.0
    // multiply with the amount of words
    words := secondsPerWord * float64(len(strings.Split(input, " ")))
    // add 12 seconds for each image
    images := 12.0 * strings.Count(input, "<img")
    result := (words + float64(images)) / 60.0
    if result < 1.0 {
        result = 1.0
    }
    return fmt.Sprintf("%.0fm", result)
}

Tags

Next, to have a way to categorize and filter the posts by topic, I opted to implement a simple tagging mechanism. Posts have a list of tags in their meta.yml file. These tags should be listed on a separate Tags Page and upon clicking on a tag, the user is supposed to see a listing of posts with the selected tag.

First up, we create a map from tag to Post:

func createTagPostsMap(posts []*Post) map[string][]*Post {
result := make(map[string][]*Post)
    for _, post := range posts {
        for _, tag := range post.Meta.Tags {
            key := strings.ToLower(tag)
             if result[key] == nil {
                 result[key] = []*Post{post}
             } else {
                 result[key] = append(result[key], post)
             }
        }
    }
    return result
}

Then, there are two tasks to implement:

The data structure of a Tag looks like this:

type Tag struct {
    Name  string
    Link  string
    Count int
}

So, we have the actual tag (Name), the Link to the tag’s listing page and the amount of posts with this tag. These tags are created from the tagPostsMap and then sorted by Count descending:

tags := []*Tag{}
for tag, posts := range tagPostsMap {
    tags = append(tags, &Tag{Name: tag, Link: getTagLink(tag), Count: len(posts)})
}
sort.Sort(ByCountDesc(tags))

The Tags Page basically just consists of this list rendered into the tags/index.html file.

The List of Posts for a selected Tag can be achieved using the ListingGenerator described above. We just need to iterate the tags, create a folder for each tag, select the posts to display and generate a listing for them.

Sitemap & RSS

To improve searchability on the web, it’s a good idea to have a sitemap.xml which can be crawled by bots. Creating such a file is fairly straightforward and can be done using the Go standard library.

In this tool, however, I opted to use the great etree library, which provides a nice API for creating and reading XML.

The SitemapGenerator uses this config:

type SitemapConfig struct {
    Posts       []*Post
    TagPostsMap map[string][]*Post
    Destination string
}

blog-generator takes a basic approach to the sitemap and just generates url and image locations by using the addURL function:

func addURL(element *etree.Element, location string, images []string) {
    url := element.CreateElement("url")
     loc := url.CreateElement("loc")
     loc.SetText(fmt.Sprintf("%s/%s/", blogURL, location))

     if len(images) > 0 {
         for _, image := range images {
            img := url.CreateElement("image:image")
             imgLoc := img.CreateElement("image:loc")
             imgLoc.SetText(fmt.Sprintf("%s/%s/images/%s", blogURL, location, image))
         }
     }
}

After creating the XML document with etree, it’s just saved to a file and stored in the output folder.

RSS generation works the same way - iterate all posts and create XML entries for each post, then write to index.xml.

Handling Statics

The last concept I needed were entirely static assets like a favicon.ico or a static page like About. To do this, the tool runs the StaticsGenerator with this config:

type StaticsConfig struct {
    FileToDestination map[string]string
    TemplateToFile    map[string]string
    Template          *template.Template
}

The FileToDestination-map represents static files like images or the robots.txt and TemplateToFile is a mapping from templates in the static folder to their designated output path.

This configuration could look like this in practice:

fileToDestination := map[string]string{
    "static/favicon.ico": fmt.Sprintf("%s/favicon.ico", destination),
    "static/robots.txt":  fmt.Sprintf("%s/robots.txt", destination),
    "static/about.png":   fmt.Sprintf("%s/about.png", destination),
}
templateToFile := map[string]string{
    "static/about.html": fmt.Sprintf("%s/about/index.html", destination),
}
statg := StaticsGenerator{&StaticsConfig{
FileToDestination: fileToDestination,
   TemplateToFile:    templateToFile,
   Template:          t,
}}

The code for generating these statics is not particularly interesting - as you can imagine, the files are just iterated and copied to the given destination.

Parallel Execution

For blog-generator to be fast, the generators are all run in parallel. For this purpose, they all follow the Generator interface - this way we can put them all inside a slice and concurrently call Generate for all of them.

The generators all work independently of one another and don’t use any global state mutation, so parallelizing them was a simple exercise of using channels and a sync.WaitGroup like this:

func runTasks(posts []*Post, t *template.Template, destination string) error {
    var wg sync.WaitGroup
    finished := make(chan bool, 1)
    errors := make(chan error, 1)
    pool := make(chan struct{}, 50)
    generators := []Generator{}

    for _, post := range posts {
        pg := PostGenerator{&PostConfig{
            Post:        post,
             Destination: destination,
             Template:    t,
        }}
        generators = append(generators, &pg)
    }

    fg := ListingGenerator{&ListingConfig{
        Posts:       posts[:getNumOfPagesOnFrontpage(posts)],
        Template:    t,
        Destination: destination,
        PageTitle:   "",
    }}

    ...create the other generators...

    generators = append(generators, &fg, &ag, &tg, &sg, &rg, &statg)

    for _, generator := range generators {
        wg.Add(1)
        go func(g Generator) {
            defer wg.Done()
            pool <- struct{}{}
            defer func() { <-pool }()
            if err := g.Generate(); err != nil {
                errors <- err
            }
        }(generator)
    }

    go func() {
        wg.Wait()
        close(finished)
    }()

    select {
    case <-finished:
        return nil
    case err := <-errors:
        if err != nil {
           return err
        }
    }
    return nil
}

The runTasks function uses a pool of max. 50 goroutines, creates all generators, adds them to a slice and then runs them in parallel.

These examples were just a short dive into the basic concepts used to write a static-site generator in Go.

If you’re interested in the full implementation, you can find the code here.

Conclusion

Writing my blog-generator was an absolute blast and a great learning experience. It’s also quite satisfying to use my own hand-crafted tool for creating my blog.

To publish my posts to AWS, I also created static-aws-deploy, another Go command-line tool, which I covered in this post.

If you want to use the tool yourself, just fork the repo and change the configuration. However, I didn’t put much time into customizability or configurability, as Hugo provides all that and more.

Of course, one should strive not to re-invent the wheel all the time, but sometimes re-inventing a wheel or two can be rewarding and can help you learn quite a bit in the process. :)

Resources