Migrating Technical Docs from Jekyll to Hugo+Docsy
Recently, I migrated Graphviz's technical documentation from the Jekyll static site generator to the Hugo static site generator, and specifically to the Docsy Hugo theme for technical documentation.
I thought it would be straightforward to move between static site generators, but it turned out to be rather difficult, so perhaps it's worth writing about. Good technical doc infra is underrated. I hope this write-up will be useful for anyone considering a move from Jekyll to Hugo, or anyone interested in an evaluation of Docsy.
Improving the docs
Graphviz is an open source project for visualising graphs: nodes and the edges connecting them. I regularly use Graphviz to diagram distributed system components at work. I get a lot of value out of Graphviz, but I felt it was hard to convince other people to use it because its documentation was very hard to navigate. Tens of thousands of developers use Graphviz, so improving the docs should have some leverage.
Why migrate from Jekyll to Hugo?
Mostly, I wanted access to the Docsy technical documentation theme, which is written for Hugo.
Graphviz has only a handful of spare-time contributors, so we don't have time to spend reinventing solved problems. I wanted to outsource the generic parts of technical documentation:
- Responsive, mobile-friendly design
- A beautiful theme, particularly for code samples (the old Graphviz site had very little CSS)
- An information hierarchy (top-level tabs at the topbar, doc hierarchy on the left-nav, and jump-to-headings on the right-hand-nav)
- "Edit this page" links that drop you straight into an editor to make a pull request. These are critical for lowering the bar for 'drive-by' small fixes from anyone, and growing contributors.
- "File a docs issue with this page" links
- Anchor-links to page headings for easy sharing of specific sections
- Last-modified times for each page
And some features we're not using yet, but are interested in exploring:
- Page feedback -- "was this page useful / how could we improve it" surveys
- Multi-language support
- Full-text site search indexing
- Printable docs
And Docsy is under active development: hopefully we will be able to benefit from yet more features in future.
In a previous post, I evaluated many open source documentation solutions.
I have experience at work with g3doc, Google's hosted internal technical documentation platform (SRECon talk, slides). In my opinion, g3doc has successfully platformized the generic parts of technical documentation, enabling engineers to focus on the important part: writing content and building a legible information hierarchy.
Docsy was started by the g3doc team, who are trying to create an excellent open-source documentation platform.
Differences between Jekyll and Hugo
Jekyll:
- Ruby, with C extensions. Ruby code is typically slower than Go.
- Started 2008 by GitHub
- Really kicked off the static site generator revolution
- Liquid templating engine. Requires manual escaping, so it's easy to over- or under-escape.
- Many features are provided via Ruby plugins. Relatively simple set of core concepts to understand.
- Unopinionated directory structure (put files wherever you want, mostly -- some special directories like _layouts/, _data/)
- Multi-level template inheritance (e.g. Gallery extends Docs extends Site, DownloadsPage extends Site)
- Can call functions from anywhere in a markdown file with arbitrary Liquid syntax
- Data for templates in YAML/JSON
- Themes are typically Ruby Gems
- No concept of 'list pages'
Hugo:
- Go, with some node.js extensions (sass). Go code is typically faster than Ruby.
- Started 2013
- Go's html/template templating engine. Contextual auto-escaping.
- No plugin model; almost all features are in "core Hugo". Sprawling list of concepts to understand
- Opinionated directory structure (static/, content/, layouts/). "Convention over configuration" like Rails.
- One-level template inheritance
- Circumscribed function invocation from markdown files (shortcodes)
- Data for templates in YAML/JSON
- Themes are typically Git Submodules
- "list pages" for each section
I have been using Jekyll for a decade; this website (markhansen.co.nz) used to run on Jekyll. I found Jekyll quite easy to get started with. Hugo's documentation, by contrast, seems somehow both sprawling and under-specified.
I had a lot of trouble understanding Hugo's concepts, and sometimes the convention-over-configuration style of searching directories (rather than specifying explicitly which directory) confused me. Hugo's documentation could benefit from an ordered list of concepts that build on each other; it felt like Hugo's docs for concept A were constantly defining concept A in terms of other concepts B and C that I hadn't yet understood. In particular, the distinction between index.md and _index.md, the distinction between layouts and shortcodes, and why shortcodes can't compose, really confused me. Perhaps this is partly just my familiarity with Jekyll. Perhaps I'll write more about this in a future post, but for now my plate is fairly full improving Graphviz's docs.
Docsy's docs were far clearer, and often filled in the gaps in Hugo's docs.
I was grateful for, and can heartily recommend, Brian P. Hogan's Hugo Book for anyone looking to learn Hugo.
Stability: Go and Ruby
It's important, for technical docs for widely-used projects, that any solution keeps working.
The previous docs solution ran for over 20 years -- you can still load the same HTML pages in your browser. I'd like any new solution to work for at least another decade.
I expect Go code to run just fine in 20 years, but I'm not as sure about Ruby code with many C extensions. Anecdotally, it's quite difficult to set up a Ruby environment for contributors (RVM, Bundler, out-of-date platform Rubies, library version conflicts, C compilers...).
Go's statically linked binaries and compatibility guarantees make it both much more stable and easier to get up and running with a single binary on nearly any platform.
Refactoring to make migration easier
The Graphviz site had a few problems:
- Many of the most popular pages (e.g. the attribute reference) were written as complete HTML 2.0 pages.
  - This made it harder to apply a common theme and navigation than it would have been with (say) Markdown files.
- The docs existed in three copies: one in the Graphviz docs repo, and two inside the Graphviz core repo.
  - This made updating the docs cumbersome: every change had to be made in three places.
- The docs were generated by ksh scripts run from Makefiles, often by emitting HTML from ksh scripts inside for-loops.
  - This made it tricky to escape correctly, and difficult to rework the HTML output; templates solve this problem better. The build process also had many manual steps, making it hard to integrate into a common theme and to get live reload working.
None of this is to disparage the previous docs: this architecture had served Graphviz well for 20+ years, and was very stable: I could still run the scripts and have them work on my 2019 MacBook. There was some ingenious work with templating HTML from text files.
My strategy was to move slowly, using diff at each stage to check whether I had broken anything unintentionally, even if that meant doing the migration in multiple steps:
For static files:
- I centralised static files in one repository (the Graphviz docs repository), removing their <head> and <h1> sections (keeping the body) so that Jekyll's templates could generate them instead.
- Then I deleted the other copies in other repositories.
- I migrated large HTML 2.0 pages to Markdown, both manually and using Turndown-VSCode's "Convert HTML to Markdown" functionality (again, checking with diff to find unintended changes in output)
For generated files:
- I migrated generated files from ksh scripts to Python and Jinja2 templates, rather than emitting HTML inline from for-loops, keeping them in place in the Makefile.
- Then I moved them to Jekyll (changing the templating from Jinja2 to Liquid) and deleted the old Makefile rules. This centralised them as part of Jekyll's build process.
- For some generated files, I used Jekyll Data (JSON files fed into templates). For Graphviz's attributes docs page, which documents 200 attributes, I used Jekyll Collections (Markdown files with YAML front-matter metadata, which you can loop over in templates).
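To make the Collections idea concrete, here is a minimal Python sketch of what looping over a collection amounts to: each item is a Markdown file with front matter, and one template renders a row per item, escaping each value instead of emitting raw HTML from a loop. It is not Jekyll's or Graphviz's actual code. The front-matter fields `name` and `used_by` are hypothetical, the parser only understands flat `key: value` lines (real front matter is YAML and needs a YAML library), and bodies are escaped as plain text rather than rendered as Markdown.

```python
import html
from pathlib import Path


def parse_front_matter(text):
    """Split a document into (metadata, body).

    Only flat `key: value` lines are understood; this is a stand-in for a
    real YAML parser, not a replacement for one.
    """
    if not text.startswith("---\n"):
        return {}, text
    header, sep, body = text[4:].partition("\n---\n")
    if not sep:  # no closing delimiter: treat the whole file as body
        return {}, text
    meta = {}
    for line in header.splitlines():
        key, colon, value = line.partition(":")
        if colon:
            meta[key.strip()] = value.strip()
    return meta, body


def load_collection(directory):
    """Read every *.md file in a (hypothetical) collection directory."""
    return {p.name: p.read_text(encoding="utf-8") for p in sorted(Path(directory).glob("*.md"))}


def render_collection(items):
    """Loop over the collection (filename -> text) and render one table row per item."""
    rows = []
    for filename in sorted(items):
        meta, body = parse_front_matter(items[filename])
        name = meta.get("name", Path(filename).stem)
        rows.append(
            "<tr>"
            f"<td>{html.escape(name)}</td>"
            f"<td>{html.escape(meta.get('used_by', ''))}</td>"
            f"<td>{html.escape(body.strip())}</td>"
            "</tr>"
        )
    return "<table>\n" + "\n".join(rows) + "\n</table>"
```

With the files on disk, `render_collection(load_collection("attrs"))` renders the table, where `attrs` is a hypothetical directory name.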
This was fairly painstaking work. Sometimes it was demotivating to be making changes that didn't affect users, but it felt critical to reduce the number of redundant copies of the docs, and to get everything into a fast deploy cycle, before improving the text.
To diff the site, I would first compile the entire site at the base commit:
jekyll build -d _public_old
Then update to the new commit, rebuild and diff the old and new folders:
jekyll build -d _public_new
diff -r _public_old _public_new
I think this was a very powerful technique for moving carefully: first, do no harm!
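The same `diff -r` check can also be scripted when you want a summary rather than a full diff. A minimal Python sketch, standard library only: it snapshots each build directory into a mapping of relative path to file bytes and reports added, removed, and changed files. Reading whole files into memory is fine for a documentation site, but would not suit very large outputs.

```python
from pathlib import Path


def snapshot(root):
    """Map every file under root (as a relative POSIX path) to its bytes."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): p.read_bytes()
        for p in root.rglob("*")
        if p.is_file()
    }


def diff_snapshots(old, new):
    """Return sorted (added, removed, changed) relative paths between two builds."""
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    changed = sorted(k for k in old.keys() & new.keys() if old[k] != new[k])
    return added, removed, changed
```

For example, `diff_snapshots(snapshot("_public_old"), snapshot("_public_new"))` compares the two builds produced by the jekyll build commands; an empty result for all three lists means the refactoring changed nothing in the output.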
These initial refactorings helped out users too:
- They brought many pages within the Jekyll site's theme (navigation and footer). In particular, the Gallery and attributes documentation got the same theme.
- Moving to markdown made it easier to edit the pages.
Actually migrating from Jekyll to Hugo
Hugo advertises a hugo import jekyll command, but it seems extremely limited. The Graphviz site had a bunch of layouts, Liquid templates and data, and the Jekyll-to-Hugo migration tool only seems to handle the simplest types of pages. So I migrated mostly manually.
Directory Structure
Hugo's directory structure and Jekyll's directory structure are totally different. This makes it very difficult to do an incremental in-place migration. I recommend making a subfolder for Hugo or Jekyll (or both) instead, so their directory structures are independent.
Static Files
In Jekyll, static files are output into the same folder structure as the git repository, under public/. It's very simple to tell where a file starts and where it ends up: they're the same path!
In Hugo, static content lives under static/ and is copied to the root of public/. However, you can also put static files under content/, and then they're (mostly) output into the same hierarchy under public/.
Single Pages
Hugo takes markdown files under content/ and outputs them to the same location in the output. You can usually play around with the hierarchy to get them to work.
The only trouble might be index pages: content/section/index.md gets output to public/section/index.html as you might expect, and is usually accessible via http://mysite/section/.
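A rough Python model of that source-to-output mapping, assuming Hugo's default "pretty" URLs: files under static/ are copied to the root of public/, a regular page like foo.md gets its own foo/index.html, and both index.md and _index.md render to their directory's index.html (they differ in which kind of page Hugo builds, not in where it writes it). The sketch ignores permalinks configuration, url and slug front matter, multilingual content directories and page resources, so treat it as a mental model rather than Hugo's actual algorithm.

```python
from pathlib import PurePosixPath

PUBLIC = PurePosixPath("public")


def output_path(source_path):
    """Simplified model of where a Hugo source file ends up (pretty URLs)."""
    p = PurePosixPath(source_path)
    root, rest = p.parts[0], PurePosixPath(*p.parts[1:])
    if root == "static":
        # static/ is copied verbatim to the root of public/.
        return str(PUBLIC / rest)
    if root != "content" or p.suffix != ".md":
        raise ValueError(f"sketch only models static/ files and content/*.md: {source_path}")
    if rest.name in ("index.md", "_index.md"):
        # Both index files render to their own directory's index.html.
        return str(PUBLIC / rest.parent / "index.html")
    # A regular page gets a directory of its own, so /section/foo/ works.
    return str(PUBLIC / rest.parent / rest.stem / "index.html")
```

Under this model, content/section/foo.md becomes public/section/foo/index.html, which is why its URL is usually http://mysite/section/foo/ rather than .../foo.html.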
Layouts
Hugo has some convention-over-configuration thing where it takes the name of the folder under content/ and uses that as the name of the 'layout'.
I found this fairly inscrutable: when I have content/en/section/subsection/foo.md, which layout will Hugo use, section or subsection? I ended up mostly specifying the layout explicitly using the layout key in the YAML front-matter, and I suggest you do too.
Jekyll supports layout inheritance: e.g. you can have this entirely reasonable hierarchy:
- Gallery Image extends Gallery Base
- Gallery List extends Gallery Base
- Gallery Base extends Docs
- Attribute Docs extend Docs
- Docs extends Root
- Blogpost extends Root
Hugo only supports a single level of hierarchy per-section:
- Gallery Single (single.html) extends Gallery Base (baseof.html).
- Gallery List (list.html) extends Gallery Base (baseof.html).
And if you want similar content in Gallery Base as you do in the Docs, you have to copy/paste it. This is a real shame. Reading between the lines on the Hugo site, this seems like it might have previously been a limitation of Go's html/template engine that's since been fixed?
I still have to copy/paste from the Docsy theme whenever I want to customise base layouts, which undercuts my ability to transparently upgrade to new Docsy features. This feels like quite an arbitrary limitation, and I think it's an opportunity for Hugo.
Sections and Lists
In Jekyll, if you want a page listing other pages, you write a for-loop with Liquid syntax.
Hugo has the built-in concept of "Sections" which contain other "Pages" (e.g. Blog section contains blogpost pages, or Gallery section contains image pages). Sometimes these Sections are called Bundles. I don't really understand the difference.
Hugo has built-in support for list pages, but to create one you have to use either index.md or _index.md. The difference is not clearly explained on the Hugo site, and I ended up mostly guessing at one or the other until I got the output I desired. I still don't really understand the difference; I have read many forum posts and blog posts about these, and I'm still mystified. I suspect these are just poorly-defined concepts.
Redirects
In Jekyll, we used the jekyll-redirect-from plugin extensively to keep old URLs working, because [Cool URLs don't change](https://www.w3.org/Provider/Style/URI).
jekyll-redirect-from supports both redirects to a path (redirect_to) and redirects from a path (redirect_from). Both are important: redirect_from lists old URLs in an HTML page's front matter, and redirect_to lets you redirect non-HTML files to new locations.
Hugo has built-in support for redirects to a page (it calls them Aliases), but not for redirects from a path. It's a missed opportunity. I had to make my own layout template for redirects, and make files like old.pdf.html that use that layout. There is an opportunity here for Hugo to include redirects in core so users can redirect non-HTML content that can't have YAML front-matter (e.g. PDF manuals, images).
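A redirect layout like this mostly has to emit a tiny HTML page. Here is a minimal Python sketch of such a stub using the common meta-refresh approach: a canonical link pointing at the new URL, a robots noindex tag so the stub itself isn't indexed, and a plain link for clients that ignore the refresh. The function names and the old-to-new mapping format are hypothetical; this is not Hugo's API or the Graphviz layout itself.

```python
import html


def redirect_page(target_url):
    """Return a minimal HTML page that sends browsers to target_url."""
    url = html.escape(target_url, quote=True)
    return (
        "<!DOCTYPE html>\n"
        "<html>\n<head>\n"
        '<meta charset="utf-8">\n'
        f"<title>{url}</title>\n"
        f'<link rel="canonical" href="{url}">\n'
        '<meta name="robots" content="noindex">\n'
        f'<meta http-equiv="refresh" content="0; url={url}">\n'
        "</head>\n<body>\n"
        f'<p>This page has moved to <a href="{url}">{url}</a>.</p>\n'
        "</body>\n</html>\n"
    )


def redirect_stubs(redirects):
    """Map each old path (old path -> new URL) to the HTML stub to publish there."""
    return {old: redirect_page(new) for old, new in redirects.items()}
```

A meta refresh only takes effect when the server sends the stub as text/html, so how a stub for an old .pdf URL actually gets served depends on your hosting setup.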
Data Files
Jekyll Data Files and Hugo Data Templates both allow checking in data files in YAML or JSON format and then making these available to templates. They map across quite nicely.
Collections
Jekyll Collections let you define markdown files with front-matter that aren't directly output into the output folder, but are instead designed to be looped over in templates. Graphviz had a Collection for its 200 attributes, which were output on a single page.
Hugo has a concept of a [Headless Page Bundle](https://gohugo.io/content-management/page-bundles/#headless-bundle), enabled by setting headless = true in the front-matter. You can then grab the pages and loop over them.
I found I also had to suppress these pages from showing up in the table of contents using the toc_hide: true front-matter parameter.
Hugo's cascade feature let me apply parameters like toc_hide to every file in a collection.
Information Hierarchy
Jekyll pretty much leaves you to build your own information hierarchy.
Hugo has a built-in concept of a hierarchy of page bundles, and built-in support for generating tables of contents, navbars, and sitemaps. It's neat how each page can 'add itself' to a specific menu (top bar or left-nav): this decentralises menu changes, it's a tidy example of inversion of control, and you don't have to manually keep the table of contents up to date.
Docsy builds on Hugo's menu concepts to provide top and side navigation bars. They're great.
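The "add itself" pattern can be modelled in a few lines of Python: each page declares which menus it belongs to and a weight, and the menus are assembled by grouping and sorting those declarations, so no central list has to be edited. This is a hypothetical model of the idea, not Hugo's implementation. The page fields `title`, `menus`, and `weight` are assumptions for the sketch, loosely mirroring Hugo's menu front matter and weights (lower weights sort first); breaking ties by title is this sketch's own choice.

```python
from collections import defaultdict


def build_menus(pages):
    """Assemble menus from pages that register themselves.

    Each page is a dict with a title, the menus it wants to appear in,
    and an optional weight; no central table of contents is edited.
    """
    menus = defaultdict(list)
    for page in pages:
        for menu_name in page.get("menus", []):
            menus[menu_name].append(page)
    # Lower weights first; ties broken by title for stable output.
    return {
        name: [p["title"] for p in sorted(entries, key=lambda p: (p.get("weight", 0), p["title"]))]
        for name, entries in menus.items()
    }
```

Adding a new page to the top bar then only touches that page's own metadata.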
Includes
Jekyll Includes lets you call code from any markdown file, and has a single template abstraction.
For example, on the Attributes page we call a for-loop in the middle of the page to generate a table of the attributes.
Hugo's includes story is confusing:
- Inside markdown files you are only allowed to call "shortcodes".
- Inside wrap-around-layouts you can include arbitrary other partial templates, but not shortcodes
- Shortcodes can include partial templates, but not other shortcodes (that is, shortcodes don't compose).
- Shortcode and partial template syntaxes are totally different, and have totally different ways to pass parameters.
I wish this was simpler. Cleaning this up is a big opportunity for Hugo.
My advice:
- Use templates wherever possible.
- Use shortcodes where you have to (calling functions from inside markdown files).
- If you find you need to compose shortcodes, make the shortcode wrap a partial template, then compose the partial templates.
Being careful
I gave up on incremental migrations for the Jekyll-Hugo migration proper. The themes were different enough that diff -r wouldn't help much, and instead I:
- kept the content the same (no changes to markdown files)
- tried my best to retain git history by moving files then changing them (git offers little help here with its poor rename detection and nonexistent rename hint support)
- manually reviewed each page, including looking at the HTML output.
Some things to check that you might not otherwise:
- favicons
- <link rel=canonical> tags
- syntax highlighting
- file structure (you shouldn't have new or deleted files; use diff on the output of find | sort in the before and after output directories). This quickly shows up any missing redirect files.
- leftover Liquid tags: grep for {{ and {% to see if you missed any
- how you'll revert if needed
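The grep for leftover Liquid tags and the canonical-link check can be combined into one pass over the built site. A minimal Python sketch, standard library only, with hypothetical function names; it uses regular expressions rather than a real HTML parser, and code samples that legitimately contain {{ or {% will show up as false positives.

```python
import re
from pathlib import Path

# Unrendered template delimiters: Liquid uses {{ ... }} and {% ... %}.
LEFTOVER_TEMPLATE = re.compile(r"\{\{|\{%")
# <link ... rel="canonical" ...>, with the attribute value quoted or unquoted.
CANONICAL_LINK = re.compile(r"""<link\b[^>]*\brel=["']?canonical\b""", re.IGNORECASE)


def load_pages(root):
    """Read every built HTML page under root, keyed by relative path."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): p.read_text(encoding="utf-8", errors="replace")
        for p in root.rglob("*.html")
    }


def audit_pages(pages):
    """Return (path, problem) pairs for pages (path -> HTML text) that need a look."""
    problems = []
    for path, text in sorted(pages.items()):
        if LEFTOVER_TEMPLATE.search(text):
            problems.append((path, "unrendered {{ or {% delimiter"))
        if not CANONICAL_LINK.search(text):
            problems.append((path, "missing <link rel=canonical>"))
    return problems
```

Running `audit_pages(load_pages("public"))` after a Hugo build lists the pages to review by hand.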
I still made some mistakes:
- Docsy's default theme outputs noindex unless you set the HUGO_ENV=production environment variable, so the Graphviz site was de-indexed from Google for a few days until Google re-indexed it.
- Some links broke. Hugo's ref shortcode is helpful here: it makes dead links break at compile time rather than at runtime.
- My initial demo builds were on macOS, whose default filesystem is case-insensitive, which broke some old URLs that relied on case sensitivity. Building inside a Docker container fixed this.
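The first and third of these mistakes can be caught before launch with a quick scan of the build output. On a case-insensitive filesystem, output paths that differ only in case overwrite each other, so listing such collisions shows which URLs are at risk; and searching for a robots noindex meta tag catches a build made without the production environment setting. A minimal Python sketch with hypothetical function names; the noindex check is a simple regex over <meta> tags, not a full HTML parse.

```python
import re
from collections import defaultdict

# Matches any <meta ...> tag; attribute order inside the tag doesn't matter.
META_TAG = re.compile(r"<meta\b[^>]*>", re.IGNORECASE)


def case_collisions(paths):
    """Group output paths that differ only in letter case."""
    groups = defaultdict(list)
    for path in paths:
        groups[path.casefold()].append(path)
    return sorted(sorted(group) for group in groups.values() if len(group) > 1)


def is_noindexed(html_text):
    """True if any <meta> tag mentions both 'robots' and 'noindex'."""
    return any(
        "robots" in tag.lower() and "noindex" in tag.lower()
        for tag in META_TAG.findall(html_text)
    )


def noindexed_pages(pages):
    """Return the paths of pages (path -> HTML text) that ask not to be indexed."""
    return sorted(path for path, text in pages.items() if is_noindexed(text))
```

Run the collision check on the file list from a build on a case-sensitive filesystem (such as inside a Docker container); on a case-insensitive one, the colliding files have already been merged.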
Conclusions
Overall, Docsy is a huge benefit for our users: the site is mobile-friendly, has a far better information hierarchy, looks better, and is easier to navigate and share links to specific sections.
It's easier for contributors, with mostly one source of truth for docs to change. It's easier to spin up a dev server: you install one binary and run it, rather than wrangling Ruby dependencies.
I expected Hugo to be faster than Jekyll, but it hasn't been much different: Jekyll took roughly 14 seconds to build, and Hugo now builds in about 11 seconds. Perhaps I'm doing something silly in my templates.
Hugo offers a bit more than Jekyll, at the cost of having a far more complex set of concepts to understand.
Overall I can recommend Docsy for people looking to pick up a great documentation platform.