How are popular open source projects documented?
This is a survey of the state of the art in developer documentation - how the most popular open source tools manage their documentation.
I'm currently updating the documentation of an open source tool, and I thought I'd start with surveying the prior art of how successful projects are doing documentation today.
I'm particularly interested in how these projects invite contributions, comments, and bug reports, and how they handle versioning and scaling down to mobile UIs.
I thought I'd write this up in case it's useful to others. Our projects are typically not this popular, so the tradeoffs the popular projects make sometimes won't be right for us. Their solutions might be too heavyweight. But they've certainly been successful at growing a developer audience, and probably hit a lot of the pitfalls and iterated a few times, so it's useful to see how they solve documentation.
We'll be investigating these projects' docs, chosen somewhat arbitrarily from among the most-starred projects on GitHub:
- TensorFlow
- Visual Studio Code
- Ansible
- Vue.js
- React
- Angular
- Keras
- Flutter
- Kubernetes
- Home Assistant
This methodology misses important projects that aren't hosted on GitHub, e.g. Python's Docs, or the Linux Kernel's Docs, but the point of this is to take a quick sample, not be exhaustive/representative.
Finally, remember the technology used to build docs is perhaps the least important part of the docs: the words that are written, and the information hierarchy presented to the user, are far more important than whether the site generator uses Go or Ruby. That said, let's dive in!
TensorFlow
TensorFlow, Google's popular machine learning framework, has 145k stars on GitHub.
Example Page: https://www.tensorflow.org/api_docs/python/tf/concat
A fully-custom generate2.py script generates the TensorFlow docs from Markdown files in the TensorFlow Documentation Git repo, together with Python doc comments and Javadoc in the main TensorFlow repo. There is a Contributor Guidelines page to walk people through how to do this.
TensorFlow has no "Edit this page" button to take you to the source of the page, though there is a link to the Issue Tracker.
API documentation is published separately for each version.
There is a survey: "Is this page helpful" allowing one-click voting from 1-5 stars.
Visual Studio Code
VS Code, Microsoft's popular code editor, has 98k stars on GitHub.
Example Page: https://code.visualstudio.com/api/extension-capabilities/theming
VS Code's docs include an "Edit this Page" link, which takes you to the source Markdown file on GitHub. The Markdown has front-matter, like that used by the Jekyll and Hugo static site generators.
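To make "Markdown with front-matter" concrete, here is a minimal sketch of how a page splits into metadata and body. It is only an illustration: the field names title and order are hypothetical, and it handles just flat key: value pairs, whereas real generators such as Jekyll and Hugo use a full YAML parser.

```python
def split_front_matter(text):
    """Split a Markdown document into (metadata dict, body string).

    Simplified sketch: only flat "key: value" lines are understood,
    and every value is kept as a string.
    """
    lines = text.splitlines()
    # Front matter must open on the very first line with a '---' delimiter.
    if not lines or lines[0].strip() != "---":
        return {}, text
    meta = {}
    for i, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":
            # Closing delimiter found: everything after it is the page body.
            return meta, "\n".join(lines[i + 1:])
        if ":" in line:
            key, _, value = line.partition(":")
            meta[key.strip()] = value.strip().strip('"')
    # No closing delimiter: treat the whole file as body.
    return {}, text


# Hypothetical page; the field names are made up for illustration.
page = """---
title: Theming
order: 3
---
# Theming

Body text goes here.
"""

meta, body = split_front_matter(page)
print(meta)
print(body)
```

The site generator reads the metadata to decide things like page titles and navigation order, and renders only the body as Markdown.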
VS Code builds its docs with the Gulp build tool, which invokes a scripts/build.sh script that doesn't appear to be present in the repository; it is probably a Microsoft-internal file. So I'm not sure if contributors can run the site locally! Microsoft's guidance is, naturally, to edit the Markdown file using Visual Studio Code.
There is a Contributor Guide, and a survey on each page.
Ansible
Ansible, the popular IT automation tool from Red Hat (an IBM subsidiary), has 43k stars on GitHub.
Example Page: https://docs.ansible.com/ansible/latest/modules/copy_module.html
Ansible has extremely good documentation, in that I can usually copy/paste from their examples to solve my problems. I wish all sites had examples as good as Ansible's.
Ansible has edit links at the bottom of each doc page. Some of the links 404; I've filed an issue about that.
There isn't a button to file a documentation bug – perhaps the doc bugs are low-quality? And there's no survey either.
Ansible uses the popular Sphinx documentation system, which is also used by Python and the Linux Kernel. Sphinx docs are written in reStructuredText .rst files, a markup format that is like Markdown but less popular.
Ansible's docs are part of the main Ansible git repository, not a separate repository. This makes it easier to update the code at the same time as the docs.
The docs are versioned, with a dropdown to select older versions.
Vue.js
Vue.js, a popular JavaScript user interface library, has 166k stars on GitHub.
Example Page: https://vuejs.org/v2/guide/conditional.html#v-else-if
Props to the Vue team for the #BlackLivesMatter banner!
Vue has an edit button at the bottom of each page, which takes you to Markdown source with YAML front-matter.
The docs are in a separate Git repository from the main vuejs project. The README.md indicates the site is built with hexo, a Node.js-based static site generator I hadn't heard of before, and deployed automatically on merge using Netlify. I suppose it makes sense for a JS project to use a JS-based static site generator, so contributors don't have to learn another language ecosystem.
Translations exist, and are organised as forks of the original docs repository. That would make it difficult to keep them up to date, but does optimise for ease of starting a translation.
There are no surveys or links to file a bug on the documentation page. Versioned docs are accessible.
React
React, Facebook's popular web user interface library, has 150k stars on GitHub.
Example Page: https://reactjs.org/docs/forwarding-refs.html
Each React doc page has an "Edit this page" link at the bottom, which takes you to a Markdown file on GitHub with YAML front-matter. The docs are stored in a reactjs.org repository, separate from the main React code.
The README.md indicates that you can run the site locally with yarn dev, which uses Gatsby, a JavaScript-based static site generator. Gatsby is built on React, so the choice makes sense.
Docs are available for some old versions of React, but strangely not all. There's no link to file a bug about the documentation, and no survey.
Angular
Angular, Google's popular user interface framework, has 62k stars on GitHub.
Example Page: https://angular.io/guide/user-input
Angular has a very subtle 'Pencil' icon at the top of each page, which takes you to a Markdown file in the main angular GitHub repository. Angular's docs build with yarn using dgeni, a custom doc generator used by the Angular and Protractor projects.
Judging by the presence of a firebase.json file, the docs are probably deployed to Firebase.
I'm not seeing any survey on the page.
Keras
Keras, a popular high-level library for deep learning, has 49k stars on GitHub.
Example Page: https://keras.io/api/layers/initializers/
Keras has no "Edit this page" link. This would be a neat opportunity for someone to add one, for an extremely popular library.
Keras' docs live in a subdirectory of the main keras project, as Markdown files built by the MkDocs static site generator.
Flutter
Flutter, Google's popular UI Toolkit for mobile, web, and desktop, has 94k stars on GitHub.
Example Page: https://flutter.dev/docs/cookbook/design/drawer
Flutter has a subtle 'source code' button that takes you to Markdown files with YAML front-matter hosted on a flutter/website repository on GitHub, separate from the main Flutter code.
The README.md indicates the site is built with the Jekyll static site generator.
Flutter also has a "bug" link to create an issue for their documentation, which pre-fills information about the doc page.
Next/previous links exist, unlike many of the other doc sites.
No survey is presented.
Kubernetes
Kubernetes, a popular container orchestration system, has 67k stars on GitHub.
Example Page: https://kubernetes.io/docs/concepts/#kubernetes-objects
Kubernetes docs have a strong statement about not tolerating racism in their community. I appreciate it.
Kubernetes docs have a big edit button at the top. At the bottom they have an extremely comprehensive set of feedback loops: a Yes/No helpfulness survey, Edit This Page, and Create an Issue, and even last-modified-time for the page with a link to the latest commit! I particularly appreciate the last-modified-time, as it can be a signal to whether the page is up to date.
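Links like "Edit This Page" and "Create an Issue" can be generated from a page's source path. The following is a minimal sketch, assuming the docs live in a GitHub repository: the repository name example-org/website, the branch main, and the file path are hypothetical placeholders, and this is not necessarily how Kubernetes or any other surveyed project builds its links. It relies on GitHub's /edit/<branch>/<path> URLs and on the title and body query parameters accepted by /issues/new.

```python
from urllib.parse import quote, urlencode


def edit_url(repo, branch, source_path):
    """URL that opens GitHub's web editor on a page's source file."""
    return f"https://github.com/{repo}/edit/{quote(branch)}/{quote(source_path)}"


def issue_url(repo, page_url, source_path):
    """URL that opens a new issue pre-filled with details about the page."""
    params = {
        "title": f"Docs issue: {source_path}",
        "body": f"Page: {page_url}\nSource: {source_path}\n\nDescribe the problem:\n",
    }
    return f"https://github.com/{repo}/issues/new?{urlencode(params)}"


# Hypothetical repository and page, for illustration only.
repo = "example-org/website"
source = "content/en/docs/concepts/_index.md"

edit_link = edit_url(repo, "main", source)
issue_link = issue_url(repo, "https://docs.example.org/concepts/", source)
print(edit_link)
print(issue_link)
```

Pre-filling the issue with the page URL and source path saves the reporter from describing where the problem is, which may help with the triage burden of documentation bugs.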
Kubernetes docs are hosted out of the kubernetes/website repository, separate from the main kubernetes/kubernetes repository.
All translations are part of the same repository, but they don't seem to be updated at the same time.
The docs are Markdown with YAML front matter. The README.md says the site is built using the Hugo Go-based static site generator. Kubernetes is also written in Go: again, we see a project choosing a static site generator written in the same language as the rest of the project.
Interestingly, it looks like the site was converted from Jekyll to Hugo two years ago, in a +123k line diff. That looks like a big job, though I hear Hugo has an automated converter. Looks like the rationale was: Hugo offers better multi-language support and faster build times.
"We chose Hugo after months of research and conversations with other open source translation projects. [...] Hugo's multilingual support is built in and easy."
"Another advantage of Hugo is that build performance scales well at size. At 250+ pages, the Kubernetes site's build times suffered significantly with Jekyll. We're excited about removing the barrier to contribution created by slow site build times."
It's always interesting to see people migrate from one platform to another, and their reasons for it! It wouldn't have been easy to migrate.
Kubernetes has a Contributors Guide for documentation. Continuous Deployment seems to be handled by Netlify.
Home Assistant
Home Assistant, a popular smart home automation framework, has 33k stars on GitHub.
Example Page: https://www.home-assistant.io/docs/
Home Assistant has a prominent "Edit this page on GitHub" link, which leads to Markdown with YAML front-matter in a documentation Git repository separate from the main Home Assistant code.
The README.md says to run bundle exec rake preview to preview the site, and the Rakefile calls jekyll build, so this is a Jekyll-based website. Ruby-based Jekyll is the most popular static site generator.
Continuous Deploys are handled by Netlify. There's no survey on the docs page, or direct link to file issues about the page, but there is a "Need Help?" page which links to the GitHub Issues page for the docs.
Notably, Home Assistant separates User Docs from Developer Docs. I think this is a good pattern to segment your audience.
Survey Conclusions
This has been a short tour of how the most popular open source projects manage their documentation. What have we learned?
- Everyone's checking their docs into version control as their source of truth. Nobody is using a database-backed Content Management System like WordPress for their docs.
- Everyone's using a static site generator, and it doesn't really matter if it's custom or a standard one. Jekyll is most popular, but a lot of projects opt for a static site generator written in the same language as their library.
- Markdown remains extremely popular as a doc format, with all projects using Markdown except for Ansible, which uses reStructuredText, the default markup for Sphinx. None of the projects surveyed are using raw HTML for doc content, nor Wiki Markup, BBCode, AsciiDoc, or LaTeX. Of course there are different flavours of Markdown though...
- Static site generators seem to have mostly converged on "YAML front matter" as being the way to configure pages.
- "Edit This Page" buttons are popular to try to convert users into documentation contributors.
- "File a bug about this page" is not very popular – perhaps the bugs are low-quality and triaging them is a pain?
- Some teams are putting feedback surveys on their docs, but only some of the really big corporate-backed players who can pay people to do the analysis.
- Netlify is a popular way to do Continuous Deployment and hosting.
- Almost all sites care about responsive design, scaling down to mobile-phone size. I wonder how many people are reading API docs on their phone. It's probably not a trivial number! I'd love to see some audience numbers on this.
- Some projects keep their docs in the same git repository, and many don't. A best practice hasn't emerged here.
- Many projects allow you to view API docs for old versions, though this is not consistently supported.
- Nobody seems to have comments on their docs any more. Comments used to be very popular. PHP's user-contributed comments on docs were often famous for being more useful than the docs themselves. Perhaps nobody wants to moderate comments any more? Maybe everyone had Disqus comments then Disqus started putting ads on their page?
- Everyone is doing translations differently: some not at all, some as forks of the main repo, some as subdirectories.
Further Work / Questions not answered here
- How do these sites handle search? Static Site Generators don't typically output a search index.
- Do different patterns emerge for smaller, or less popular open source projects?
- What about the projects that aren't on GitHub, for whatever reason? Perhaps they're too big, or older, or just prefer being hosted elsewhere?
- If documentation is in a separate repository from the main code, how are they kept in sync?
- What do professional Tech Writers think are the best practices?
- Is there a way we could standardise these patterns, so every project doesn't have to reinvent best practices? What would that look like?
- What's the next big thing in documentation, that all of these sites are missing?