Dogfooding with Jekyll
Using the new `data_source` configuration to serve mankind
Yesterday, I learned that Jekyll, the well-known and powerful static-site generator, has a little-known feature that is kind of a big deal for open-data sites hosted on GitHub.
tl;dr: Jekyll lets you both consume and publish data files with the `data_source` configuration setting.
Working with _data
Just a few weeks ago, I was complaining to some friends about the `_data` folder in Jekyll. You see, using simple, flat data files to power a website is a smart thing to do. And while Jekyll makes it easy to consume data from YAML, JSON, and CSV files, it made publishing that data nearly impossible: Jekyll ignores folders with a leading underscore, so `_data` is never published.
For those of us who like to share our data and use GitHub, this meant choosing between two unattractive options, neither of which really worked well:
- Publish the data twice, once in the `_data` folder and once in a separate `data` folder; or
- Publish the data in a separate git repository, and use it as a submodule.
The obvious alternative, symlinks, doesn’t work either: because GitHub uses the `--safe` flag when publishing the site, symlinks are not an option for sites hosted on GitHub Pages.
This was an unfortunate state of affairs. Despite demands for data publishers to eat our own dogfood, GitHub Pages and Jekyll could not deliver… But that was then. The future of `_data` is here!
`data_source` to the rescue!
It turns out the `_data` folder is just the default location for data files in Jekyll. In your `_config.yml` file, you can set a different location for your data folder using the `data_source` configuration setting.
So, let’s say you want to publish your data in a folder called `data`. To accomplish this, you simply need to rename the `_data` folder to `data` and add this line to your `_config.yml` file:
data_source: data
That’s it. Now the `data` folder is used by Jekyll to power your site and is published for the world to see at the URL endpoint `/data`! You have now opened up your dataset. That easy. (If you want to see it in action, check out this basic repository at https://github.com/vzvenyach/jekyll-data-test/tree/gh-pages with a demo published here: http://code.esq.io/jekyll-data-test.)
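To give a feel for what "opened up" means for someone on the other side, here is a rough Python sketch of how a reader of your site might pull a file out of the published folder. Everything specific in it is an assumption for illustration: the site URL `http://example.com/site` and the file names `members.csv` and `members.json` are hypothetical, and the helpers `data_url`, `parse_data`, and `fetch_data` are my own names, not part of Jekyll. It uses only the standard library, so it handles JSON and CSV but not YAML.

```python
import csv
import io
import json
from urllib.parse import urljoin
from urllib.request import urlopen


def data_url(site_url: str, filename: str, data_dir: str = "data") -> str:
    """Build the public URL of a file inside the published data folder."""
    # A trailing slash keeps urljoin from dropping the last path segment.
    base = site_url.rstrip("/") + "/"
    return urljoin(base, f"{data_dir}/{filename}")


def parse_data(text: str, filename: str):
    """Parse JSON or CSV text, picking the parser from the file extension."""
    if filename.endswith(".json"):
        return json.loads(text)
    if filename.endswith(".csv"):
        # Each CSV row becomes a dict keyed by the header row.
        return list(csv.DictReader(io.StringIO(text)))
    raise ValueError(f"unsupported data format: {filename}")


def fetch_data(site_url: str, filename: str, data_dir: str = "data"):
    """Download one published data file and parse it."""
    with urlopen(data_url(site_url, filename, data_dir)) as response:
        text = response.read().decode("utf-8")
    return parse_data(text, filename)


if __name__ == "__main__":
    # Hypothetical site and file name; swap in a real published file.
    print(data_url("http://example.com/site", "members.csv"))
```

The point of the sketch is that the same file Jekyll reads to build your pages is the one a stranger downloads, so there is no second copy to keep in sync.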
Bon appetit!