Posts for the month of August 2017

Meet Jenkins: GHC's new CI and build infrastructure

While Phabricator is generally well-liked among GHC developers, GHC's interaction with Harbormaster, Phabricator's continuous integration component, has been less than rosy. The problem is in large part a mismatch between Harbormaster's design assumptions and GHC's needs, but it's also in part attributable to the somewhat half-finished state in which Harbormaster seems to linger. Regardless, we won't go into detail here; these issues are well covered elsewhere.

Suffice it to say that, after having looked at a number of alternatives to Harbormaster (including buildbot, GitLab's Pipelines, Concourse, and home-grown solutions), Jenkins seems to be the best option at the moment. Of course, this is not to say that it is perfect; as we have learned over the last few months it is very far from perfect. However, it has the maturity and user-base to be almost-certainly able to handle what we need of it on the platforms that we care about.

See the Trac ticket #13716

Let's see what we get out of this new bit of infrastructure:

Pre-merge testing

Currently there are two ways that code ends up in master,

  • a Differential is opened, built with Harbormaster, and eventually landed (hopefully, but not always, after Harbormaster successfully finishes)
  • someone pushes commits directly

Bad commits routinely end up merged via both channels. This means that authors of patches failing CI often need to consider whether *their* patch is incorrect or whether they rather simply had the misfortune of basing their patch on a bad commit. Even worse, if the commit isn't quickly reverted or fixed GHC will end up with a hole in its commit history where neither bisection nor performance tracking will be possible. For these reasons, we want to catch these commits before they make it into master.

To accomplish this we have developed some tooling to run CI on commits *before* they are finally merged to master. By making CI the only path patches can take to get to master, improve our changes of rejecting bad patches before they turn the tree red.

Automation of the release builds

Since the 7.10.3 release we have been gradually working towards automating GHC's release process. Thanks to this work, today a single person can build binary distributions for all seven tier-1 configurations in approximately a day, most of which is spent simply waiting. This has allowed us to take responsibility (starting in 8.2.1) for the OpenBSD, FreeBSD, ARMv7 and AArch64 builds in addition to the traditional tier-1 platforms, allowing us to eliminate the week-long wait between source distribution availability and the binary distribution announcement previously needed for correspondence with binary build contributors..

However, we are far from done: our new Jenkins-based build infrastructure (see #13716) will allow us to produce binary distributions directly from CI, reducing the cost of producing release builds to nearly nothing.

Testing of GHC against user packages

While GHC is already tested against Hackage and Stackage prior to release candidate availability, these builds have been of limited use as packages low on the dependency tree (think hashable and lens) often don't build prior to the first release candidate. While we do our best to fix these packages up, the sheer number of them makes this a losing battle for a small team such as GHC's.

Having the ability to cheaply produce binary distributions means that we can produce and validate nightly snapshot releases. This gives users a convenient way to test pre-release compilers and fix their libraries accordingly. We hope this will spread the maintenance effort across a larger fraction of the Haskell community and over a longer period of time, meaning there will be less to do at release time and consequently pre-release Stackage builds will be more fruitful.

Once the Jenkins infrastructure is stable, we can consider introducing nightly builds of user packages as well. While building a large population such as Stackage would likely not be productive, working with a smaller sample of popular, low-dependency-count packages would be quite possible. For testing against larger package repositories, leaning on a dedicated tool such as the Hackage Matrix Builder will likely be a more productive path.

Expanded platform coverage of CI

While GHC targets a wide variety of architectures and operating systems (and don't forget cross-compilation targets), by far the majority of developers use Linux, Darwin, or Windows on amd64. This means that breakage often only comes to light long after the culpable patch was merged.

Of course, GHC, being a project with modest financial resources, can't test each commit on every supported platform. We can, however, shrink the time between a bad commit being merged and the breakage being found by testing these "unusual" platforms on a regular (e.g. nightly) basis.

By catching regressions early, we hope to reduce the amount of time spent bisecting and fixing bugs around release time.

Tracking core libraries

Keeping GHC's core library dependencies (e.g. directory, process) up-to-date with their respective upstreams is important to ensure that tools that link against the ghc library (e.g. ghc-mod) can build easily. However, it also requires that we work with nearly a dozen upstream maintainers at various points in their own release cycles to arrange that releases are made prior to the GHC release. Moreover, there is inevitably a fair amount of work propagating verion bounds changes down the dependency tree. While this work takes relatively little effort in terms of man-hours,

Jenkins can help us here by allowing us to automate integration testing of upstream libraries, catching bounds issues and other compatibility issues well before they are in the critical path of the release.

Improved debugging tools

One of the most useful ways to track down a bugs in GHC is bisection. This is especially true for regressions found in release candidates, where you have at most a few thousand commits to bisect through. Nevertheless, GHC builds are long and developer time scarce so this approach isn't used as often as it could be.

Having an archive of nightly GHC builds will free the developer from having to build dozens of compilers during bisection, making the process a significantly more enjoyable experience than it is today. This will allow us to solve more bugs in less time and with far fewer grey hairs.

Status of Jenkins effort

The Jenkins CI overhaul has been an on-going project throughout the spring and summer and is nearing completion. The Jenkins configuration can be seen in the wip/jenkins branch on git.haskell.org (gitweb). At the moment the prototype is running on a few private machines but we will be setting up a publicly accessible test instance in the coming weeks. Jenkins will likely coexist with our current Harbormaster infrastructure for a month or so while we validate that things are stable.

Reflections on GHC's release schedule

Looking back on GHC's past release schedule reveals a rather checkered past,

Release Date Time to next major release
6.12.1 mid December 2009
12 months
7.0.1 mid November 2010
9.5 months
7.2.1 early August 2011
6 months
7.4.1 early February 2012
7 months
7.6.1 early September 2012
19 months
7.8.1 early April 2014
13 months
7.10.1 late March 2015
14 months
8.0.1 late May 2016
14 months
8.2.1 late July 2017
-
8.4.1 TDB and the topic of this post

There are a few things to notice here:

  • release cadence has swung rather wildly
  • the release cycle has stretched in the last several releases
  • time-between-releases generally tends to be on the order of a year

While GHC is far from the only compiler with such an extended release schedule, others (namely LLVM, Go, and, on the extreme end, Rust) have shown that shorter cycles are possible. I personally think that a more stable, shorter release cycle would be better for developers and users alike,

  • developers have a tighter feedback loop, inducing less pressure to get new features and non-critical bugfixes into minor releases
  • release managers have fewer patches to cherry-pick
  • users see new features and bugfixes more quickly

With 8.2.1 at long last behind us, now is a good time to reflect on why these cycles are so long, what release schedule we would like to have, and what we can change to realize such a schedule. On the way we'll take some time to examine the circumstances that lead to the 8.2.1 release which, while not typical, remind us that there is a certain amount of unpredictability inherent in developing large systems like GHC; a fact that must be born in mind when considering release policy.

Let's dig in...

The release process today

Cutting a GHC release is a fairly lengthy process involving many parties and a significant amount of planning. The typical process for a major release looks something like this,

  1. (a few months after the previous major release) A set of release priorities are defined determining which major features we want in the coming release
  2. wait until all major features are merged to the master branch
  3. when all features are merged, cut a stable branch
  4. in parallel:
    1. coordinate with core library authors to determine which library versions the new release should ship
    2. prepare release documentation
    3. do preliminary testing against Hackage and Stackage to identify and fix early bugs
    4. backport significant fixes merged to master
  5. when the tasks in (4) are sufficiently advanced, cut a source release for a release candidate
  6. produce tier-1 builds and send source tarballs to binary packagers, wait a week to prepare binary builds; if anyone finds the tree is unbuildable, go back to (5)
  7. upload release artifacts, announce release candidate
  8. wait a few weeks for testing
  9. if there are significant issues: fix them and return to (5)
  10. finalize release details (e.g. release notes, last check over core library versions)
  11. cut source tarball, send to binary build contributors, wait a week for builds
  12. announce final release, celebrate!

Typically the largest time-sinks in this process are waiting for regression fixes and coordinating with core library authors. In particular, the coordination involved in the latter isn't difficult, but merely high latency.

In the case of 8.2.1, the timeline looked something like this,

Time Event
Fall 2016 release priorities for 8.2 discussed
Early March 2017 stable branch cut
Early April 2017 most core library versions set
release candidate 1 cut
Mid May 2017 release candidate 2 cut
Early July 2017 release candidate 3 cut
Late July 2017 final release cut

Unexpected set-backs

This timeline was a bit more extended than desired for a few reasons.

The first issues were #13426 and #13535, compile-time performance regressions which came to light shortly after the branch and after the first release candidate, respectively. In #13535 it was observed that the testsuite of the vector package (already known for its propensity to reveal compiler regressions) increased by nearly a factor of five in compile-time allocations over 8.0.2.

While a performance regression would rarely classify as a release blocker, both the severity of the regressions combined with the fact that 8.2 was intended to be a performance-oriented release made releasing before fixes were available quite unappealing. For this reason David Feuer, Reid Barton, and I invested significant effort to try to track down the culprits. Unfortunately, the timescale on which this sort of bug is resolved span days, stretching to weeks when time is split with other responsibilities. While Reid's valiant efforts lead to the resolution of #13426, we were eventually forced to set #13535 aside as the release cycle wore on.

The second setback came in the form of two quite grave correctness issues (#13615, #13916) late in the cycle. GHC being a compiler, we take correctness very seriously: Users' confidence that GHC will compile their programs faithfully is crucial for language adoption, yet also very easily shaken. Consequently, while neither of these issues were regressions from 8.0, we deemed it important to hold the 8.2 release until these issues were resolved (which ended up being significant efforts in their own right; a blog post on this will be coming soon).

Finally, there was the realization (#13739) after release candidate 2 that some BFD linker releases suffered from very poor performance when linking with split-sections enabled (the default behavior in 8.2.1). This served as a forcing function to act on #13541, which we originally planned for 8.4. As expected, it took quite some time to follow through on this in a way that satisfied users and distribution packagers in a portable manner.

Moving forward: Compressing the release schedule

Collectively the above issues set the release back by perhaps six or eight weeks in total, including the additional release candidate necessary to validate the raft of resulting patches. While set-backs due to long-standing bugs are hard to avoid, there are a few areas where we can do better,

  1. automate the production of release artifacts
  2. regularly test GHC against user packages in between releases
  3. expand continuous integration of GHC to less common platforms to ensure that compatibility problems are caught before the release candidate stage
  4. regularly synchronize with core library maintainers between releases to reduce need for version bound bumps at release time
  5. putting in place tools to ease bisection, which is frequently a useful debugging strategy around release-time

As it turns out, nearly all of these are helped by our on-going effort to move GHC's CI infrastructure to Jenkins (see #13716). As this is a rather deep topic in its own right, I'll leave this more technical discussion for a second post (blog:jenkins-ci).

With the above tooling and process improvements, I think it would be feasible to get the GHC release cycle down to six months or shorter if we so desired. Of course, shorter isn't necessarily better: we need to be careful to balance the desire for a short release cycle against the need for an adequate post-release "percolation" time. This time is crucial to allow the community to adopt the new release, discover and fix its regressions. In fact, the predictability that a short release schedule (hopefully) affords is arguably more important than the high cadence itself.

Consequently, we are considering tightening up the release schedule for future GHC releases in a slow and measured manner. Given that we are now well into the summer, I think positioning the 8.4 release around February 2018, around seven months from now, would be a sensible timeline. However, we would like to hear your opinions.

Here are some things to think about,

  1. Do you feel that it takes too long for GHC features to make it to users' hands?
  2. How many times per year do you envision upgrading your compiler before the process becomes too onerous? Would the current load of interface changes per release be acceptable under a faster release cadence?
  3. Should we adjust the three-release policy to counteract a shorter GHC release cycle?
  4. Would you feel more likely to contribute to GHC if your work were more quickly available in a release?

We would love to hear your thoughts. Be sure to mention whether you are a user, GHC contributor, or both.