Many of the libraries and tools in a GHC tree are actually maintained by someone else. They therefore have a separate upstream repository, from which we need to pull. That repository may be either a darcs or a git repository; in the darcs case, we also need to convert to a git repository for use in a GHC tree. However, if the darcs repository is on another server, then we first need to mirror it for the conversion program to use. This diagram shows how changes migrate from one repo to another:
This means that when making changes needed in GHC to one of these libraries, we first need to put the changes in the upstream repository. Note that a git hook prevents you from pushing patches to the ghc repos until they are already in the git mirror repos, so that we cannot forget to send changes upstream.
The mirrors are updated automatically each night, but you can force an immediate update by running /srv/darcs/do_mirrors on darcs.haskell.org.
Note that the following table might be out of date. Please refer to GHC's packages file, which is always kept up to date because scripts would otherwise break.
Moreover, the list of upstream Darcs repos mirrored as Git repositories can be found here. Here is where the upstream repositories are, and their mirrors. The master repository is identified in green.
|darcs upstream||darcs mirror||git upstream||git mirror||ghc (validated) repo||in-tree|
Modifying libraries for which there is an upstream repository
The process for updating libraries for which there is an upstream repo is a little more complicated than for libraries where the repo is part of the GHC setup, because it has to meet two objectives:
- Any changes needed by GHC should be made not only in our repository, but also in the upstream repository.
- Being used in a GHC tree should not make life harder for the upstream maintainer.
Note that these two objectives are to some extent in conflict: If a change in GHC or one of its libraries requires a change in a library with an upstream repo then, in order to satisfy objective 1, the maintainer would need to apply the patch, but if doing so is currently inconvenient for them then this would fail objective 2. This policy therefore tries to find the best compromise, without being too onerous for any party.
For these repositories, we use a "git submodule" rather than a normal repository. Using submodules means that the repository doesn't need to follow a linear path through the git history, but can instead jump around, for example from a release commit on one branch to the next release commit on a different branch.
From the GHC developer's point of view
If you are not modifying these packages then you don't need to do anything special: A regular ./sync-all pull will update the submodules as normal. However, you may find it useful to run
git config --global diff.ignoreSubmodules dirty
Otherwise, each time you run git status or git diff, git will check for changes not only in the GHC repository but also in all the submodules. (You must have git >= 1.7.3 for this to work.)
If you need to modify one of these libraries, then ordinarily you should first send the modifications upstream. Ideally upstream will apply the patches and make a release (the easiest way to accomplish objective 1 is for changes to be applied upstream first, so that they can't be forgotten about after being applied to GHC's repo). You can then update GHC's submodule by running
cd libraries/foo
git reset --hard some_commit_id
cd ../..
git commit -a
./sync-all push
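Before pointing GHC's submodule at a commit, it can help to confirm that the commit is already reachable from a branch on the submodule's remote, since the git hook mentioned earlier rejects pushes that refer to commits missing from the mirror. The following is a minimal sketch, not part of sync-all: `commit_is_published` is a hypothetical helper name, and it assumes the remote-tracking branches are current (run `git fetch` first).

```shell
# Hypothetical helper: succeed if the given commit is reachable from at
# least one remote-tracking branch of the repository in the current directory.
# An unknown commit id makes `git branch` fail, which also counts as "no".
commit_is_published() {
    [ -n "$(git branch -r --contains "$1" 2>/dev/null)" ]
}

# Example use, from inside libraries/foo:
#   git fetch
#   commit_is_published some_commit_id || echo "not in the remote yet"
```

The check only looks at remote-tracking branches, so a commit that exists locally but has not been pushed or pulled from the remote is reported as unpublished.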
There are some scenarios where you may need to modify GHC's repository without the upstream repository already having the change that you need:
- The maintainer may tell you that they are too busy to deal with the package at the moment, or not be responding at all. In this case, it may be necessary to make changes only to GHC's repositories in the short term, and for the changes to be merged upstream later.
- In a GHC stable branch, we may be using an old version of a library that we need to make a change to, but upstream may only be interested in working on the latest version rather than also maintaining old release branches. In that case, we would only make the change in the GHC repository.
To make the change in this case, commit it in the submodule, push that commit to the ghc-head branch of the submodule's repository, and then record the new submodule commit in the GHC repository:
cd libraries/foo
git commit -a
git push -f origin HEAD:refs/heads/ghc-head
cd ../..
git commit -a
./sync-all push
(Use the name of the GHC branch, e.g. ghc-7.6, rather than ghc-head if this patch is for a branch only.)
Important: If you make a change to a submodule, then make sure you commit in both that repository and the ghc repository before using ./sync-all get or ./sync-all pull. Those commands run git submodule update, which may cause you to lose unrecorded changes.
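One way to guard against this is to look for unrecorded work before running those commands. The following is a minimal sketch using only standard git commands; the function names `dirty_submodules` and `unrecorded_submodule_commits` are hypothetical and are not part of sync-all. It assumes a git version recent enough to provide the `$sm_path` variable inside `git submodule foreach`.

```shell
# Run both functions from the top of the GHC tree.

# Print the path of every submodule whose working tree has uncommitted changes.
dirty_submodules() {
    git submodule foreach --quiet \
        'if [ -n "$(git status --porcelain)" ]; then echo "$sm_path"; fi'
}

# Print the path of every submodule whose checked-out commit differs from the
# one recorded in the ghc repository (git submodule status marks these with +).
unrecorded_submodule_commits() {
    git submodule status | awk '/^[+]/ { print $2 }'
}

# Example use:
#   if [ -n "$(dirty_submodules)$(unrecorded_submodule_commits)" ]; then
#       echo "commit your submodule changes before running ./sync-all pull"
#   fi
```

If both functions print nothing, every submodule change has been committed in the submodule and recorded in the ghc repository.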
From the upstream maintainer's point of view
Upstream maintainers don't need to do anything special. You can continue to use any version control system and whatever branching policy works best for you. However, there are two issues to be aware of:
- For libraries that are shipped with GHC, we need to have releases of libraries that can build with that GHC. There may be no suitable existing release (most commonly due to trivial things such as library dependencies needing to be changed, but sometimes due to real changes in other libraries or the compiler), in which case we will request that you make a suitable release or, if it is not convenient for you to do so, we can make one on your behalf (in which case it will normally have only the minimal changes necessary since the previous release).
- Sometimes we may need to make changes to old versions of libraries, as we try to avoid making interface changes within GHC stable branches and upstream development may have moved on since a GHC stable branch was created. When this happens it is up to you whether the changes are sent upstream as normal (and maintained in an upstream branch), or whether they are left only in the GHC repository. Note that if they are made only in the GHC repository then we will probably need to make a release from the GHC repository, as per the previous point.