Version 9 (modified by hvr, 2 years ago)

more notes about URL rewriting/GitHub cloning

This page as well as the GitRepoReorganization are still work in progress.

See #8545 for the current state of affairs.

Workflows for Handling GHC's Git Submodules

General information about Git's submodule support:

Cloning a fresh GHC source tree

Initial cloning of GHC HEAD (into the folder ./ghc) is as simple as:

git clone --recursive git://

(Obviously, the clone URL can be replaced by any of the supported ghc.git URLs as listed on

Cloning a specific branch, e.g. ghc-7.8; or a specific tag, e.g. ghc-7.8.1-release:

git clone -b ghc-7.8 --recursive git:// ghc-7.8.x
git clone -b ghc-7.8.1-release --recursive git:// ghc-7.8.1

Older tags/branches that were not fully converted to a submodule configuration require an additional ./sync-all get step to synchronize.

To clone from the GitHub GHC Mirror, first configure Git URL rewriting as described in the next section, because the submodule URL paths need to be rewritten (e.g. ../packages/deepseq.git to ../packages-deepseq.git). Then proceed with the clone as described above; the actual network operations will be redirected to GitHub by the URL rewriting.

Using the GitHub GHC Mirror

You can instruct git to rewrite repo URLs via the git config url.<base>.insteadOf facility. For instance, the following configuration (which gets written to ${HOME}/.gitconfig, so this needs to be done only once) uses GitHub instead of for synchronizing/cloning the GHC repos:

git config --global url."git://".insteadOf git://
git config --global url."git://".insteadOf git://

(If needed, you can also add rewrite rules with git:// substituted by https:// or other schemes)
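To check what a rewrite rule does before relying on it, you can ask Git to print the effective URL without contacting any server. The following is a minimal sketch: the host names upstream.example and mirror.example are placeholders (not the real GHC hosts), and the rule is written to a scratch repository's local config rather than to ${HOME}/.gitconfig.

```shell
# Scratch repository, so that no global configuration is touched.
tmp=$(mktemp -d)
git init -q "$tmp/demo"
cd "$tmp/demo"

# Placeholder hosts (assumption): redirect upstream.example to mirror.example.
git config url."git://mirror.example/".insteadOf "git://upstream.example/"

# --get-url prints the URL Git would actually use, without any network access.
rewritten=$(git ls-remote --get-url git://upstream.example/ghc.git)
echo "$rewritten"
```

Dropping the rule again is a matter of `git config --unset` on the same key (with `--global` if it was set globally).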

Asymmetric push/pull Git Repo URLs

Using git config url.<base>.insteadOf

This subsection is mostly relevant to developers with git push-permissions.

In addition to the git config url.<base>.insteadOf facility described in the previous section, there's also a pushInsteadOf facility, which rewrites only push operations and takes precedence over a matching insteadOf rule. This makes it possible to use the faster (non-authenticated) http(s):// or git:// based transports for read operations, and the more heavyweight authenticated ssh:// transport only for actual git push operations. Such an asymmetric push/pull setting can be configured globally like so:

git config --global url."ssh://".pushInsteadOf git://

# If you want to cover all bases, you can also set the following rewrite rules
git config --global url."ssh://".pushInsteadOf
git config --global url."ssh://".pushInsteadOf
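The effect of a pushInsteadOf rule can be inspected locally by comparing the fetch and push URLs Git resolves for a remote. This is only a sketch under assumptions: upstream.example and the git@ user are placeholders, and the rule is stored in a scratch repository instead of the global config.

```shell
# Scratch repository, so that no global configuration is touched.
tmp=$(mktemp -d)
git init -q "$tmp/demo"
cd "$tmp/demo"

# Placeholder remote (assumption): read-only git:// transport.
git remote add origin git://upstream.example/ghc.git

# Rewrite pushes only; fetches keep using the git:// URL.
git config url."ssh://git@upstream.example/".pushInsteadOf "git://upstream.example/"

# `git remote get-url` expands insteadOf/pushInsteadOf rules.
fetch_url=$(git remote get-url origin)
push_url=$(git remote get-url --push origin)
echo "fetch: $fetch_url"
echo "push:  $push_url"
```

Note that pushInsteadOf only applies to remotes that have no explicit remote.<name>.pushurl configured.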

By overriding remote.origin.pushurl

It's recommended to use the scheme based on the git config url.<base>.pushInsteadOf facility described in the previous subsection instead of the one described in this subsection.

This subsection is only relevant for developers with git push-permissions.

Unless the GHC source tree was cloned from ssh://, the resulting pushurls will not point to a writable location.

The following commands will configure appropriate push-URLs for ghc.git and all its (initialized) submodules:

git remote set-url --push origin ssh://

git submodule foreach 'git remote set-url --push origin \
  ssh://$(git config -f $toplevel/.gitmodules --path "submodule.$name.url" | sed "s,^\.\./,,")'

You can display the currently used Git URLs for git push in submodules by:

# if unset, remote.origin.pushurl defaults to remote.origin.url
git submodule foreach \
  'git config remote.origin.pushurl || git config remote.origin.url'
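The `git submodule foreach` command above derives each push URL by reading the submodule's relative URL from .gitmodules and stripping the leading `../`. The sketch below replays that transformation on a hand-written .gitmodules file; the submodule name libraries/deepseq and the ssh://git@upstream.example/ base are illustrative assumptions, not the real values.

```shell
# Work in a scratch directory; `git config -f` does not need a repository.
tmp=$(mktemp -d)
cd "$tmp"

# Hypothetical .gitmodules entry with a relative URL.
git config -f .gitmodules submodule.libraries/deepseq.path libraries/deepseq
git config -f .gitmodules submodule.libraries/deepseq.url ../packages/deepseq.git

# Read the relative URL and strip the leading "../".
rel=$(git config -f .gitmodules submodule.libraries/deepseq.url | sed 's,^\.\./,,')

# Prefix it with the (placeholder) writable base URL.
push_url="ssh://git@upstream.example/$rel"
echo "$push_url"
```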

Updating an existing GHC source tree clone

At the top level of the ghc.git working copy:

git pull --rebase
git submodule update --init

Making changes to GHC submodules

To avoid getting confused, it's very important to keep in mind that Git submodules track commits (i.e. not branches!). Consequently, git submodule update leaves each submodule checked out at a so-called detached HEAD.
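The commit-tracking behaviour can be observed offline with two throwaway repositories. The sketch below is not specific to GHC: the repository names lib and super and the demo identity are made up, and `protocol.file.allow=always` is only there because recent Git versions refuse file-based submodule clones by default.

```shell
tmp=$(mktemp -d)
cd "$tmp"

# Placeholder identity for the throwaway commits.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid

# A tiny library repository and a superproject embedding it as a submodule.
git init -q lib
git -C lib commit -q --allow-empty -m 'initial lib commit'
git init -q super
git -C super -c protocol.file.allow=always submodule --quiet add "$tmp/lib" lib
git -C super commit -q -m 'add lib submodule'

# A fresh recursive clone checks out the recorded commit, not a branch.
git -c protocol.file.allow=always clone -q --recursive "$tmp/super" clone

recorded=$(git -C clone rev-parse HEAD:lib)     # commit stored in the superproject
checked_out=$(git -C clone/lib rev-parse HEAD)  # commit checked out in the submodule

# symbolic-ref fails when HEAD does not point at a branch.
if git -C clone/lib symbolic-ref -q HEAD >/dev/null; then
  head_state=branch
else
  head_state=detached
fi
echo "$recorded $checked_out $head_state"
```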

So, in order to make changes to a submodule, you can either work directly on the detached HEAD, or check out the branch that the commit is supposed to belong to. The example below demonstrates the latter approach for the utils/haddock submodule:

# do this *before* making changes to the submodule
cd utils/haddock
git checkout master
git pull --rebase

# perform modifications and as many `git {add,rm,commit}`s as you deem necessary
$EDITOR src/somefile.hs

# finally, after you're ready to publish your changes, simply push the changes as for an ordinary Git repo
git push

# go back to ghc.git top-level
cd ../..

At this point, the remote haddock.git contains newer commits in the master branch, which still need to be registered with ghc.git:

# if you want, you can inspect with `git submodule` and/or `git status`
# if there are submodules needing attention;
# specifically, the commands below should report new commits in `utils/haddock`
git submodule
git submodule summary
git status

# Register the submodule update for the next `git commit` as you would any other file
# Note: You can think of submodule-references as virtual files which 
#       contain a SHA1 string pointing to the submodule's commit.
git add utils/haddock

# you can also add other changes in `ghc.git` (e.g. testsuite changes) and/or other submodules 
# you need to update atomically with the next commit
git add testsuite/...

# prepare a commit, and make sure to mention the string `submodule` in the commit message
git commit -m 'update haddock submodule ... blablabla'

# finally, push the commit to the remote `ghc.git` repo
git push

Server-side validation hooks ensure that, for non-wip/ branches, ghc.git never points to non-existing commits. Also, as a safeguard against accidental submodule reference updates, the string submodule must occur somewhere in the commit message of any commit that updates submodule references.
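If you want to catch a missing keyword before pushing, a local check along the following lines may help. This is a hypothetical helper, not the server-side hook itself; in particular, whether the real hook matches case-sensitively is an assumption here.

```shell
# Hypothetical helper: succeeds if the given commit message mentions "submodule".
# Assumption: the match is a plain, case-sensitive substring test.
mentions_submodule() {
  case "$1" in
    *submodule*) return 0 ;;
    *)           return 1 ;;
  esac
}

# Example usage with two made-up commit messages.
for msg in 'update haddock submodule' 'fix typo in comment'; do
  if mentions_submodule "$msg"; then
    echo "ok:      $msg"
  else
    echo "missing: $msg"
  fi
done
```

Inside a working copy, the message of the latest commit could be fed in via `git log -1 --format=%B`.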


  • Describe how to make use of git submodule update --remote