
Version 37 (modified by thoughtpolice, 7 months ago)

Refactor new info

Guidelines for using git with GHC

GHC uses Git (version 1.7.3.4 or newer recommended) for revision control. This page describes various GHC-specific conventions for using git, together with some suggestions and tips for using git effectively.

Existing darcs users see: Git For Darcs Users. If you have an existing source tree in darcs and need to convert patches, see Darcs To Git. Simon PJ's git notes are GIT SPJ.

Setup

General Guidelines

  • Try to make small patches (i.e. work in consistent increments).
  • Separate changes that affect functionality from those that just affect code layout, indentation, whitespace, filenames etc. This means that when looking at patches later, we don't have to wade through loads of non-functional changes to get to the important parts of the patch.
  • If possible, commit often. This helps to avoid conflicts.
  • Discuss anything you think might be controversial before pushing it.
  • When making changes to other repositories in a GHC tree, see Repositories.

Author

Please make sure you have set up git to use the correct name and email for your commits. Use the same name and email on all machines you may push from.

$ git config --global user.name "Firstname Lastname" # Sets your name for every repository used by your user account
$ git config --global user.email "your_email@youremail.com"

This will set your name and email globally. To set it for just the GHC repo, remove the --global flag. Also, the environment variables GIT_COMMITTER_NAME, GIT_COMMITTER_EMAIL, GIT_AUTHOR_NAME and GIT_AUTHOR_EMAIL will override git-config settings if they are defined.
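The precedence described above can be checked in a throwaway repository. The following is a minimal sketch, not part of the GHC tooling: the temporary directory and the "Env Name" identity are placeholders chosen for illustration.

```shell
set -eu
# Throwaway repository, so nothing touches your real configuration.
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"

# Repository-local identity (no --global): stored in .git/config only.
git config user.name "Firstname Lastname"
git config user.email "your_email@youremail.com"

git commit -q --allow-empty -m "first commit"
from_config=$(git log -1 --format='%an <%ae>')

# The GIT_AUTHOR_* variables take precedence over the configured identity.
GIT_AUTHOR_NAME="Env Name" GIT_AUTHOR_EMAIL="env@example.org" \
    git commit -q --allow-empty -m "second commit"
from_env=$(git log -1 --format='%an <%ae>')

echo "config: $from_config"
echo "env:    $from_env"
```

If the second line surprises you on a real machine, check whether one of the four environment variables is set in your shell profile.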

Commit messages

Please try to follow the general convention for the Git commit message structure as many Git tools rely on this. Moreover, take into account that the commit message text is interpreted as WikiFormatting in Trac.

In particular, if your patch addresses or fixes a bug/ticket, then include the ticket number in the form "#NNNN" in the commit message, e.g.

  withMVar family have a bug (fixes #767)

Trac will then add a link to the commit from the ticket (as soon as the commit becomes reachable from the master HEAD), so that people watching the ticket can see that a fix has been committed, and in the future we can easily find the patch that addressed the ticket. When navigating the Git history on Trac, you will also be able to jump directly to the ticket from the commit.
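As an illustration of the usual message structure (a short summary line, a blank line, then a body), here is a small sketch that commits with a message read from standard input. The repository, the identity and the body text are placeholders; only the summary line is taken from the example above.

```shell
set -eu
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
git config user.name "Example Author"        # placeholder identity
git config user.email "author@example.org"

# Summary line, blank line, then the body. "-F -" reads the message from stdin.
git commit -q --allow-empty -F - <<'EOF'
withMVar family have a bug (fixes #767)

Explain here what was wrong and how the patch addresses it.
The body can span several lines.
EOF

subject=$(git log -1 --format=%s)   # first line only
body=$(git log -1 --format=%b)      # everything after the blank line
echo "$subject"
```

Tools such as git log --oneline and git format-patch use only the summary line, which is why the ticket reference is best placed there.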

Line endings

Files in GHC repos should use Unix conventions for line endings. If you are on Windows, ensure that git handles line-endings sanely by running:

git config --global core.autocrlf false

To find out which files in your tree have Windows (CRLF) line endings, use

find . -name '*hs' | xargs file | grep CRLF

Do this before you commit them!
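If the file utility is not available, the same check can be approximated with grep. This is a sketch under that assumption; the two sample files are created only for the demonstration.

```shell
set -eu
dir=$(mktemp -d)
printf 'main = return ()\n'   > "$dir/Unix.hs"   # LF line ending
printf 'main = return ()\r\n' > "$dir/Dos.hs"    # CRLF line ending

cr=$(printf '\r')   # a literal carriage return, portable across shells
# List every matching file that contains at least one carriage return.
crlf_files=$(find "$dir" -name '*hs' -exec grep -l "$cr" {} +)
echo "$crlf_files"
```

Note that grep exits with a non-zero status when no file matches, so in a script with set -e you may want to append "|| true" to the find command.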

Working with the tree

Git tricks

When working with GHC, there are a lot of ways you can use Git to make your life easier. Below are some of them:

Selectively record changes to commit

Do you miss Darcs? Do you hate it when a file contains a bugfix *and* a new feature, and you want to commit both separately? That's OK! Just run:

$ git add -p

This opens the interactive diff selector, which behaves a lot like darcs record. It will go through every change you have made in the working tree, asking if you want to git add it to the index, so you can commit it afterwards.

Nota bene: this only adds changes to the index, it does not commit them. Afterwards, you may commit the result using git commit. Do not use git commit -a, or you will just add all the changes to the commit!
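The difference between the index and git commit -a can be seen in a small experiment. Because git add -p is interactive, this sketch stages a whole file with plain git add instead; the file names are made up for illustration.

```shell
set -eu
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
git config user.name "Example Author"        # placeholder identity
git config user.email "author@example.org"

echo one > bugfix.txt
echo one > feature.txt
git add bugfix.txt feature.txt
git commit -q -m "initial"

# Two unrelated changes in the working tree.
echo two >> bugfix.txt
echo two >> feature.txt

git add bugfix.txt                        # stage only the bug fix
staged=$(git diff --cached --name-only)   # what the next commit will contain
git commit -q -m "Fix the bug"            # no -a: only the index is committed
unstaged=$(git diff --name-only)          # still waiting in the working tree

echo "committed: $staged"
echo "left over: $unstaged"
```

Running git diff --cached before committing is a cheap way to confirm that the index holds exactly what you meant to record.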

Selectively cherry-pick a commit from a branch

You still miss Darcs. One thing that would be great is if you could just 'pluck' one commit from a branch into your tree, but not the others. Sounds good - git cherry-pick to the rescue!

$ git checkout master
$ git cherry-pick <sha1 id>

This will check out master and pull in only the commit you refer to. It does not create a merge; it's as if the commit had existed on this branch all along. This is wonderfully useful for selectively plucking changes from someone's Git tree, or branch.
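Here is a self-contained sketch of that cherry-pick in a scratch repository. The branch name topic and the file names are hypothetical.

```shell
set -eu
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
git symbolic-ref HEAD refs/heads/master      # make sure the first branch is called master
git config user.name "Example Author"        # placeholder identity
git config user.email "author@example.org"

echo base > base.txt
git add base.txt
git commit -q -m "base"

# A branch with two commits; we only want the first one.
git checkout -q -b topic
echo a > wanted.txt
git add wanted.txt
git commit -q -m "wanted change"
wanted=$(git rev-parse HEAD)
echo b > unwanted.txt
git add unwanted.txt
git commit -q -m "unwanted change"

git checkout -q master
git cherry-pick "$wanted" >/dev/null
git log --oneline
```

After this, master contains wanted.txt but not unwanted.txt, and no merge commit has been created.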

Merge a branch into a Super Big Commit

Let's say you have a branch foo you would like to merge into master, but you have 10 small commits on foo, and you only want to make 1 Big Commit on master. Many times, we land features in a single 'big commit' to keep the history clean. This is easily doable with:

$ git checkout master
$ git merge --squash foo

and then, after fixing any conflicts, you can commit the resulting staged changes as one big commit. --squash basically tells git to merge the changes, but not merge the commits. This is exactly what you want.
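The effect of --squash can be observed in a scratch repository. This sketch uses three small commits on foo rather than ten; file names and commit messages are invented for the example.

```shell
set -eu
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
git symbolic-ref HEAD refs/heads/master      # make sure the first branch is called master
git config user.name "Example Author"        # placeholder identity
git config user.email "author@example.org"

echo base > base.txt
git add base.txt
git commit -q -m "base"

# Several small commits on foo.
git checkout -q -b foo
for i in 1 2 3; do
    echo "$i" > "part$i.txt"
    git add "part$i.txt"
    git commit -q -m "small commit $i"
done

git checkout -q master
git merge --squash foo >/dev/null
# The combined changes are now staged but not yet committed.
staged_count=$(git diff --cached --name-only | grep -c .)
git commit -q -m "Land feature foo as one commit"
git log --oneline
```

The branch foo is left untouched, so the individual commits remain available there until you delete it.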

Basic rebases

What if you have a branch that's slightly out of date called foo, and you want to bring it up to date with master?

$ git checkout master
$ git pull origin master
$ git rebase master foo

This will:

  • Check out master.
  • Update master to the latest upstream version.
  • Rebase foo onto master.

Where rebasing includes:

  • Check out the branch foo.
  • Temporarily set aside all the commits you have made on foo
  • Bring foo up to date with master (by moving it to the tip of master)
  • Replay all your previous commits from foo onto the New-And-Improved foo branch

This, in effect, will bring foo up to date with master, while preserving your commits.

Q: But there was a conflict! A: That's OK. If git rebase encounters a conflict while replaying your work, it will stop and tell you so. It will ask you to fix the conflict, and git add the conflicting files. Then you can continue using git rebase --continue.

Q: I started to rebase, but I confused myself and don't know how to get out! Help! A: You can always run git rebase --abort, which will abort the current rebase operation and return your branch to the state it was in before the rebase began.
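The basic rebase can be tried out safely in a scratch repository. In this sketch a local commit on master stands in for what git pull origin master would bring in; the file names are hypothetical.

```shell
set -eu
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
git symbolic-ref HEAD refs/heads/master      # make sure the first branch is called master
git config user.name "Example Author"        # placeholder identity
git config user.email "author@example.org"

echo base > base.txt
git add base.txt
git commit -q -m "base"

# Work on foo, branched from the old master.
git checkout -q -b foo
echo work > foo.txt
git add foo.txt
git commit -q -m "work on foo"

# master moves ahead (stand-in for "git pull origin master").
git checkout -q master
echo new > upstream.txt
git add upstream.txt
git commit -q -m "upstream change"

# Replay the commits of foo on top of the new master.
git rebase -q master foo
git log --oneline
```

After the rebase you are on foo, and its history is the new master followed by your own commit.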

Using the reflog

When working in the repository, sooner or later you will do something by accident that deletes work. If you have never committed the changes, then you're out of luck (commit often, commit early - even locally!). But have you ever done something like:

  • Accidentally lost a commit, by deleting a branch?
  • Accidentally lost a commit through rebasing?
  • Amended a commit (git commit --amend), only to find out you broke it, and you want to undo the amendment?
  • Accidentally overwrote a branch with a dangerous operation, like git push --force?

The reflog can save you from all of these, and more. In short, the reflog is a log that records every time HEAD or a branch tip moves in your local repository. To understand that, first understand this: despite its appearance, the Git data model has a core tenet: it is immutable. Data is never deleted, only new copies can be made (the only exception is when garbage collection deletes nodes which have no outstanding references - much like our own GC!) Not even a rebase - which can rewrite the history - can actually delete old data.

Second, we need to understand an important part of git checkout: the purpose of checkout is not to switch branches. Checkout, roughly speaking, allows you to check out your tree to any state, revision, or copy in the history. You don't have to check out a branch: you can check out a commit from 3 weeks ago, a commit that does not exist on a branch, or a completely empty branch with nothing in common. You can check out the entire tree, or you could check out an individual file, or a single directory. The point being: checkout takes you to a state in the history.

So with that in mind, think of the reflog as an audit log you can use to see what operations were performed on the immutable git history. Every operation is tracked. Let's look at an example, from Austin's validation tree he uses to push commits:

$ git reflog --date=relative # this will open an interactive pager
ad15c2b HEAD@{5 hours ago}: pull -tu origin master: Fast-forward
75a9664 HEAD@{27 hours ago}: merge amp: Fast-forward
1ef941a HEAD@{27 hours ago}: checkout: moving from amp to master
75a9664 HEAD@{27 hours ago}: commit (amend): Implement the AMP warning (#8004)
daa9a30 HEAD@{28 hours ago}: rebase -i (finish): returning to refs/heads/amp
daa9a30 HEAD@{28 hours ago}: rebase -i (pick): Implement the AMP warning (#8004)
b20cf4e HEAD@{28 hours ago}: rebase -i (pick): Fix AMP warnings.
1ef941a HEAD@{28 hours ago}: checkout: moving from amp to 1ef941a82eafb8f22c19e2643685679d2454c24a
3e8c33e HEAD@{28 hours ago}: commit: Fix AMP warnings.
70406bc HEAD@{28 hours ago}: reset: moving to HEAD~
d2afc83 HEAD@{28 hours ago}: cherry-pick: Fix most AMP warnings.
70406bc HEAD@{28 hours ago}: commit (amend): Implement the AMP warning (#8004)
697f9da HEAD@{28 hours ago}: cherry-pick: Implement the AMP warning (#8004)
1ef941a HEAD@{28 hours ago}: checkout: moving from master to amp

The most recent operations come first, and older operations follow in reverse chronological order. Let's note a few things:

  • The work you previously had still exists, and has a commit ID. It is on the far left.
  • The reflog tells you what operation resulted in the commit: in my history, we can see I did:
    • At one point, I reset my tree and undid my latest commit (in 70406bc, using git reset.) Then I kept working.
    • Several git cherry-pick operations.
    • Several commits, and some git commit --amend operations.
    • I checked out to master.
    • Then I did a merge of the amp branch, which was a fast-forward: my previous changes had rebased the amp branch.
    • Later on, I pulled my tree and I got some updates from upstream.
  • The reflog tells you what was modified; in this case it shows you the commits I changed.

With this information, I can now restore my tree to any of those partial states. For example, let's say I git commit --amend the AMP patch in 75a9664, and did some more stuff. But then it turns out I didn't want any of that, and I didn't want the amendment either. I can easily do:

$ git checkout -b temp daa9a30

Now, I am on the temp branch, and my HEAD commit points to the patch, without any amendments. I've essentially checked out to a point in the tree without any of those changes - because git never modifies the original data, this old copy still exists. Now that I am on the temp branch, I can do any number of things. Perhaps I can just delete the old amp branch, and merge the temp branch instead now.

As you can see, the reflog saved me here: I undid some nasty work in my personal tree, which otherwise might have been much more error-prone or difficult to perform.

The reflog is not needed often, but it is often indispensable when you need it.
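To practise on something harmless, the following sketch reproduces the "undo an amendment" case in a scratch repository. The commit messages and file name are invented; HEAD@{1} is the standard reflog syntax for "where HEAD was one move ago".

```shell
set -eu
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
git symbolic-ref HEAD refs/heads/master      # make sure the first branch is called master
git config user.name "Example Author"        # placeholder identity
git config user.email "author@example.org"

echo base > base.txt
git add base.txt
git commit -q -m "base"

echo feature > feature.txt
git add feature.txt
git commit -q -m "Implement the feature"
good=$(git rev-parse HEAD)                   # the commit we will want back

# An amendment we later regret.
echo oops >> feature.txt
git commit -q -a --amend -m "Implement the feature (amended)"

git reflog -n 2                              # the pre-amend commit is still listed
git checkout -q -b temp "HEAD@{1}"           # new branch at the pre-amend commit
```

The amended commit is not lost either: it is still the tip of master and can be found again through the reflog.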

Advanced Git tricks

Finally, there are some advanced tips, not for the faint of heart:

Interactive rebases

At a certain point of git usage, you'll want to rewrite history by rebasing interactively. This can be done by running:

$ git rebase -i <commit range>

For example:

$ git rebase -i HEAD~10

will allow you to interactively rebase the last 10 commits on your branch. This power allows you to:

  • Reorder patches, by reordering the entries in the rebase list. If two patches don't touch each other, you can always switch their order and everything will be OK.
  • Drop patches, and completely remove them from the history, by removing them from the list.
  • Squash commits, which will let you compress a series of commits into one.
  • Reword commits, which will let you rewrite the commit message for any commit in the list, without touching anything else. (This is one of the most common ones I - Austin Seipp - use.)
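An interactive rebase normally opens an editor, which makes it hard to show in a script. The sketch below therefore uses a related, non-interactive route that is not mentioned above: git commit --fixup marks a commit as a correction of an earlier one, and git rebase -i --autosquash pre-arranges the rebase list accordingly. Setting GIT_SEQUENCE_EDITOR=true accepts that list unchanged. Treat it as one possible way to squash, with made-up file names and messages.

```shell
set -eu
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
git config user.name "Example Author"        # placeholder identity
git config user.email "author@example.org"

echo base > base.txt
git add base.txt
git commit -q -m "base"

echo feature > feature.txt
git add feature.txt
git commit -q -m "Add feature"

# A follow-up correction, recorded as "fixup! Add feature".
echo fix >> feature.txt
git commit -q -a --fixup=HEAD

# Squash the fixup into its target without opening an editor.
GIT_SEQUENCE_EDITOR=true git rebase -q -i --autosquash HEAD~2
git log --oneline
```

When you run git rebase -i by hand, the same result is obtained by changing "pick" to "squash" or "fixup" on the second line of the list.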

Workflow with validate

All changes to GHC and the libraries need to be validated before they can be pushed to the main repositories. Validation can take a while - 30 minutes on a 4-core machine is typical - so ideally you want to be validating changes while you are working in a separate tree. In fact, there are other compelling reasons to have two trees in your development workflow, one for working in and one for validation:

  • Validation uses build settings that are different to the ones you would normally use while developing: it adds more libraries (DPH), builds extra ways (dynamic libraries), and builds all the documentation, so you don't want to use the same build for validation and ordinary development. In the development tree we use build settings optimised for development: -O0 -DDEBUG for the compiler, minimal libraries and ways so that rebuilding is fast.
  • Having two trees eliminates a common source of breakage in the main repository: with one tree it is easy to add new files but forget to commit them. Your tests will work, but the build will be broken for others. If you have to pull your changes into a separate tree for testing, you'll notice the missing files before you push.

The typical workflow is to work in the development tree, pull into the validate tree, validate, and then push from the validate tree. But what if validate fails? There are two options:

  1. discard the patch in the validate tree (using some instance of git reset) and go back to the working tree to fix it
  2. or, add a new patch in the validate tree to fix the problem and re-validate

(1) is more for "back to the drawing board" kinds of failure, whereas (2) is for cases where you just need to fix a warning or some other minor error exposed by validate.

Setting up the trees

Let's call the two trees ghc-working and ghc-validate.

Set up your repos like this:

$ git clone http://git.haskell.org/ghc.git ghc-working
$ cd ghc-working
$ ./sync-all --testsuite --no-dph get
$ cd ..
$ git clone ghc-working ghc-validate
$ cd ghc-validate
$ ./sync-all --testsuite get
$ ./sync-all -r http://git.haskell.org remote set-url origin
  # Get the dph libraries too
$ ./sync-all --testsuite get
$ ./sync-all -r `pwd`/../ghc-working remote add working
$ ./sync-all -r ssh://git@git.haskell.org remote set-url --push origin

(Omit the last step if you don't have an account for GHC's Git repositories. You can still submit patches via the mailing list (git format-patch will help you with this) or send a pull request to get your changes into GHC.)

Now you have ghc-working and ghc-validate repos, and additionally the ghc-validate repo tree is set up with a remote working pointing to the ghc-working tree, and pushing from ghc-validate will push changes via SSH to git.haskell.org.
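For a single repository, the remote layout that sync-all sets up can be sketched with plain git remote commands. This example does not touch the network: an empty local repository stands in for the real ghc-working clone, and the push URL ssh://git@git.haskell.org/ghc.git is an assumption derived from the URLs above.

```shell
set -eu
export GIT_AUTHOR_NAME="Example Author" GIT_AUTHOR_EMAIL="author@example.org"
export GIT_COMMITTER_NAME="Example Author" GIT_COMMITTER_EMAIL="author@example.org"

top=$(mktemp -d)
cd "$top"

# Stand-in for the real clone of ghc.git.
git init -q ghc-working
(cd ghc-working && git commit -q --allow-empty -m "initial")

git clone -q ghc-working ghc-validate
cd ghc-validate

# Fetch from the main repository, not from the working tree.
git remote set-url origin http://git.haskell.org/ghc.git
# Extra remote pointing at the working tree.
git remote add working "$top/ghc-working"
# Push over SSH (assumed URL; requires an account).
git remote set-url --push origin ssh://git@git.haskell.org/ghc.git

git remote -v
```

git remote -v lists separate fetch and push URLs for origin, which is a quick way to verify the setup in a real tree.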

The rebase workflow

How do we move patches from ghc-working to ghc-validate? There are several options here. One is to just use sync-all pull working and do merging as usual. This works fine, but results in extra "merge commits" that aren't particularly helpful and clutter the commit logs and the mailing list. A better approach is to rebase patches before committing. This is done as follows:

  1. Pull from ghc-working into ghc-validate: ./sync-all pull working master
  2. Rebase onto origin/master: ./sync-all pull --rebase. You may encounter conflicts, in which case git will tell you what to do (usually fix the conflict and then git rebase --continue in the appropriate repository), then you can resume with ./sync-all --resume pull --rebase at the top.
  3. Check what you have relative to origin: ./sync-all new
  4. ./validate
  5. if validate went through, ./sync-all push (you might like to check once more what will be pushed: ./sync-all new).

If push fails because patches have been pushed by someone else while you were validating, it is acceptable to git pull --rebase in that repository and push if there are no conflicts (no need to validate again).

Now, the patches pushed this way are different (have different hashes) from the patches that you originally committed in ghc-working, and if you try to pull these patches in ghc-working again, confusion and conflicts will ensue. Fortunately there's an easy solution: just rebase again in ghc-working, and git will notice that your patches are already upstream and will discard the old versions. It's as simple as

 $ cd ghc-working
 $ ./sync-all pull --rebase

If rebase encounters a conflict at any point, it will tell you what to do. After fixing the conflict and completing the rebase manually, you can then resume the pull with ./sync-all --resume pull --rebase.

There is a slight tweak to this workflow that you might find more convenient: do a ./sync-all pull --rebase in the ghc-working tree prior to pulling into ghc-validate. This lets you fix conflicts in ghc-working rather than in ghc-validate, and test the resolution before validating. The downside is that you might now have to do a lot of rebuilding in your ghc-working tree if there are a lot of changes to pull.
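The whole round trip can be simulated for a single repository with plain git, which may help in understanding what sync-all does in each sub-repository. In this sketch upstream.git stands in for the main repository and other for another contributor; all names except ghc-working and ghc-validate are hypothetical, and validation itself is skipped.

```shell
set -eu
export GIT_AUTHOR_NAME="Example Author" GIT_AUTHOR_EMAIL="author@example.org"
export GIT_COMMITTER_NAME="Example Author" GIT_COMMITTER_EMAIL="author@example.org"

top=$(mktemp -d)
cd "$top"

# A seed commit, published as the "main" repository.
git init -q seed
(cd seed && git symbolic-ref HEAD refs/heads/master \
         && git commit -q --allow-empty -m "base")
git clone -q --bare seed upstream.git
git clone -q upstream.git ghc-working
git clone -q upstream.git ghc-validate
git clone -q upstream.git other

# Your patch, committed in the working tree.
(cd ghc-working && echo p > patch.txt && git add patch.txt \
                && git commit -q -m "my patch")
old=$(git -C ghc-working rev-parse HEAD)

# Meanwhile somebody else pushes to the main repository.
(cd other && echo u > other.txt && git add other.txt \
          && git commit -q -m "someone else's change" \
          && git push -q origin master)

# Steps 1, 2 and 5: pull from working, rebase onto origin, push.
cd ghc-validate
git remote add working "$top/ghc-working"
git pull -q --ff-only working master
git pull -q --rebase origin master
git push -q origin master
new=$(git rev-parse HEAD)

# Back in the working tree: rebasing drops the old copy of the patch.
cd "$top/ghc-working"
git pull -q --rebase origin master
git log --oneline
```

The patch ends up with a different hash after the rebase in ghc-validate, yet ghc-working finishes with exactly the upstream history and no duplicate commit.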

Contributing patches

Please write your patch and then rebase to the latest version of GHC HEAD before sending to us. You can use the following command to send patches via email:

git send-email --to=ghc-devs@haskell.org <hash-id> -1

where <hash-id> is the hash of the commit to send. If you'd prefer to create patch files and send them via email another way (or attach them to trac tickets) then you can use this command:

git format-patch [-o <outputdir>] <revision range>

where <revision range> names the commit at which git should stop when walking backwards from HEAD; a patch file is created for each commit in the range <revision range>..HEAD. For example, HEAD~3 produces patches for the last three commits.

Applying patches from email

git am -3 <email>
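Here <email> is a mailbox or patch file, such as one produced by git format-patch. The round trip can be sketched locally with two scratch repositories; sender, receiver and the file names are invented for the example.

```shell
set -eu
export GIT_AUTHOR_NAME="Example Author" GIT_AUTHOR_EMAIL="author@example.org"
export GIT_COMMITTER_NAME="Example Author" GIT_COMMITTER_EMAIL="author@example.org"

top=$(mktemp -d)
cd "$top"

git init -q sender
(cd sender && echo base > base.txt && git add base.txt \
           && git commit -q -m "base")
git clone -q sender receiver

# A new commit in the sender tree, exported as a patch file.
cd "$top/sender"
echo fix > fix.txt
git add fix.txt
git commit -q -m "Add fix.txt"
patch=$(git format-patch -o "$top/patches" HEAD~1)   # prints the file it wrote

# Apply it in the receiver tree; -3 falls back to a three-way merge if needed.
cd "$top/receiver"
git am -3 "$patch"
git log --oneline
```

git am keeps the original author and commit message, so the history in the receiving tree reads as if the author had committed there directly.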

The stable branch

See WorkingConventions/Releases.