wiki:BangPatterns

Version 5 (modified by simonpj@…, 8 years ago) (diff)

--

Bang patterns

Tickets

#76
Bang patterns

Goal

Our goal is to make it easier to write strict programs in Haskell. Programs that use 'seq' extensively are possible but clumsy, and it's often quite awkward or inconvenient to increase the strictness of a Haskell program. This proposal changes nothing fundamental; but it makes strictness more convenient.

The basic idea

The main idea is to add a single new production to the syntax of patterns

	pat ::= !pat

Matching an expression e against a pattern !p is done by first evaluating e (to WHNF) and then matching the result against p.

Example:

f1 !x = True

f1 is strict in x. In this case, it's not so bad to write

f1 x = x `seq` True

but when guards are involved the seq version becomes horrible:

-- Duplicate the seq on y
f2 x y | g x = y `seq` rhs1
       | otherwise = y `seq` rhs2

-- Have a weird guard
f2 x y | y `seq` False = undefined
       | g x = rhs1
       | otherwise = rhs2

-- Use bang patterns
f2 !x !y | g x = rhs1
         | otherwise = rhs2

Bang patterns can be nested of course:

f2 (!x, y) = [x,y]

f2 is strict in x but not in y. A bang only really has an effect if it precedes a variable or wild-card pattern:

f3 !(x,y) = [x,y]
f4 (x,y)  = [x,y]

f3 and f4 are identical; putting a bang before a pattern that forces evaluation anyway does nothing.

g5 x = let y = f x in body
g6 x = case f x of { y -> body }
g7 x = case f x of { !y -> body }

g5 and g6 mean exactly the same thing. But g7 evalutes (f x), binds y to the result, and then evaluates body.

Let and where bindings

In Haskell, let and where bindings can bind patterns. We propose to modify this by allowing an optional bang at the top level of the pattern. Thus for example:

  let ![x,y] = e in b

The "!" should not be regarded as part of the pattern; after all, in a function argument ![x,y] means the same as [x,y]. Rather, the "!" is part of the syntax of let bindings.

The semantics is simple; the above program is equivalent to:

  let p@[x,y] = e in p `seq` b

That is, for each bang pattern, invent a variable p, bind it to the banged pattern (removing the bang) with an as-pattern, and seq on it in the body of the let. (Thanks to Ben Rudiak-Gould for suggesting this idea.)

A useful special case is when the pattern is a variable: Similarly

  let !y = f x in b

means

  let y = f x in y `seq` b

which evaluates the (f x), thereby giving a strict let.

A useful law is this. A non-recursive bang-pattern binding is equivalent to a case expression:

  let !p = <rhs> in <body>
     is equivalent to (when the let is non-recursive)
  case <rhs> of !p -> <body>

Example

Here is a more realistic example, a strict version of partition:

partitionS p [] = ([], [])
partitionS p (x:xs) 
  | p x       = (x:ys, zs)
  | otherwise = (ys, x:zs)
  where
    !(ys,zs) = partitionS p xs

The bang in the where clause ensures that the recursive call is evaluated eagerly. In Haskell today we are forced to write

partitionS p [] = ([], [])
partitionS p (x:xs) 
  = case partitionS p xs of
	(ys,zs) | p x       = (x:ys, zs)
		| otherwise = (ys, x:zs)

which is clumsier (especially if there are a bunch of where-clause bindings, all of which should be evaluated strictly), and doesn't provide the opportunity to fall through to the next equation (not needed in this example but often useful).

Recursive let and where bindings

At first you might think that a recursive bang pattern don't make sense: how can you evaluate it strictly if it doesn't exist yet? But consider

  let !xs = if funny then 1:xs else 2:xs in ...

Here the binding is recursive, but the translation still makes sense:

  let xs = if funny then 1:xs else 2:xs 
  in xs `seq` ...

Top-level bang-pattern bindings

Does this make sense?

  module Foo where
    !x = factorial 1000

A top-level bang-pattern binding like this would imply that the binding is evaluated when the program is started; a kind of module initialisation. This makes some kind of sense, since (unlike unrestricted side effects) it doesn't matter in which order the module initialisation is performed.

But it's not clear why it would be necessary or useful. Conservative conclusion: no top-level bang-patterns.

Tricky point: syntax

What does this mean?

   f ! x = True

Is this a definition of (!) or a banged argument? (Assuming that space is insignificant.)

Proposal: resolve this ambiguity in favour of the bang-pattern. If you want to define (!), use the prefix form

   (!) f x = True

Another point that came up in implementation is this. In GHC, at least, patterns are initially parsed as expressions, because Haskell's syntax doesn't let you tell the two apart until quite late in the day. In expressions, the form (! x) is a right section, and parses fine. But the form (!x, !y) is simply illegal. Solution: in the syntax of expressions, allow sections without the wrapping parens in explicit lists and tuples. Actually this would make sense generally: what could (+ 3, + 4) mean apart from a tuple of two sections?

Tricky point: nested bangs (part 1)

Consider this:

  let (x, Just !y) = <rhs> in <body>

Is y evaluted before <body> is begun? No, it isn't. That would be quite wrong. Pattern matching in a let is lazy; if any of the variables bound by the pattern is evaluated, then the whole pattern is matched. In this example, if x or y is evaluated, the whole pattern is matched, which in turn forces evaluation of y. The binding is equivalent to

  let t = <rhs>
      x = case t of { (x, Just !y) -> x }
      y = case t of { (x, Just !y) -> y }
  in <body>

Tricky point: nested bangs (part 2)

Consider this:

  let !(x, Just !y) = <rhs> in <body>

This should be equivalent to

  case <rhs> of { (x, Just !y) -> <body> }

Notice that this meant that the entire pattern is matched (as always with Haskell). The Just may fail; x is not evaluated; but y is evaluated.

This means that you can't give the obvious alternative translation that uses just let-bindings and {{{seq}}. For example, we could attempt to translate the example to:

  let 
    t = <rhs>
    x = case t of (x, Just !y) -> x
    y = case t of (x, Just !y) -> y
  in
  t `seq` <body>

This won't work, because using seq on t won't force y. However, the semantics says that the original code is equivalent to

  let p@(x, Just !y) = <rhs> in p `seq` <body>

and we can desugar that in obvious way to

  let 
    t = <rhs>
    p = case t of p@(x, Just !y) -> p
    x = case t of p@(x, Just !y) -> x
    y = case t of p@(x, Just !y) -> y
  in
  p `seq` <body>

which is fine.

You could also build an intermediate tuple, thus:

  let 
    t = case <rhs> of p@(x, Just !y) -> (p,x,y)
    p = sel13 t
    x = sel23 t
    y = sel33 t
  in
  t `seq` <body>

Indeed GHC does just this for complicated pattern bindings.

Tricky point: pattern-matching semantics

A bang is part of a pattern; matching a bang forces evaluation. So the exact placement of bangs in equations matters. For example, there is a difference between these two functions:

  f1 x  True  = True
  f1 !x False = False

  f2 !x True  = True
  f2  x False = False

Since pattern matching goes top-to-bottom, left-to-right, (f1 bottom True) is True, whereas (f2 bottom True) is bottom.

Tricky point: polymorphism

Haskell allows this:

  let f :: forall a. Num a => a->a
      Just f = <rhs>
  in (f (1::Int), f (2::Integer))

But if we were to allow a bang pattern, !Just f = <rhs>, with the translation to a case expression given earlier, we would end up with

   case <rhs> of { Just f -> (f (1::Int), f (2::Integer) }

But if this is Haskell source, then f won't be polymorphic.

One could say that the translation isn't required to preserve the static semantics, but GHC, at least, translates into System F, and being able to do so is a good sanity check. If we were to do that, then we would need

    <rhs> :: Maybe (forall a. Num a => a -> a)

so that the case expression works out in System F:

   case <rhs> of { 
     Just (f :: forall a. Num a -> a -> a)
	-> (f Int dNumInt (1::Int), f Integer dNumInteger (2::Integer) }

The trouble is that <rhs> probably has a type more like

    <rhs> :: forall a. Num a => Maybe (a -> a)

...and now the dictionary lambda may get in the way of forcing the pattern.

This is a swamp. Conservative conclusion: no generalisation (at all) for bang-pattern bindings.

Changes to the Report

The changes to the Report would be these. (Incomplete.)

  • Section 3.17, add pat ::= !pat to the syntax of patterns. We would need to take care to make clear whether
    f !x = 3
    
    was a definition of the function "!", or of "f". (There is a somewhat similar complication with n+k patterns; see the end of 4.3.3.2 in the Report. However we probably do not want to require parens thus
    f (!x) = 3
    
    which are required in n+k patterns.
  • Section 3.17.2: add new bullet 10, saying "Matching the pattern "!pat" against a value "v" behaves as follows:
    • if v is bottom, the match diverges
    • otherwise, "pat" is matched against "v".
  • Fig 3.1, 3.2, add a new case (t):
    case v of { !pat -> e; _ -> e' }
       = v `seq` case v of { pat -> e; _ -> e' }
    
  • Section 3.12 (let expressions). In the translation box, wherever there is a bang right at the top of a pattern on the LHS of the translation rule, omit the implicit tilde that occurs at the top of the pattern on the RHS of the rule.

The last point is the only delicate one. If we adopt this proposal we'd need to be careful about the semantics. For example, are bangs ok in recursive bindings? (Yes.) And for non-recursive bindings the order of the bindings shouldn't matter.