Native {-# LANGUAGE CPP #-}

Problem Statement

Currently, GHC relies on the system-installed C preprocessor (referred to below as system-cpp) that accompanies the C compiler to implement {-# LANGUAGE CPP #-}. This has several drawbacks:

  • A number of tickets have already been filed with the cpp keyword:

    #860
    CPP fails when a macro is used on a line containing a single quote character
    #6132
    Can't use both shebang line and #ifdef declarations in the same file.
    #8444
    Fix CPP issue with Xcode5 in integer-simple
    #8445
    Fix Xcode5 CPP issue with compiler/deSugar/DsBinds.lhs and compiler/utils/FastString.lhs
    #8493
    Can't compile happy + ghc with clang's CPP
    #9399
    CPP does not process test case enum01.hs correctly
    #9978
    DEBUG is always replaced as 1 when CPP pragma is on
    #10044
    Wrong line number reported with CPP and line beginning with #
    #10146
    Clang CPP adds extra newline character
    #10230
    multiline literals doesn't work with CPP extension.
    #10543
    MacOS: validate fails on \u
    #12391
    LANGUAGE CPP messes up parsing when backslash like \\ is at end of line (eol)
    #12516
    Preprocessing: no way to portably use stringize and string concatenation
    #12628
    __GLASGOW_HASKELL_LLVM__ is no longer an Int
    #14113
    Error message carets point at the wrong places in the presence of CPP macros

  • Fragile semantics: the "traditional mode" of cpp that GHC relies on is not well specified, so implementations disagree in subtle but annoying ways
    • Consider all the Clang issues GHC experienced when Apple switched from the GCC toolchain to the Clang toolchain
    • Packages using -XCPP that were tested with only one system-cpp variant may not work with another, which means higher testing and/or support costs
    • Clang cpp does not support stringize and string concatenation in traditional mode (see ticket #12516)
  • As system-cpp is designed to handle mostly C-code, it conflicts with Haskell's tokenization/syntax, specifically:
    • Haskell multi-line string literals can no longer be used with -XCPP (cf. SO Question and/or #10230)
    • Haddock comments get mangled as system-cpp isn't aware of Haskell comments
    • system-cpp may get confused by seemingly "unterminated" ' characters, even though in Haskell they are not always used to quote character literals. For example, Haskell allows variable names like x' or even x'y. Another practical example comes from the int-cast package. In the following code
      #if defined(WORD_SIZE_IN_BITS)
      type instance IntBaseType Int    = 'FixedIntTag  WORD_SIZE_IN_BITS
      type instance IntBaseType Word   = 'FixedWordTag WORD_SIZE_IN_BITS
      #else
      # error Cannot determine bit-size of 'Int'/'Word' type
      #endif
      
      GNU cpp fails to macro-expand WORD_SIZE_IN_BITS due to the unterminated '-quote
    • Valid Haskell operators such as /*, */ or // are misinterpreted by system-cpp as comment delimiters
    • Unix shebang (#!/usr/bin/env runghc) Haskell scripts can't be used with -XCPP (cf. SO Q)
    • One case involving a comment containing C:\\... had an unexpected side-effect: https://github.com/haskell/cabal/pull/3810/commits/7a8062b9219c6353c18e31188cdbd38249578ab0
  • Lack of ability to extend/evolve -XCPP as we have no control over system-cpp
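
The tokenization conflicts listed above suggest what "Haskell-aware" could mean in practice. The following is a minimal sketch, not the algorithm used by cpphs or any other existing preprocessor: a hypothetical expandMacro function that substitutes a single object-like macro while lexing identifiers the way Haskell does (primes belong to identifiers, and a leading ' does not open a quote) and leaving string literals untouched. Comments, escaped character literals, and directives are deliberately left out.

```haskell
import Data.Char (isAlpha, isAlphaNum)

-- | Hypothetical helper (not part of GHC, cpphs or hpp): replace every
-- occurrence of the identifier @name@ by @value@, using Haskell-style
-- lexing instead of C-style lexing.
expandMacro :: String -> String -> String -> String
expandMacro name value = go
  where
    go [] = []
    -- String literal: copy it verbatim, so macros are not expanded inside.
    go ('"' : rest) =
      let (lit, rest') = stringLit rest
      in '"' : lit ++ go rest'
    -- Simple one-character literal such as 'a' or '"' (no escapes).
    go ('\'' : c : '\'' : rest)
      | c /= '\\' = '\'' : c : '\'' : go rest
    -- Identifier: primes are part of the name, so x' is not the macro x.
    go s@(c : _)
      | isIdentStart c =
          let (ident, rest) = span isIdentChar s
          in (if ident == name then value else ident) ++ go rest
    -- Anything else, including a lone ' before a promoted constructor,
    -- is copied through unchanged.
    go (c : rest) = c : go rest

    isIdentStart c = isAlpha c || c == '_'
    isIdentChar c = isAlphaNum c || c == '_' || c == '\''

    -- Consume the rest of a string literal, including the closing quote,
    -- skipping over backslash escapes such as \".
    stringLit ('\\' : c : rest) =
      let (l, r) = stringLit rest in ('\\' : c : l, r)
    stringLit ('"' : rest) = ("\"", rest)
    stringLit (c : rest) =
      let (l, r) = stringLit rest in (c : l, r)
    stringLit [] = ([], [])
```

With this lexing, expandMacro "WORD_SIZE_IN_BITS" "64" rewrites the first int-cast line shown earlier so that it ends in 'FixedIntTag  64, and an identifier such as x' is never mistaken for a macro named x.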

Possible Courses of Action

Plan 0: No change (i.e. keep relying on system-cpp)

Nothing is gained, and since the issue remains unsolved, we risk being pressed for time (and/or causing GHC release delays) if circumstances change suddenly and force us to act (e.g. if GCC's or Clang's cpp changes in a way that is incompatible with GHC).

Plan 1: Use custom fixed cpp implementation bundled with GHC

  • Clang's cpp could be another candidate (as suggested here). Needs more investigation
  • Probably not as easy to extend/evolve to be more Haskell-syntax-aware

Plan 2: Embed Malcolm's hackage:cpphs into GHC

Advantages

Disadvantages

  • cpphs is licensed as "LGPLv2 w/ static linking exception" (see below)
    • GHC's overall licence agreement would be extended (TODO show concrete change)
    • The ghc package would be tainted by this license augmentation. (But no more tainted than it already is by the LGPL'd GMP library (GNU multi-precision arithmetic).)

Plan 3: Write native BSD-licenced Haskell implementation from scratch

Advantages

Disadvantages

  • Requires manpower and time
  • Additional long-term maintenance effort for GHC-HQ

Plan 3a: Embed hackage:hpp into GHC

Since this wiki page was first written, hackage:hpp, which is BSD3 licensed, has been written.

Plan 4: Bundle cpphs-based executable with GHC

This is a variant of plan 2 where cpphs code remains in a separate executable.
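
For comparison, an external cpphs can already be selected per module with GHC's existing -pgmP and -optP flags, which gives a rough idea of what a bundled executable would replace. The sketch below assumes a cpphs executable is available on the PATH; the module contents are purely illustrative.

```haskell
{-# LANGUAGE CPP #-}
-- Assumption: a cpphs executable is installed and on the PATH.
-- -pgmP selects the preprocessor program, -optP passes one option to it;
-- --cpp tells cpphs to accept cpp-style command-line flags from GHC.
{-# OPTIONS_GHC -pgmPcpphs -optP--cpp #-}
module Main (main) where

main :: IO ()
main =
#ifdef __GLASGOW_HASKELL__
  putStrLn "compiled with GHC, preprocessed by the program given to -pgmP"
#else
  putStrLn "compiled with another compiler"
#endif
```

Under plan 4 the same mechanism would apply, except that the executable would ship with GHC instead of being a separate installation.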

Advantages

  • cpphs has been widely used, hence it's proven code
  • It's already more Haskell-aware than system-cpp
  • cpphs is actively maintained
  • no more fork(2)/exec(2)

Disadvantages

  • cpphs is licensed as "LGPLv2 w/ static linking exception" (see below)
    • GHC's overall licence agreement would be extended (TODO show concrete change)
    • The ghc package would be tainted by this license augmentation

cpphs's licence in more detail

  • The main intent behind cpphs's current licensing is to have modifications/improvements of redistributed cpphs binaries made publicly available to recipients of the binaries (so that they can be e.g. merged upstream if useful). This is a concern the BSD3 licence doesn't address.
  • The library portion of cpphs is dual-licensed (see http://code.haskell.org/cpphs/COPYRIGHT):
    • LGPL v2.1 with static linking exception

      As a relaxation of clause 6 of the LGPL, the copyright holders of this library give permission to use, copy, link, modify, and distribute, binary-only object-code versions of an executable linked with the original unmodified Library, without requiring the supply of any mechanism to modify or replace the Library and relink (clauses 6a, 6b, 6c, 6d, 6e), provided that all the other terms of clause 6 are complied with.

    • for binary distributions only: http://code.haskell.org/cpphs/LICENCE-commercial (doesn't seem useful for GHC)
  • As a practical consequence of the LGPL with static-linking-exception (LGPL+SLE), if no modifications are made to the cpphs-parts (i.e. the LGPL+SLE covered modules) of the GHC code-base, then there is no requirement to ship (or make available) any source code together with the binaries, even if other parts of the GHC code-base were modified.

ghc package's current license

The ghc package, which can be linked into programs, currently depends on the packages array, base, binary, bin-package-db, bytestring, containers, deepseq, directory, filepath, ghc-prim, hoopl, hpc, integer-gmp, pretty, process, rts, template-haskell, time, transformers, and unix. Their collated LICENSE files have been pasted at http://lpaste.net/131294

Last modified on May 18, 2017 8:05:18 AM