Changes between Version 6 and Version 7 of Commentary/Compiler/CmmType


Timestamp:
Dec 6, 2006 7:34:58 PM
Author:
p_tanski

|| `float32` || `F_` || `StgFloat` ||
|| `float64` || `D_` || `StgDouble` ||

[[GhcFile(includes/Cmm.h)]] also defines `L_` for `bits64`, so `F_`, `D_` and `L_` correspond to the `GlobalReg` data type constructors `FloatReg`, `DoubleReg` and `LongReg`.  Note that although GHC may generate other register types supported by the `MachRep` data type, such as `I128`, they are not parseable tokens; that is, they are internal to GHC.  The special defines `CInt` and `CLong` are used for compatibility with C on the target architecture, typically for making `foreign "C"` calls.

'''Note''': Even Cmm values that are not held in explicit variables (Cmm literals and the results of Cmm expressions) have implicit `MachRep`s, in the same way as you would use temporary registers to hold labelled constants or intermediate values in assembler functions.  See:
 * [wiki:Commentary/Compiler/CmmType#LiteralsandLabels Literals and Labels] for information related to the Cmm literals `CmmInt` and `CmmFloat`; and,
 * [wiki:Commentary/Compiler/CmmType#Expressions Expressions], regarding the `cmmExprRep` function defined in [[GhcFile(compiler/cmm/Cmm.hs)]].

==== Global Registers and Hints ====
Global registers are universal both to a Cmm module and to the whole compiled program.  Variables are global if they are declared at the top level of a compilation unit (outside any procedure).  Global variables are marked as external symbols with the `.globl` assembler directive.  In Cmm, global registers are used for the special STG registers and for the specific registers that pass arguments and return values.  The Haskell representation of global variables (registers) is the `GlobalReg` data type, defined in [[GhcFile(compiler/cmm/Cmm.hs)]]:
{{{
data GlobalReg
  -- Argument and return registers
  = VanillaReg                  -- general registers (int, pointer, char values)
        {-# UNPACK #-} !Int     -- the register number, such as R3, R11
  | FloatReg            -- single-precision floating-point registers
        {-# UNPACK #-} !Int     -- register number
  | DoubleReg           -- double-precision floating-point registers
        {-# UNPACK #-} !Int     -- register number
  | LongReg             -- long int registers (64-bit, really)
        {-# UNPACK #-} !Int     -- register number
  -- STG registers
  | Sp                  -- Stack ptr; points to last occupied stack location.
  | SpLim               -- Stack limit
  | Hp                  -- Heap ptr; points to last occupied heap location.
  | HpLim               -- Heap limit register
  | CurrentTSO          -- pointer to current thread's TSO
  | CurrentNursery      -- pointer to allocation area
  | HpAlloc             -- allocation count for heap check failure

                -- We keep the address of some commonly-called
                -- functions in the register table, to keep code
                -- size down:
  | GCEnter1            -- stg_gc_enter_1
  | GCFun               -- stg_gc_fun

  -- Base offset for the register table, used for accessing registers
  -- which do not have real registers assigned to them.  This register
  -- will only appear after we have expanded GlobalReg into memory accesses
  -- (where necessary) in the native code generator.
  | BaseReg

  -- Base Register for PIC (position-independent code) calculations
  -- Only used inside the native code generator. Its exact meaning differs
  -- from platform to platform (see compiler/nativeGen/PositionIndependentCode.hs).
  | PicBaseReg
}}}
For a description of the `Hp` and `Sp` ''virtual registers'', see the [wiki:Commentary/Rts/HaskellExecution The Haskell Execution Model] page.  The numbered `GlobalReg`s appear in Cmm code with the following syntax, defined in [[GhcFile(compiler/cmm/CmmLex.x)]]:
|| '''`GlobalReg` Constructor''' || '''Syntax''' || '''Examples''' ||
|| `VanillaReg Int` || `R ++ Int` || `R1`, `R10` ||
|| `FloatReg Int` || `F ++ Int` || `F1`, `F10` ||
|| `DoubleReg Int` || `D ++ Int` || `D1`, `D10` ||
|| `LongReg Int` || `L ++ Int` || `L1`, `L10` ||
Register numbers are decimal integers; see the `parseInteger` function in [[GhcFile(compiler/utils/StringBuffer.lhs)]].  The remaining `GlobalReg` constructors, from `Sp` to `BaseReg`, are lexical tokens spelled exactly like their names in the data type; `PicBaseReg` does not have a lexical token since it is used only inside the NCG.
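The lexical rules above can be sketched in Haskell.  This is a minimal illustration, not GHC's lexer (which is generated by Alex from `CmmLex.x`): the cut-down `GlobalReg` type and the function name `lexGlobalReg` are hypothetical, and the list of named registers simply follows the description in the previous paragraph.

{{{
import Data.Char (isDigit)

-- Hypothetical, cut-down copy of GlobalReg (PicBaseReg omitted on purpose).
data GlobalReg
  = VanillaReg Int | FloatReg Int | DoubleReg Int | LongReg Int
  | Sp | SpLim | Hp | HpLim | CurrentTSO | CurrentNursery | HpAlloc
  | GCEnter1 | GCFun | BaseReg
  deriving (Eq, Show)

-- Named registers are spelled exactly like their constructors.
-- PicBaseReg is absent: it has no lexical token.
namedRegs :: [(String, GlobalReg)]
namedRegs =
  [ ("Sp", Sp), ("SpLim", SpLim), ("Hp", Hp), ("HpLim", HpLim)
  , ("CurrentTSO", CurrentTSO), ("CurrentNursery", CurrentNursery)
  , ("HpAlloc", HpAlloc), ("GCEnter1", GCEnter1), ("GCFun", GCFun)
  , ("BaseReg", BaseReg) ]

-- Numbered registers: one prefix letter followed by a decimal number.
numberedRegs :: [(Char, Int -> GlobalReg)]
numberedRegs = [('R', VanillaReg), ('F', FloatReg), ('D', DoubleReg), ('L', LongReg)]

-- Recognise a token such as "R1", "F10" or "Sp"; Nothing if it is not a GlobalReg.
-- If the guard of the first equation fails, matching falls through to the second.
lexGlobalReg :: String -> Maybe GlobalReg
lexGlobalReg (c : ds)
  | not (null ds), all isDigit ds, Just mk <- lookup c numberedRegs
  = Just (mk (read ds))
lexGlobalReg tok = lookup tok namedRegs
}}}

Loaded into GHCi, `lexGlobalReg "R10"` gives `Just (VanillaReg 10)`, while `lexGlobalReg "PicBaseReg"` gives `Nothing`.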

`GlobalReg`s are a very special case in Cmm, partly because they must conform to the STG register convention and the target C calling convention.  That the Cmm parser recognises `R1` and `F3` as `GlobalReg`s is only the first step.  The main files to look at for more information on this delicate topic are:
 * [[GhcFile(compiler/codeGen/CgCallConv.hs)]] (the section on "Register assignment")
 * [[GhcFile(includes/Regs.h)]] (defining STG registers)
 * [[GhcFile(includes/MachRegs.h)]] (target-specific mapping of machine registers for ''registerised'' builds of GHC)
 * [[GhcFile(rts/PrimOps.cmm)]] (examples of `GlobalReg` register usage for out-of-line primops)
All arguments to out-of-line !PrimOps in [[GhcFile(rts/PrimOps.cmm)]] are STG registers.

Cmm recognises all C-- syntax with regard to ''hints''.  For example:
{{{
"signed" bits32 x;  // signed or unsigned int with hint "signed"

foreign "C" labelThread(R1 "ptr", R2 "ptr") [];

"ptr" info = foreign "C" lockClosure(mvar "ptr") [];
}}}
Hints are represented in Haskell as `MachHint`s, defined near `MachRep` in [[GhcFile(compiler/cmm/MachOp.hs)]]:
{{{
data MachHint
  = NoHint      -- string: "NoHint"     Cmm syntax: [empty]     (C-- uses "")
  | PtrHint     -- string: "PtrHint"    Cmm syntax: "ptr"       (C-- uses "address")
  | SignedHint  -- string: "SignedHint" Cmm syntax: "signed"
  | FloatHint   -- string: "FloatHint"  Cmm syntax: "float"
}}}
Although the C-- specification does not allow the C-- type system to statically distinguish between floats, signed ints, unsigned ints or pointers, Cmm partly does: Cmm `MachRep`s carry the float or int kind of a variable, whether it lives in a local block or in a global register, and `GlobalReg` has separate constructors for `VanillaReg`, `FloatReg`, `DoubleReg` and `LongReg`.  Cmm still does not distinguish between signed ints, unsigned ints and pointers (addresses) at the register level, since these are given ''hint'' pseudo-types or have their real type determined as they pass through primitive operations.  `MachHint`s follow the C-- specification and carry kind information as an aid to the backend optimisers.
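As a small illustration of the hint table, the mapping between Cmm hint strings and `MachHint` constructors can be written as a pair of functions.  The names `hintFromCmm` and `hintToCmm` are hypothetical (this is not GHC's parser code), and the sketch models an absent hint as the empty string.

{{{
-- Copy of the MachHint type shown above (Enum/Bounded added for convenience).
data MachHint = NoHint | PtrHint | SignedHint | FloatHint
  deriving (Eq, Show, Enum, Bounded)

-- Cmm hint string -> MachHint.  Anything else is rejected in this sketch,
-- including C--'s "address", which Cmm spells "ptr".
hintFromCmm :: String -> Maybe MachHint
hintFromCmm ""       = Just NoHint
hintFromCmm "ptr"    = Just PtrHint
hintFromCmm "signed" = Just SignedHint
hintFromCmm "float"  = Just FloatHint
hintFromCmm _        = Nothing

-- MachHint -> the string you would write in Cmm source.
hintToCmm :: MachHint -> String
hintToCmm NoHint     = ""
hintToCmm PtrHint    = "ptr"
hintToCmm SignedHint = "signed"
hintToCmm FloatHint  = "float"
}}}

Because the two functions are inverses on the four hints, pretty-printing a hint and parsing it back always recovers the original constructor.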

Global registers in Cmm currently have a problem with inlining: because neither [[GhcFile(compiler/cmm/PprC.hs)]] nor the NCG is able to keep global registers from clashing with C argument-passing registers, Cmm expressions that contain global registers cannot be inlined.  For more thorough notes on inlining, see the comments in [[GhcFile(compiler/cmm/CmmOpt.hs)]].

==== Declaration and Initialisation ====
Cmm variables hold the same values registers do in assembly languages, but they may be declared much like variables in C.  As in C--, they may be declared anywhere in the scope in which they are visible (a block or file); in Cmm, this is handled by the `loopDecls` function in [[GhcFile(compiler/cmm/CmmParse.y)]].  In [[GhcFile(rts/PrimOps.cmm)]], you will see Cmm variable declarations like this one:
{{{
W_ w, code, val;  // W_ is a cpp #define for StgWord,
                  // a machine word (32 or 64-bit, the general register size; unsigned int)
}}}
Remember that Cmm code is run through the C preprocessor.  `W_` will be transformed into `bits32`, `bits64` or whatever the `bits`''size'' of the machine word is, as defined in [[GhcFile(includes/Cmm.h)]].  In Haskell code, you may use the [[GhcFile(compiler/cmm/MachOp.hs)]] functions `wordRep` and `halfWordRep` to determine the machine word size dynamically.  For a description of word sizes in GHC, see the [wiki:Commentary/Rts/Word Word] page.
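The word-size dependence described above can be sketched as follows.  The functions below are hypothetical stand-ins for `wordRep` and `halfWordRep`: they take the word size in bytes as a parameter instead of reading it from GHC's build configuration, and `cmmWordType` only imitates what the preprocessor makes of `W_`.

{{{
-- Cut-down MachRep with only the integer widths needed here.
data MachRep = I8 | I16 | I32 | I64
  deriving (Eq, Show)

-- Hypothetical stand-in for wordRep: the MachRep of a machine word.
wordRepFor :: Int -> Maybe MachRep
wordRepFor 4 = Just I32
wordRepFor 8 = Just I64
wordRepFor _ = Nothing

-- Hypothetical stand-in for halfWordRep.
halfWordRepFor :: Int -> Maybe MachRep
halfWordRepFor 4 = Just I16
halfWordRepFor 8 = Just I32
halfWordRepFor _ = Nothing

-- The Cmm type that W_ becomes after preprocessing, e.g. "bits64".
cmmWordType :: Int -> Maybe String
cmmWordType bytes
  | bytes `elem` [4, 8] = Just ("bits" ++ show (bytes * 8))
  | otherwise           = Nothing
}}}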

The variables `w`, `code` and `val` should be real registers.  With the above declaration the variables are uninitialised; initialisation requires an assignment ''statement''.  Cmm does not recognise the C-- "`{` ''literal'', ... `}`" initialisation syntax, such as `bits32{10}` or `bits32[3] {1, 2, 3}`.  Cmm does recognise initialisation with a literal:
{{{
string_name:    bits8[] "twenty character string\n\0";

variable_num:   bits32 10::bits32;
}}}
The typical method seems to be to declare variables and then initialise them just before their first use.  (Remember that you may declare a variable anywhere in a procedure and use it in an expression before it is initialised, but you must initialise it before using it anywhere else, in statements for example.)

==== Memory Access ====
If the value in `w` were the address of a memory location, you would obtain the value at that location with a syntax similar to Intel assembler's.  In Cmm, you would write:
{{{
code = W_[w];  // code is assigned the W_ value at memory address w
}}}
Compare the above statement to indirect addressing in Intel assembler:
{{{
mov     al, [eax]  ; move value in memory at indirect address in register eax,
                   ; into register al
}}}

The code between the brackets (`w` in `[w]`, above) is an ''expression''; see the [wiki:Commentary/Compiler/CmmType#Expressions Expressions] section.  For now, consider the similarity between the Cmm version of indexed memory addressing syntax, here:
{{{
R1 = bits32[R2 + R3];   // R2 (memory address), R3 (index, offset), result: type bits32

// note: in Cmm 'R2' and 'R3' would be parsed as global registers
// this is generally bad form; instead,
// declare a local variable and initialise it with a global, such as:
bits32 adr, ofs, res;
adr = R2;
ofs = R3;
res = bits32[adr + ofs];
R1 = res;

// using local variables will give the NCG some leeway to avoid clobbering the globals
// should you call another procedure somewhere in the same scope
}}}
and the corresponding Intel assembler indexed memory addressing syntax, here:
{{{
mov     al, ebx[eax]    ; ebx (base), eax (index)
; or
mov     al, [ebx + eax]
}}}
You will generally not see this type of syntax in either handwritten or GHC-produced Cmm code, although it is allowed; it shows up in macros instead.  C-- also allows the `*` (multiplication) operator in addressing expressions, as an approximation of ''scaled'' addressing (`[base + index * (2^n)]`); on x86, for example, the scale `2^n` must be `1`, `2`, `4` or `8`.  C-- itself would not enforce alignment or limits on the scale.  Cmm, however, could not process it: since the NCG currently outputs GNU Assembler syntax, the Cmm or NCG optimisers would have to reduce the multiplication (`* n`) to an absolute address or relative offset, or to an expression using only `+` or `-`.  This is not currently done, and it would be difficult to implement where one of the operands to the `*` is a relative address not visible in the code block.  Instead, [[GhcFile(includes/Cmm.h)]] defines macros that perform the calculation with a constant.  For example:
{{{
/* Converting quantities of words to bytes */
#define WDS(n) ((n)*SIZEOF_W)  // SIZEOF_W is a constant
}}}
is used in:
{{{
#define Sp(n)  W_[Sp + WDS(n)]
}}}
The function `cmmMachOpFold` in [[GhcFile(compiler/cmm/CmmOpt.hs)]] will reduce the resulting expression `Sp + (n * SIZEOF_W)` to `Sp + N`, where `N` is a constant.  A very large number of macros for accessing STG struct fields and the like are produced by [[GhcFile(includes/mkDerivedConstants.c)]] and written to the file `includes/DerivedConstants.h` when GHC is compiled.
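The folding step can be illustrated with a toy expression type.  This is a sketch of the idea behind `cmmMachOpFold`, not its actual code: the `Expr` type, `foldExpr`, `spSlot` and the 8-byte `sizeofW` (a 64-bit target) are all assumptions made for the example.

{{{
-- Hypothetical toy expression type (not GHC's CmmExpr).
data Expr
  = Lit Integer        -- integer constant
  | Reg String         -- a register such as Sp
  | Add Expr Expr
  | Mul Expr Expr
  deriving (Eq, Show)

-- Fold constant sub-expressions bottom-up, in the spirit of cmmMachOpFold.
foldExpr :: Expr -> Expr
foldExpr (Add a b) = case (foldExpr a, foldExpr b) of
  (Lit x, Lit y) -> Lit (x + y)
  (a', Lit 0)    -> a'              -- x + 0 is just x
  (a', b')       -> Add a' b'
foldExpr (Mul a b) = case (foldExpr a, foldExpr b) of
  (Lit x, Lit y) -> Lit (x * y)
  (a', b')       -> Mul a' b'
foldExpr e = e

-- Assumption: a 64-bit target, so SIZEOF_W is 8 bytes.
sizeofW :: Integer
sizeofW = 8

-- The address inside Sp(n): Sp + WDS(n), i.e. Sp + (n * SIZEOF_W).
spSlot :: Integer -> Expr
spSlot n = Add (Reg "Sp") (Mul (Lit n) (Lit sizeofW))
}}}

Here `foldExpr (spSlot 3)` yields `Add (Reg "Sp") (Lit 24)`, the `Sp + N` form described above, and `foldExpr (spSlot 0)` collapses to plain `Reg "Sp"`.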

Of course, all this also holds true for the reverse (when an assignment is made to a memory address):
{{{
section "data" {
        num_arr: bits32[10];
}

proc1 {
        // ...
        bits32[num_arr + (2*4)] = 5::bits32;  // byte offset 8; in C: num_arr[2] = 5;
        // ...
}
}}}
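Because Cmm addresses count bytes, the store above writes the third `bits32` element of `num_arr`.  The sketch below models byte-addressed memory as a plain list of bytes and assumes a little-endian target; `Memory`, `store32`, `load32` and `example` are hypothetical helpers, not part of GHC.

{{{
import Data.Bits (shiftL, shiftR, (.|.))
import Data.Word (Word32, Word8)

-- Byte-addressed memory: the list index is the address.
type Memory = [Word8]

-- bits32[addr] = v, storing the least significant byte first (little-endian).
store32 :: Int -> Word32 -> Memory -> Memory
store32 addr v mem = zipWith put [0 ..] mem
  where
    put i b
      | i >= addr && i < addr + 4 = fromIntegral (v `shiftR` (8 * (i - addr)))
      | otherwise                 = b

-- Read bits32[addr] back by combining four bytes (little-endian).
load32 :: Int -> Memory -> Word32
load32 addr mem = foldr step 0 [0 .. 3]
  where
    step k acc = (acc `shiftL` 8) .|. fromIntegral (mem !! (addr + k))

-- num_arr: bits32[10] at address 0 (40 zero bytes), then bits32[num_arr + (2*4)] = 5.
example :: Memory
example = store32 (0 + 2 * 4) 5 (replicate 40 0)
}}}

`load32 8 example` returns `5`, matching C's `num_arr[2]`, while the neighbouring element at byte offset 4 is still `0`.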
or, for an example of a macro from `DerivedConstants.h`:
{{{
StgAtomicallyFrame_code(frame) = R1;
}}}
this will be transformed to:
{{{
REP_StgAtomicallyFrame_code[frame + SIZEOF_StgHeader + OFFSET_StgAtomicallyFrame_code] = R1;
// further reduces to (on Darwin PPC arch):
I32[frame + SIZEOF_StgHeader + 0] = R1;
}}}

=== Literals and Labels ===
Cmm literals are exactly like C-- literals, including the Haskell-style type syntax, for example: `0x00000001::bits32`.  Cmm literals may be used for initialisation by assignment or in expressions.  The `CmmLit` and `CmmStatic` data types, defined in [[GhcFile(compiler/cmm/Cmm.hs)]], together represent Cmm literals, static information and Cmm labels:
{{{
data CmmLit
  = CmmInt Integer  MachRep
        -- Interpretation: the 2's complement representation of the value
        -- is truncated to the specified size.  This is easier than trying
        -- to keep the value within range, because we don't know whether
        -- it will be used as a signed or unsigned value (the MachRep doesn't
        -- distinguish between signed & unsigned).
  | CmmFloat  Rational MachRep
  | CmmLabel    CLabel                  -- Address of label
  | CmmLabelOff CLabel Int              -- Address of label + byte offset

        -- Due to limitations in the C backend, the following
        -- MUST ONLY be used inside the info table indicated by label2
        -- (label2 must be the info label), and label1 must be an
        -- SRT, a slow entrypoint or a large bitmap (see the Mangler)
        -- Don't use it at all unless tablesNextToCode.
        -- It is also used inside the NCG when generating
        -- position-independent code.
  | CmmLabelDiffOff CLabel CLabel Int   -- label1 - label2 + offset
}}}
Note how the `CmmLit` constructor `CmmInt Integer MachRep` carries sign information in the `Integer`, the representation of the literal itself: this conforms to the C-- specification, where integral literals contain sign information.  For an example of a function that uses `CmmInt` sign information, see `cmmMachOpFold` in [[GhcFile(compiler/cmm/CmmOpt.hs)]], where sign operations are performed on the `Integer`.
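The truncation rule in the `CmmInt` comment can be sketched with `Data.Bits`: keep the low ''k'' bits of the two's complement representation, then read them back as unsigned or signed.  The helper names (`widthInBits`, `narrowU`, `narrowS`) and the cut-down `MachRep` are assumptions for this illustration.

{{{
import Data.Bits (shiftL, (.&.))

data MachRep = I8 | I16 | I32 | I64
  deriving (Eq, Show)

widthInBits :: MachRep -> Int
widthInBits I8  = 8
widthInBits I16 = 16
widthInBits I32 = 32
widthInBits I64 = 64

-- Keep the low k bits.  Integer's .&. behaves like an infinite
-- two's complement number, so negative inputs work too.
narrowU :: MachRep -> Integer -> Integer
narrowU rep i = i .&. ((1 `shiftL` widthInBits rep) - 1)

-- Read the same k bits back as a signed value.
narrowS :: MachRep -> Integer -> Integer
narrowS rep i
  | u >= half = u - full
  | otherwise = u
  where
    u    = narrowU rep i
    full = 1 `shiftL` widthInBits rep
    half = full `div` 2
}}}

The same 8-bit literal `200` is therefore `200` when used unsigned but `-56` when used signed, which is why the interpretation is left to the operation that consumes it.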

A literal such as `CmmInt Integer MachRep` or `CmmFloat Rational MachRep` may not always need the full size given by its `MachRep`.  The NCG, in [[GhcFile(compiler/nativeGen/MachCodeGen.hs)]], tests a literal such as `1::bits32` (in Haskell, `CmmInt (1::Integer) I32`) for whether it fits into the bit-size of assembler instruction literals on that particular architecture, using a function defined in [[GhcFile(compiler/nativeGen/MachRegs.lhs)]] such as `fits16Bits` on the PPC.  If the `Integer` literal fits, the function `makeImmediate` will truncate it to the specified size if possible and store it in an NCG data type, `Imm`, specifically `Maybe Imm`.  (These are also defined in [[GhcFile(compiler/nativeGen/MachRegs.lhs)]].)
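A simplified version of that check might look like the following.  `fits16Bits` is modelled on the PPC helper named above, but `Imm` and `makeImm16` are hypothetical simplifications (GHC's `makeImmediate` also takes the `MachRep` into account).  The unsigned range 0 to 65535 assumes instructions such as PPC's `ori`, whose immediate field is an unsigned 16-bit value.

{{{
-- Hypothetical stand-in for the NCG's Imm type.
newtype Imm = ImmInt Int
  deriving (Eq, Show)

-- Does the value fit a signed 16-bit immediate field?
fits16Bits :: Integer -> Bool
fits16Bits x = x >= -32768 && x < 32768

-- Build an immediate if the literal fits.  In this sketch, Nothing means the
-- code generator would have to load the constant into a register instead.
makeImm16 :: Bool -> Integer -> Maybe Imm
makeImm16 signed x
  | signed && fits16Bits x            = Just (ImmInt (fromInteger x))
  | not signed && x >= 0 && x < 65536 = Just (ImmInt (fromInteger x))
  | otherwise                         = Nothing
}}}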

The Haskell representation of Cmm separates static (unchangeable) Cmm values into their own data type, `CmmStatic`, defined in [[GhcFile(compiler/cmm/Cmm.hs)]]:
{{{
data CmmStatic
  = CmmStaticLit CmmLit
        -- a literal value, size given by cmmLitRep of the literal.
  | CmmUninitialised Int
        -- uninitialised data, N bytes long
  | CmmAlign Int
        -- align to next N-byte boundary (N must be a power of 2).
  | CmmDataLabel CLabel
        -- label the current position in this section.
  | CmmString [Word8]
        -- string of 8-bit values only, not zero terminated.
}}}
Note the `CmmAlign` constructor: this maps to the assembler directive `.align N`, which sets the alignment for a data item (hopefully one you remembered to label).  This is the same as the `align` directive noted in Section 4.5 of the [http://cminusminus.org/extern/man2.pdf C-- specification (PDF)].  In the current implementation of Cmm the `align` directive seems superfluous, because [[GhcFile(compiler/nativeGen/PprMach.hs)]] translates `Section`s to assembler with alignment directives corresponding to the target architecture (see [wiki:Commentary/Compiler/CmmType#SectionsandDirectives Sections and Directives], below).
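What `CmmAlign N` asks for can be expressed as rounding an offset up to a multiple of a power of two.  The function `alignUp` is a hypothetical helper.  Note also that assemblers disagree about the operand of `.align` (some targets read it as a byte count, others as a power-of-two exponent), so this sketch only models the byte-count reading stated in the `CmmAlign` comment.

{{{
import Data.Bits (complement, popCount, (.&.))

-- Round off up to the next multiple of n; n must be a positive power of 2.
-- Adding n - 1 and then clearing the low bits gives the next boundary.
alignUp :: Int -> Int -> Maybe Int
alignUp n off
  | n > 0 && popCount n == 1 = Just ((off + n - 1) .&. complement (n - 1))
  | otherwise                = Nothing
}}}

For example, `alignUp 8 13` is `Just 16`, an offset already on the boundary is left alone, and a non-power-of-two alignment such as `3` is rejected.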

==== Labels ====
Remember that C--/Cmm names consist of a string where the first character is:
 * ASCII alphabetic (uppercase or lowercase);
 * an underscore:    `_` ;
 * a period:         `.` ;
 * a dollar sign:    `$` ; or,
 * a commercial at:  `@` .
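The rule for the first character of a name translates directly into a predicate.  This sketch (the names `isNameStart` and `startsValidName` are hypothetical) checks only the first character, since the list above does not cover the rest of the name.

{{{
import Data.Char (isAsciiLower, isAsciiUpper)

-- May this character begin a C--/Cmm name?
isNameStart :: Char -> Bool
isNameStart c = isAsciiUpper c || isAsciiLower c || c `elem` "_.$@"

-- Checks the first character only; the rest of the name is not validated.
startsValidName :: String -> Bool
startsValidName (c : _) = isNameStart c
startsValidName []      = False
}}}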

Cmm labels conform to the C-- specification.  C--/Cmm uses labels to refer to memory locations in code: if you use a data directive but do not give it a label, you will have no means of referring to the memory!  For `GlobalReg`s (transformed to assembler `.globl`), labels serve as both symbols and labels (in the assembler meaning of the terms).  The Haskell representation of Cmm labels is contained in the `CmmLit` data type; see the [wiki:Commentary/Compiler/CmmType#LiteralsandLabels Literals and Labels] section, above.  Note how Cmm labels are `CLabel`s with address information.  The `CLabel` data type, defined in [[GhcFile(compiler/cmm/CLabel.hs)]], is used throughout the compiler for symbol information in binary files.  Here it is:
{{{
data CLabel
  = IdLabel                     -- A family of labels related to the
        Name                    -- definition of a particular Id or Con
        IdLabelInfo

  | DynIdLabel                  -- like IdLabel, but in a separate package,
        Name                    -- and might therefore need a dynamic
        IdLabelInfo             -- reference.

  | CaseLabel                   -- A family of labels related to a particular
                                -- case expression.
        {-# UNPACK #-} !Unique  -- Unique says which case expression
        CaseLabelInfo

  | AsmTempLabel
        {-# UNPACK #-} !Unique

  | StringLitLabel
        {-# UNPACK #-} !Unique

  | ModuleInitLabel
        Module                  -- the module name
        String                  -- its "way"
        Bool                    -- True <=> is in a different package
        -- at some point we might want some kind of version number in
        -- the module init label, to guard against compiling modules in
        -- the wrong order.  We can't use the interface file version however,
        -- because we don't always recompile modules which depend on a module
        -- whose version has changed.

  | PlainModuleInitLabel        -- without the version & way info
        Module
        Bool                    -- True <=> is in a different package

  | ModuleRegdLabel

  | RtsLabel RtsLabelInfo

  | ForeignLabel FastString     -- a 'C' (or otherwise foreign) label
        (Maybe Int)             -- possible '@n' suffix for stdcall functions
                -- When generating C, the '@n' suffix is omitted, but when
                -- generating assembler we must add it to the label.
        Bool                    -- True <=> is dynamic

  | CC_Label  CostCentre
  | CCS_Label CostCentreStack

      -- Dynamic Linking in the NCG:
      -- generated and used inside the NCG only,
      -- see compiler/nativeGen/PositionIndependentCode.hs for details.

  | DynamicLinkerLabel DynamicLinkerLabelInfo CLabel
        -- special variants of a label used for dynamic linking

  | PicBaseLabel                -- a label used as a base for PIC calculations
                                -- on some platforms.
                                -- It takes the form of a local numeric
                                -- assembler label '1'; it is pretty-printed
                                -- as 1b, referring to the previous definition
                                -- of 1: in the assembler source file.

  | DeadStripPreventer CLabel
    -- label before an info table to prevent excessive dead-stripping on darwin

  | HpcTicksLabel Module       -- Per-module table of tick locations
  | HpcModuleNameLabel         -- Per-module name of the module for Hpc

  deriving (Eq, Ord)
}}}

=== Sections and Directives ===
The Haskell representation of Cmm section directives, in [[GhcFile(compiler/cmm/Cmm.hs)]] as the first part of the "Static Data" section, is:
{{{
data Section
  = Text
  | Data
  | ReadOnlyData
  | RelocatableReadOnlyData
  | UninitialisedData
  | ReadOnlyData16      -- .rodata.cst16 on x86_64, 16-byte aligned
  | OtherSection String
}}}
Cmm supports the following directives, corresponding to the assembler directives pretty-printed by the `pprSectionHeader` function in [[GhcFile(compiler/nativeGen/PprMach.hs)]]:
|| '''`Section` Constructor''' || '''Cmm section directive''' || '''Assembler Directive''' ||
|| `Text` || `"text"` || `.text` ||
|| `Data` || `"data"` || `.data` ||
|| `ReadOnlyData` || `"rodata"` || `.rodata`[[BR]](generally; varies by arch/OS) ||
|| `RelocatableReadOnlyData` || no parse (GHC internal), output: `"relreadonly"` || `.const_data`[[BR]]`.section .rodata`[[BR]](generally; varies by arch/OS) ||
|| `UninitialisedData` || `"bss"`, output: `"uninitialised"` || `.bss` ||
|| `ReadOnlyData16` || no parse (GHC internal), output: none || `.const`[[BR]]`.section .rodata`[[BR]](generally; on x86_64:[[BR]]`.section .rodata.cst16`) ||
You may have noticed that I omitted the alignment directives (for clarity).  For example, `pprSectionHeader` would pretty-print `ReadOnlyData` as
{{{
.const
.align 2
}}}
on an i386 running Darwin.  If you are really on the ball, you might have noticed that the `PprMach.hs` output of "`.section .data`" and the like is playing it safe, since on most OSes, using GNU Assembler, the `.data` directive is equivalent to `.section __DATA .data`, or simply `.section .data`.  Note that `OtherSection String` is not a catch-all for the Cmm parser.  If you wrote:
{{{
section ".const\n.align 2\n\t.section .rodata" { ... }
}}}
the Cmm parser (through GHC) would panic, complaining, "`PprMach.pprSectionHeader: unknown section`."
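The path from a Cmm section directive to an assembler header, including that panic, can be sketched as two functions.  This is a simplification under stated assumptions: `sectionFromCmm` and `sectionHeader` are hypothetical names, alignment directives are omitted, each constructor gets one generic ELF-style directive taken from the table (real output varies by architecture and OS), and the panic is modelled as a `Left` value.

{{{
-- Copy of the Section type shown above.
data Section
  = Text
  | Data
  | ReadOnlyData
  | RelocatableReadOnlyData
  | UninitialisedData
  | ReadOnlyData16
  | OtherSection String
  deriving (Eq, Show)

-- Parsing: known directives map to constructors; in this sketch anything
-- else is kept verbatim in OtherSection rather than rejected here.
sectionFromCmm :: String -> Section
sectionFromCmm "text"   = Text
sectionFromCmm "data"   = Data
sectionFromCmm "rodata" = ReadOnlyData
sectionFromCmm "bss"    = UninitialisedData
sectionFromCmm other    = OtherSection other

-- Pretty-printing: OtherSection is where the "unknown section" error appears.
sectionHeader :: Section -> Either String String
sectionHeader Text                    = Right ".text"
sectionHeader Data                    = Right ".data"
sectionHeader ReadOnlyData            = Right ".section .rodata"
sectionHeader RelocatableReadOnlyData = Right ".section .rodata"
sectionHeader UninitialisedData       = Right ".bss"
sectionHeader ReadOnlyData16          = Right ".section .rodata"
sectionHeader (OtherSection _)        = Left "PprMach.pprSectionHeader: unknown section"
}}}

Splitting the two steps this way shows why the error surfaces only at pretty-printing time: the unknown string survives parsing and fails when a header has to be emitted for it.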

While the C-- specification allows a bare `data` keyword directive, Cmm does not:
{{{
// this is valid C--, not Cmm!
data { }

// all Cmm directives use this syntax:
section [Cmm section directive] { }
}}}

Cmm does not recognise the C-- "`stack`" declaration for allocating memory on the system stack.

GHC-produced Cmm code is replete with `data` sections, each of which is stored in the `.data` section of the binary.  This contributes significantly to the large binary size of GHC-compiled code.

==== Target Directive ====
The C-- specification defines a special `target` directive, in section 4.7.  The `target` directive is essentially a code block defining the properties of the target architecture:
{{{
target
        memsize N       // bit-size of the smallest addressable unit of memory
        byteorder       [big,little]    // endianness
        pointersize     N       // bit-size of the native pointer type
        wordsize        N       // bit-size of the native word type
}}}
This is essentially a custom-coded version of the GNU Assembler (`as`) `.machine` directive, which in turn is much the same as passing the `-arch [cpu_type]` option to `as`.

Cmm does not support the `target` directive.  This is partly because GHC generally lacks cross-compilation capabilities.  Should GHC move toward supporting cross-compilation, the `target` directive might be worth adding.  Target architecture parameters are currently handled through the [wiki:Building/BuildSystem Build System], which sets some of these architectural parameters through [[GhcFile(includes/mkDerivedConstants.c)]] and [[GhcFile(includes/ghcconfig.h)]].

=== Expressions ===
Expressions in Cmm follow the C-- specification.  They have:
 * no side-effects; and,
 * one result:
   * a ''k''-bit value (these expressions map to the `MachOp` data type, defined in [[GhcFile(compiler/cmm/MachOp.hs)]]; see [wiki:Commentary/Compiler/CmmType#OperatorsandPrimitiveOperations Operators and Primitive Operations]), which may be:
     * a Cmm literal (`CmmLit`); or,
     * a Cmm variable (`CmmReg`, see [wiki:Commentary/Compiler/CmmType#VariablesRegistersandTypes Variables, Registers and Types]);[[BR]]or,
   * a boolean condition.

Cmm expressions may include:
 * a literal or a name (`CmmLit` contains both, see [wiki:Commentary/Compiler/CmmType#LiteralsandLabels Literals and Labels], above);
 * a memory reference (`CmmLoad` and `CmmReg`, see [wiki:Commentary/Compiler/CmmType#MemoryAccess Memory Access], above);
 * an operator (a `MachOp`, in `CmmMachOp`, below); or,
 * another expression (a `[CmmExpr]`, in `CmmMachOp`, below).
These are all included as constructors in the `CmmExpr` data type, defined in [[GhcFile(compiler/cmm/Cmm.hs)]]:
{{{
data CmmExpr
  = CmmLit CmmLit               -- Literal or Label (name)
  | CmmLoad CmmExpr MachRep     -- Read memory location (memory reference)
  | CmmReg CmmReg               -- Contents of register
  | CmmMachOp MachOp [CmmExpr]  -- operation (+, -, *, `lt`, etc.)
  | CmmRegOff CmmReg Int
}}}
Note that `CmmRegOff reg i` is only shorthand for a specific `CmmMachOp` application:
{{{
CmmMachOp (MO_Add rep) [(CmmReg reg),(CmmLit (CmmInt i rep))]
        where rep = cmmRegRep reg
}}}
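The expansion of `CmmRegOff` can be written as a small rewrite over a cut-down `CmmExpr`.  The types below keep only what the example needs (a register is modelled as a name plus its `MachRep`), and `expandRegOff` is a hypothetical helper rather than a GHC function.

{{{
data MachRep = I32 | I64
  deriving (Eq, Show)

-- Hypothetical simplified register: a name plus its representation.
data CmmReg = Reg String MachRep
  deriving (Eq, Show)

data MachOp = MO_Add MachRep
  deriving (Eq, Show)

data CmmLit = CmmInt Integer MachRep
  deriving (Eq, Show)

data CmmExpr
  = CmmLit CmmLit
  | CmmReg CmmReg
  | CmmMachOp MachOp [CmmExpr]
  | CmmRegOff CmmReg Int
  deriving (Eq, Show)

cmmRegRep :: CmmReg -> MachRep
cmmRegRep (Reg _ rep) = rep

-- Replace every CmmRegOff by the equivalent explicit addition,
-- using the register's own MachRep for both the add and the offset.
expandRegOff :: CmmExpr -> CmmExpr
expandRegOff (CmmRegOff reg i) =
  CmmMachOp (MO_Add rep) [CmmReg reg, CmmLit (CmmInt (fromIntegral i) rep)]
  where rep = cmmRegRep reg
expandRegOff (CmmMachOp op args) = CmmMachOp op (map expandRegOff args)
expandRegOff e = e
}}}

For instance, `expandRegOff (CmmRegOff (Reg "Sp" I64) 16)` produces `CmmMachOp (MO_Add I64) [CmmReg (Reg "Sp" I64), CmmLit (CmmInt 16 I64)]`.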
The function `cmmRegRep` is described below.  Note: the original comment following `CmmExpr` in [[GhcFile(compiler/cmm/Cmm.hs)]] is erroneous (cf. `mangleIndexTree` in [[GhcFile(compiler/nativeGen/MachCodeGen.hs)]]) but makes the same point described here.  The offset, `(CmmLit (CmmInt i rep))`, is a literal (`CmmLit`), not a name (`CLabel`).  A `CmmExpr` for an offset must be reducible to a `CmmInt` ''in Haskell''; in other words, offsets in Cmm expressions may not be external symbols whose addresses are not resolvable in the current context.

Boolean comparisons are not boolean conditions.  Boolean comparisons involve relational operators, such as `>`, `<` and `==`, and map to `MachOp`s that are converted to a comparison followed by a branch instruction.  For example, a signed less-than comparison maps to `MO_S_Lt`: [[GhcFile(compiler/nativeGen/MachCodeGen.hs)]] transforms `MO_S_Lt` into the `LTT` constructor of the `Cond` union data type defined in [[GhcFile(compiler/nativeGen/MachInstrs.hs)]], and [[GhcFile(compiler/nativeGen/PprMach.hs)]] transforms `LTT` into the distinguishing comparison type for an assembler comparison instruction.  You already know that the result of a comparison instruction is actually a change in the state of the Condition Register (CR), so Cmm boolean expressions do have a kind of side effect, but that is to be expected.  In fact, it is necessary, since at the least a conditional expression becomes two assembler instructions; in PPC assembler:
{{{
cmplwi   r3, 0  ; condition test
blt      Lch    ; branch instruction
}}}
This condition mapping does have an unfortunate consequence: conditional expressions do not fold into single instructions.  In Cmm, as in C--, expressions with relational operators may evaluate to an integral value (`0` or nonzero) instead of to a boolean type.  For certain cases, such as an arithmetic operation immediately followed by a comparison against zero, the PPC record forms (the `.`-suffixed mnemonics) such as `add.` might eliminate the comparison instruction.  See [wiki:Commentary/Compiler/CmmType#CmmDesignObservationsandAreasforPotentialImprovement Cmm Design: Observations and Areas for Potential Improvement] for more discussion and potential solutions to this situation.

Boolean conditions include `&&`, `||`, `!` and parenthetical combinations of boolean conditions.  The `if expr { }` and `if expr { } else { }` statements contain boolean conditions.  The C-- type produced by conditional expressions is `bool`; in Cmm, it is the type `BoolExpr` in [[GhcFile(compiler/cmm/CmmParse.y)]]:
{{{
data BoolExpr
  = BoolExpr `BoolAnd` BoolExpr
  | BoolExpr `BoolOr`  BoolExpr
  | BoolNot BoolExpr
  | BoolTest CmmExpr
}}}
The type `BoolExpr` maps to the `CmmCondBranch` or `CmmBranch` constructors of type `CmmStmt`, defined in [[GhcFile(compiler/cmm/Cmm.hs)]]; see [wiki:Commentary/Compiler/CmmType#StatementsandCalls Statements and Calls].
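One common way to turn a `BoolExpr` into conditional and unconditional branches is short-circuit lowering with a "true" label and a "false" label.  The sketch below shows that general technique on hypothetical cut-down types (`Stmt`, `Label`, `lower` and the placeholder `Test` expression are not GHC's); it is not a transcription of GHC's code, and it makes no attempt to exploit fall-through.

{{{
-- Hypothetical label type: just an integer.
type Label = Int

-- Placeholder for a comparison expression such as x < y.
newtype CmmExpr = Test String
  deriving (Eq, Show)

data BoolExpr
  = BoolExpr `BoolAnd` BoolExpr
  | BoolExpr `BoolOr` BoolExpr
  | BoolNot BoolExpr
  | BoolTest CmmExpr
  deriving (Eq, Show)

data Stmt
  = CondBranch CmmExpr Label   -- jump to Label if the expression is nonzero
  | Branch Label               -- unconditional jump
  | LabelDef Label             -- start of the block named Label
  deriving (Eq, Show)

-- lower e t f next: statements that jump to t when e holds and to f otherwise.
-- 'next' is the first unused label number; the updated counter is returned.
lower :: BoolExpr -> Label -> Label -> Label -> ([Stmt], Label)
lower (BoolTest e) t f next = ([CondBranch e t, Branch f], next)
lower (BoolNot a) t f next = lower a f t next        -- swap the targets
lower (a `BoolAnd` b) t f next =
  let mid      = next                                -- reached only if a holds
      (ca, n1) = lower a mid f (next + 1)
      (cb, n2) = lower b t f n1
  in (ca ++ [LabelDef mid] ++ cb, n2)
lower (a `BoolOr` b) t f next =
  let mid      = next                                -- reached only if a fails
      (ca, n1) = lower a t mid (next + 1)
      (cb, n2) = lower b t f n1
  in (ca ++ [LabelDef mid] ++ cb, n2)
}}}

For the conjunction of two tests `x` and `y`, lowered with true label `100` and false label `200`, the result branches to a fresh label `0` if `x` holds (otherwise to `200`), and at label `0` branches to `100` if `y` holds (otherwise to `200`).  Negation costs nothing: it just swaps the two targets.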

The `CmmExpr` constructor `CmmMachOp MachOp [CmmExpr]` is the core of every operator-based expression; the key here is `MachOp`, which in turn depends on the `MachRep` of each operand.  See [wiki:Commentary/Compiler/CmmType#FundamentalandPrimitiveOperators Fundamental and Primitive Operators].  In order to process `CmmExpr`s, the data type comes with a deconstructor function to obtain the relevant `MachRep`s, defined in [[GhcFile(compiler/cmm/Cmm.hs)]]:
{{{
cmmExprRep :: CmmExpr -> MachRep
cmmExprRep (CmmLit lit)      = cmmLitRep lit
cmmExprRep (CmmLoad _ rep)   = rep
cmmExprRep (CmmReg reg)      = cmmRegRep reg
cmmExprRep (CmmMachOp op _)  = resultRepOfMachOp op
cmmExprRep (CmmRegOff reg _) = cmmRegRep reg
}}}
     642The deconstructors `cmmLitRep` and `cmmRegRep` (with its supporting deconstructor `localRegRep`) are also defined in [[GhcFile(compiler/cmm/Cmm.hs)]].
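The following cut-down, runnable model shows how these deconstructors fit together.  The constructors mirror the names used above, but the types are simplified stand-ins.  Two details are assumptions made only for the sketch:
 * global registers are treated as word-sized; and,
 * comparisons such as `MO_Eq` are given a word-sized result (`I32`, for a 32-bit target).

```haskell
-- Simplified stand-ins for the types in compiler/cmm/Cmm.hs and
-- compiler/cmm/MachOp.hs; only a few constructors are modelled.
data MachRep = I8 | I16 | I32 | I64 | F32 | F64
  deriving (Show, Eq)

data GlobalReg = Sp | Hp                -- a hypothetical subset
  deriving Show
data LocalReg  = LocalReg Int MachRep   -- unique, representation
  deriving Show
data CmmReg    = CmmLocal LocalReg | CmmGlobal GlobalReg
  deriving Show
data CmmLit    = CmmInt Integer MachRep | CmmFloat Rational MachRep
  deriving Show
data MachOp    = MO_Add MachRep | MO_Eq MachRep | MO_S_Conv MachRep MachRep
  deriving Show

data CmmExpr
  = CmmLit CmmLit
  | CmmLoad CmmExpr MachRep
  | CmmReg CmmReg
  | CmmMachOp MachOp [CmmExpr]
  | CmmRegOff CmmReg Int
  deriving Show

-- Assumption: a 32-bit target, so the word size is I32.
wordRep :: MachRep
wordRep = I32

cmmLitRep :: CmmLit -> MachRep
cmmLitRep (CmmInt _ rep)   = rep
cmmLitRep (CmmFloat _ rep) = rep

localRegRep :: LocalReg -> MachRep
localRegRep (LocalReg _ rep) = rep

cmmRegRep :: CmmReg -> MachRep
cmmRegRep (CmmLocal reg) = localRegRep reg
cmmRegRep (CmmGlobal _)  = wordRep      -- assumption: Sp/Hp are word-sized

resultRepOfMachOp :: MachOp -> MachRep
resultRepOfMachOp (MO_Add rep)     = rep
resultRepOfMachOp (MO_Eq _)        = wordRep  -- assumption: 0/1 word result
resultRepOfMachOp (MO_S_Conv _ to) = to

cmmExprRep :: CmmExpr -> MachRep
cmmExprRep (CmmLit lit)      = cmmLitRep lit
cmmExprRep (CmmLoad _ rep)   = rep
cmmExprRep (CmmReg reg)      = cmmRegRep reg
cmmExprRep (CmmMachOp op _)  = resultRepOfMachOp op
cmmExprRep (CmmRegOff reg _) = cmmRegRep reg
```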
     643
     644In PPC Assembler you might add two 32-bit integrals by:
     645{{{
add     r3, r1, r2      # r3 = r1 + r2
     647}}}
     648while in Cmm you might write:
     649{{{
     650res = first + second;
     651}}}
Remember that the assignment operator, `=`, is a statement because it has the "side effect" of modifying the value in `res`.  On a 32-bit architecture, the `+` expression in the above statement would be represented in Haskell as shown below.  Here `uniq1` and `uniq2` stand for the `Unique`s of the local registers `first` and `second`:
{{{
CmmMachOp (MO_Add I32) [CmmReg (CmmLocal (LocalReg uniq1 I32)), CmmReg (CmmLocal (LocalReg uniq2 I32))]
}}}
The `expr` production rule in the Cmm parser, [[GhcFile(compiler/cmm/CmmParse.y)]], maps tokens to "values", such as `+` to an addition operation, `MO_Add`.  The parser's `mkMachOp` function determines the `MachOp` in `CmmMachOp MachOp [CmmExpr]` from two things:
 * the token value; and,
 * the `MachRep` of the first operand (the `head` of the argument list).
Notice that the simple `+` operator carries no sign information, only the `MachRep`.  For signed `MachOp`s and other `MachOp`s that have no operator symbol, see `machOps` in [[GhcFile(compiler/cmm/CmmParse.y)]].  Here is a table of the operators recognised by Cmm and their corresponding `MachOp`s (listed in order of precedence):
     657|| '''Operator''' || '''`MachOp`''' ||
     658|| `/` || `MO_U_Quot` ||
     659|| `*` || `MO_Mul` ||
     660|| `%` || `MO_U_Rem` ||
     661|| `-` || `MO_Sub` ||
     662|| `+` || `MO_Add` ||
     663|| `>>` || `MO_U_Shr` ||
     664|| `<<` || `MO_Shl` ||
     665|| `&` || `MO_And` ||
     666|| `^` || `MO_Xor` ||
     667|| `|` || `MO_Or` ||
     668|| `>=` || `MO_U_Ge` ||
     669|| `>` || `MO_U_Gt` ||
     670|| `<=` || `MO_U_Le` ||
     671|| `<` || `MO_U_Lt` ||
     672|| `!=` || `MO_Ne` ||
     673|| `==` || `MO_Eq` ||
     674|| `~` || `MO_Not` ||
     675|| `-` || `MO_S_Neg` ||
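The token-to-`MachOp` step can be sketched as a lookup.  The lookup yields a constructor that is still waiting for its `MachRep`, and that `MachRep` is then taken from the first operand.  This is a simplified model with hypothetical names (`Operand`, `binaryOp`, `binary`), and it covers only part of the operator table.  GHC's real `mkMachOp` differs in two ways: it runs in the parser's monad, and it works on full `CmmExpr`s.

```haskell
-- Simplified model: MachRep, MachOp and operands are cut down to what the
-- example needs.
data MachRep = I8 | I16 | I32 | I64
  deriving (Show, Eq)

data MachOp
  = MO_Add MachRep | MO_Sub MachRep | MO_Mul MachRep
  | MO_U_Quot MachRep | MO_U_Rem MachRep
  | MO_U_Lt MachRep | MO_Eq MachRep
  deriving (Show, Eq)

-- Operands are named registers with a known representation (hypothetical).
data Operand = Reg String MachRep
  deriving (Show, Eq)

data Expr = CmmMachOp MachOp [Operand]
  deriving (Show, Eq)

-- Token -> MachOp constructor, still waiting for its MachRep.
-- Note that "/" and "%" map to the unsigned variants, as in the table.
binaryOp :: String -> Maybe (MachRep -> MachOp)
binaryOp tok = lookup tok
  [ ("+", MO_Add), ("-", MO_Sub), ("*", MO_Mul)
  , ("/", MO_U_Quot), ("%", MO_U_Rem)
  , ("<", MO_U_Lt), ("==", MO_Eq) ]

-- Like the parser's mkMachOp, the MachRep comes from the first operand.
mkMachOp :: (MachRep -> MachOp) -> [Operand] -> Expr
mkMachOp fn args = CmmMachOp (fn (repOf (head args))) args
  where repOf (Reg _ rep) = rep

-- Build "x tok y", or Nothing for a token this table does not know.
binary :: String -> Operand -> Operand -> Maybe Expr
binary tok x y = fmap (\fn -> mkMachOp fn [x, y]) (binaryOp tok)
```

With two `I32` registers `first` and `second`, `binary "+"` produces `CmmMachOp (MO_Add I32) [...]`.  This matches the representation of `res = first + second` given earlier.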
     676
     677==== Quasi-operator Syntax ====
If you read to the end of `expr` in [[GhcFile(compiler/cmm/CmmParse.y)]], you will notice that Cmm expressions also recognise a set of name-based (not symbol-based) operators.  These would probably be better understood as ''quasi-operators'', and they are listed in the next production rule, `expr0`.  In some cases, the syntax for these quasi-operators is similar to the syntax for Cmm statements.  It generally conforms to the C-- specification, sections 3.3.2 (`expr`) and 7.4.1 (syntax of primitive operators), ''except that'' syntax type 3 below ''may return'' '''multiple''' ''values''.  Because types 1 and 3 are interchangeable, type 1 may also return multiple values.  In Cmm, quasi-operators may have side effects.  The syntax for quasi-operators may be:
     679 1. `expr0` {{{`name`}}} `expr0`[[BR]](just like infix-functions in Haskell);
     680 1. `type[ expression ]`[[BR]](the memory access quasi-expression described in [wiki:Commentary/Compiler/CmmType#MemoryAccess Memory Access]; the Haskell representation of this syntax is `CmmLoad CmmExpr MachRep`);
     681 1. `%name( exprs0 )`[[BR]](standard prefix form, similar to C-- ''statement'' syntax for procedures but with the distinguishing prefix `%`; in Cmm this is ''also used as statement syntax for calls, which are really built-in procedures'', see [wiki:Commentary/Compiler/CmmType#CmmCalls Cmm Calls])
An `expr0` may be either of the following:
 * a literal (`CmmLit`) integral, floating point or string; or,
 * a `CmmReg` (the production rule `reg`: a `name` for a local register (`LocalReg`) or a `GlobalReg`).
     683
     684Note that the `name` in `expr0` syntax types 1. and 3. must be a known ''primitive'' (primitive operation), see [wiki:Commentary/Compiler/CmmType#OperatorsandPrimitiveOperations Operators and Primitive Operations].  The first and third syntax types are interchangeable:
     685{{{
     686bits32 one, two, res;
     687one = 1::bits32;
     688two = 2::bits32;
     689
     690res = one `lt` two;
     691
     692// is equivalent to:
     693
     694res = %lt(one, two);
     695}}}
     696The primitive operations allowed by Cmm are listed in the `machOps` production rule, in [[GhcFile(compiler/cmm/CmmParse.y)]], and largely correspond to `MachOp` data type constructors, in [[GhcFile(compiler/cmm/MachOp.hs)]], with a few additions.  The primitive operations distinguish between signed, unsigned and floating point types.
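The sketch below shows two things: that the infix and prefix forms are interchangeable, and that named primitives distinguish signed from unsigned operations.  The names in `primByName` (`lt`, `ltu`, `ge`, `geu`, `eq`, `ne`) are meant to resemble entries in `machOps`.  Treat the exact spellings as assumptions to check against [[GhcFile(compiler/cmm/CmmParse.y)]].  `Syntax`, `desugar` and `eval` are hypothetical names, and every value is a `bits32`, modelled as `Word32`.

```haskell
import Data.Int  (Int32)
import Data.Word (Word32)

-- A few name-based primitives; the MachRep argument is dropped
-- (everything is bits32 here).
data MachOp = MO_Eq | MO_Ne | MO_S_Lt | MO_U_Lt | MO_S_Ge | MO_U_Ge
  deriving (Show, Eq)

primByName :: String -> Maybe MachOp
primByName name = lookup name
  [ ("eq", MO_Eq),   ("ne",  MO_Ne)
  , ("lt", MO_S_Lt), ("ltu", MO_U_Lt)
  , ("ge", MO_S_Ge), ("geu", MO_U_Ge) ]

-- The two surface forms:  one `lt` two   and   %lt(one, two)
data Syntax
  = InfixCall Word32 String Word32
  | PrefixCall String [Word32]

data Expr = App MachOp [Word32]
  deriving (Show, Eq)

-- Both forms desugar to the same application.
desugar :: Syntax -> Maybe Expr
desugar (InfixCall x name y)   = desugar (PrefixCall name [x, y])
desugar (PrefixCall name args) = fmap (\op -> App op args) (primByName name)

-- Comparisons yield 1 or 0; signed variants reinterpret the bits as Int32.
eval :: Expr -> Maybe Word32
eval (App op [x, y]) = Just (if test then 1 else 0)
  where
    sx = fromIntegral x :: Int32
    sy = fromIntegral y :: Int32
    test = case op of
      MO_Eq   -> x == y
      MO_Ne   -> x /= y
      MO_S_Lt -> sx < sy
      MO_U_Lt -> x < y
      MO_S_Ge -> sx >= sy
      MO_U_Ge -> x >= y
eval _ = Nothing
```

Take `0xFFFFFFFF` (that is, -1 when read as signed) and `1`.  Here `lt` yields `1` while `ltu` yields `0`, which is why the primitive operations must distinguish signed from unsigned types.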
     697
     698Cmm adds some expression macros that map to Haskell Cmm functions.  They are listed under `exprMacros` in [[GhcFile(compiler/cmm/CmmParse.y)]] and include:
     699 * `ENTRY_CODE`
     700 * `INFO_PTR`
     701 * `STD_INFO`
     702 * `FUN_INFO`
     703 * `GET_ENTRY`
     704 * `GET_STD_INFO`
     705 * `GET_FUN_INFO`
     706 * `INFO_TYPE`
     707 * `INFO_PTRS`
     708 * `INFO_NPTRS`
     709 * `RET_VEC`
     710
     711=== Statements and Calls ===
     712Cmm Statements generally conform to the C-- specification, with a few exceptions noted below.  Cmm Statements implement:
     713 * no-op; the empty statement: `;`
 * C-- (C99/C++ style) comments: `// ...` (to the end of the line) and `/* ... */`
     715 * the assignment operator: `=`
     716 * store operation (assignment to a memory location): `type[expr] =`
     717 * control flow within procedures (`goto`) and between procedures (`jump`, returns) (note: returns are ''only'' Cmm macros)
     718 * foreign calls (`foreign "C" ...`) and calls to Cmm Primitive Operations (`%`)
     719 * procedure calls and tail calls
     720 * conditional statement (`if ... { ... } else { ... }`)
     721 * tabled conditional (`switch`)
     722Cmm does not implement the C-- specification for Spans (sec. 6.1) or Continuations (sec. 6.7).[[BR]]
Although Cmm supports primitive operations that may have side effects (see [wiki:Commentary/Compiler/CmmType#PrimitiveOperations Primitive Operations], below), it does not parse the `%%` syntax mentioned in section 6.3 of the C-- specification.  Use the `%name(arg1,arg2)` expression syntax instead.
     724[[BR]]
     725Cmm does not implement the `return` statement (C-- spec, sec. 6.8.2) but provides a set of macros that return a list of tuples of a `CgRep` and a `CmmExpr`: `[(CgRep,CmmExpr)]`.  For a description of `CgRep`, see comments in [[GhcFile(compiler/codeGen/SMRep.lhs)]].  The return macros are defined at the end of the production rule `stmtMacros` in [[GhcFile(compiler/cmm/CmmParse.y)]]:
     726 * `RET_P`
     727 * `RET_N`
     728 * `RET_PP`
     729 * `RET_NN`
     730 * `RET_NP`
     731 * `RET_PPP`
     732 * `RET_NNP`
     733 * `RET_NNNP`
     734 * `RET_NPNP`
     735In the above macros, `P` stands for `PtrArg` and `N` stands for `NonPtrArg`; both are `CgRep` constructors.  These return macros provide greater control for the [wiki:Commentary/Compiler/CodeGen CodeGen] and integrate with the RTS but limit the number and type of return arguments in Cmm: you may only return according to these macros!  The returns are processed by the `emitRetUT` function in [[GhcFile(compiler/cmm/CmmParse.y)]], which in turn calls several functions from [[GhcFile(compiler/codeGen/CgMonad.lhs)]], notably `emitStmts`, which is the core Code Generator function for emitting `CmmStmt` data.
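The shape of the return macros can be illustrated by decoding a macro name into the `CgRep` of each returned value.  This is a hypothetical helper, not code from CmmParse.y.  The returned expressions are plain strings standing in for `CmmExpr`, and only the `PtrArg` and `NonPtrArg` constructors are modelled.

```haskell
-- Simplified: only the two CgRep constructors the macros use.
data CgRep = PtrArg | NonPtrArg
  deriving (Show, Eq)

-- The return macros listed above; any other name is rejected.
knownRetMacros :: [String]
knownRetMacros =
  [ "RET_P", "RET_N", "RET_PP", "RET_NN", "RET_NP"
  , "RET_PPP", "RET_NNP", "RET_NNNP", "RET_NPNP" ]

-- Decode a macro name's suffix into one CgRep per returned value.
retMacroReps :: String -> Maybe [CgRep]
retMacroReps name
  | name `notElem` knownRetMacros = Nothing
  | otherwise = mapM letter (drop (length "RET_") name)
  where
    letter 'P' = Just PtrArg
    letter 'N' = Just NonPtrArg
    letter _   = Nothing

-- Pair the returned expressions (plain strings here, standing in for
-- CmmExpr) with their CgReps, giving the [(CgRep, CmmExpr)] shape.
retArgs :: String -> [String] -> Maybe [(CgRep, String)]
retArgs name exprs = do
  reps <- retMacroReps name
  if length reps == length exprs
    then Just (zip reps exprs)
    else Nothing
```

Rejecting names outside `knownRetMacros` mirrors the restriction that you may only return according to the listed macros.  For example, `RET_PN` is refused even though its letters are valid.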
     736
     737The Haskell representation of Cmm Statements is the data type `CmmStmt`, defined in [[GhcFile(compiler/cmm/Cmm.hs)]]:
     738{{{
     739data CmmStmt
     740  = CmmNop
     741  | CmmComment FastString
     742
     743  | CmmAssign CmmReg CmmExpr     -- Assign to register
     744
     745  | CmmStore CmmExpr CmmExpr     -- Assign to memory location.  Size is
     746                                 -- given by cmmExprRep of the rhs.
     747
     748  | CmmCall                      -- A foreign call, with
     749     CmmCallTarget
     750     [(CmmReg,MachHint)]         -- zero or more results
     751     [(CmmExpr,MachHint)]        -- zero or more arguments
     752     (Maybe [GlobalReg])         -- Global regs that may need to be saved
     753                                 -- if they will be clobbered by the call.
     754                                 -- Nothing <=> save *all* globals that
     755                                 -- might be clobbered.
     756
     757  | CmmBranch BlockId             -- branch to another BB in this fn
     758
     759  | CmmCondBranch CmmExpr BlockId -- conditional branch
     760
     761  | CmmSwitch CmmExpr [Maybe BlockId]   -- Table branch
     762        -- The scrutinee is zero-based;
     763        --      zero -> first block
     764        --      one  -> second block etc
     765        -- Undefined outside range, and when there's a Nothing
     766
     767  | CmmJump CmmExpr [LocalReg]    -- Jump to another function, with these
     768                                  -- parameters.
     769}}}
Note how the constructor `CmmJump` contains `[LocalReg]`.  This is the Cmm implementation of the C-- `jump` statement for calling another procedure, where the parameters are the arguments passed to that procedure.  None of the parameters contains the address of the caller (in assembler, a label) to which control should return.  The `CmmCall` constructor likewise has no field for the caller's address.  Cmm implements C-- jump nesting and matching returns by ''tail calls'', as described in section 6.8 of the C-- specification.  Tail calls are managed by the [wiki:Commentary/Compiler/CodeGen CodeGen]; see [[GhcFile(compiler/codeGen/CgTailCall.lhs)]].  You may have already noticed that the call target of `CmmJump` is a `CmmExpr`.  This is the Cmm implementation of computed procedure addresses, for example:
     771{{{
     772proc1 {
     773...
     774
     775 jump (f + 1)( ... );
     776
     777}
     778}}}
     779Remember that the computed procedure address, `(f + 1)`, is the memory location of a procedure name (assembler label); it is not meant to obtain the address of a code block ''within'' a procedure, as an alternative way of computing a ''continuation''. 
     780
     781`CmmBranch BlockId` represents an unconditional branch to another [wiki:Commentary/Compiler/CmmType#BasicBlocksandProcedures Basic Block] in the same procedure.  There are two unconditional branches in Cmm/C--:
     782 1. `goto` statement; and
     783 1. a branch from the `else` portion of an `if-then-else` statement.
     784
     785`CmmCondBranch CmmExpr BlockId` represents a conditional branch to another [wiki:Commentary/Compiler/CmmType#BasicBlocksandProcedures Basic Block] in the same procedure.  This is the `if expr` statement where `expr` is a `CmmExpr`, used in both the unary `if` and `if-then-else` statements.  `CmmCondBranch` maps to more complex Assembler instruction sets or HC code ([[GhcFile(compiler/cmm/PprC.hs)]]).  For assembler, labels are created for each new Basic Block.  During parsing, conditional statements map to the `BoolExpr` data type which guides the encoding of assembler instruction sets.
     786
`CmmSwitch` represents the `switch` statement.  It is created in one of two ways:
 * by the `doSwitch` function in [[GhcFile(compiler/cmm/CmmParse.y)]], when parsing Cmm source; or,
 * from `case` expressions, by the `emitSwitch` and `mk_switch` functions in [[GhcFile(compiler/codeGen/CgUtils.hs)]].
In the NCG, a `CmmSwitch` is generated as a jump table by the `genSwitch` function in [[GhcFile(compiler/nativeGen/MachCodeGen.hs)]].  Optimisations are not currently implemented.  Examples would be a cascade of comparisons for switches whose values are widely spread, or a binary search for very wide value ranges.  For output to HC, earlier versions of GCC could not handle large if-trees anyway.
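The zero-based jump-table semantics described in the `CmmSwitch` comment can be captured in a few lines.  The sketch uses simplified, hypothetical types: block ids are `Int`s and scrutinees are names.  It returns `Nothing` for the cases Cmm leaves undefined.  It also lists the blocks a branch statement may transfer control to.

```haskell
import Data.List  (nub)
import Data.Maybe (catMaybes)

type BlockId = Int

-- Cut-down statements: the scrutinee and condition are just names here.
data CmmStmt
  = CmmBranch BlockId
  | CmmCondBranch String BlockId
  | CmmSwitch String [Maybe BlockId]
  deriving Show

-- Zero-based table lookup: value 0 selects the first entry, 1 the second,
-- and so on.  Cmm leaves out-of-range values and Nothing entries undefined;
-- this model reports both as Nothing.
switchTarget :: Integer -> [Maybe BlockId] -> Maybe BlockId
switchTarget v table
  | v < 0 || v >= toInteger (length table) = Nothing
  | otherwise = table !! fromInteger v

-- Blocks a statement may branch to (a CmmCondBranch may also fall through
-- to the next statement, which has no BlockId of its own).
branchTargets :: CmmStmt -> [BlockId]
branchTargets (CmmBranch b)       = [b]
branchTargets (CmmCondBranch _ b) = [b]
branchTargets (CmmSwitch _ table) = nub (catMaybes table)
```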
     788
     789==== Cmm Calls ====
Cmm calls include both calls to foreign functions and calls to Cmm quasi-operators using expression syntax (see [wiki:Commentary/Compiler/CmmType#QuasioperatorSyntax Quasi-operator Syntax]).  Cmm does not implement any of the control flow statements of the C-- specification (section 6.8.1).  Even so, foreign calls are one of the most complex components of Cmm, because of the complexity of the various calling conventions.
     791
The data type `CmmCallTarget` is defined in [[GhcFile(compiler/cmm/Cmm.hs)]] as:
     793{{{
     794data CmmCallTarget
     795  = CmmForeignCall              -- Call to a foreign function
     796        CmmExpr                 -- literal label <=> static call
     797                                -- other expression <=> dynamic call
     798        CCallConv               -- The calling convention
     799
     800  | CmmPrim                     -- Call to a "primitive" (eg. sin, cos)
     801        CallishMachOp           -- These might be implemented as inline
     802                                -- code by the backend.
     803}}}
     804`CCallConv` is defined in [[GhcFile(compiler/prelude/ForeignCall.lhs)]]; for information on register assignments, see comments in [[GhcFile(compiler/codeGen/CgCallConv.hs)]].
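The static/dynamic distinction in the `CmmForeignCall` comments can be made explicit with a small classifier.  The types are cut-down stand-ins: labels are strings, and `CCallConv` and `CallishMachOp` have only two constructors each here.  `CallKind` and `classifyCall` are hypothetical names.

```haskell
-- Simplified stand-ins; GHC's real CmmLit, CCallConv and CallishMachOp
-- have more constructors, and labels are CLabels rather than strings.
data CCallConv     = CCallConv | StdCallConv
  deriving (Show, Eq)
data CmmLit        = CmmLabel String | CmmInt Integer
  deriving (Show, Eq)
data CmmExpr       = CmmLit CmmLit | CmmReg String
  deriving (Show, Eq)
data CallishMachOp = MO_F64_Sin | MO_F64_Sqrt
  deriving (Show, Eq)

data CmmCallTarget
  = CmmForeignCall CmmExpr CCallConv
  | CmmPrim CallishMachOp
  deriving (Show, Eq)

-- How a backend might view a call target.
data CallKind = StaticCall String | DynamicCall | PrimCall CallishMachOp
  deriving (Show, Eq)

classifyCall :: CmmCallTarget -> CallKind
classifyCall (CmmForeignCall (CmmLit (CmmLabel lbl)) _) = StaticCall lbl  -- literal label
classifyCall (CmmForeignCall _ _)                        = DynamicCall     -- any other expression
classifyCall (CmmPrim op)                                = PrimCall op
```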
     805
`CallishMachOp` is defined in [[GhcFile(compiler/cmm/MachOp.hs)]]; see also [wiki:Commentary/Compiler/CmmType#PrimitiveOperations Primitive Operations], below.  `CallishMachOp`s are generally used for floating point computations (without implementing any floating point exceptions).  Here is an example of using a `CallishMachOp` (not yet implemented):
     807{{{
     808  add, carry = %addWithCarry(x, y);
     809}}}
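Since `%addWithCarry` is not implemented, its meaning can only be sketched.  This model assumes the primitive takes two `bits32` values and returns two results: the wrapped sum and the carry-out bit.  `addWithCarry` here is a hypothetical Haskell function, not GHC code.

```haskell
import Data.Word (Word32)

-- Hypothetical semantics for the (unimplemented) %addWithCarry primitive:
-- two bits32 inputs, two results: the wrapped sum and the carry-out bit.
addWithCarry :: Word32 -> Word32 -> (Word32, Word32)
addWithCarry x y = (s, carry)
  where
    s     = x + y                    -- Word32 arithmetic wraps modulo 2^32
    carry = if s < x then 1 else 0   -- a wrapped sum is smaller than x
```

The carry test relies on a simple fact about unsigned addition: the sum wrapped around exactly when the result is smaller than the first operand.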
     810
     811
     812=== Operators and Primitive Operations ===
Cmm generally conforms to the C-- specification for operators and "primitive operations".  Section 7.4 of the C-- specification refers to both of these as "primitive operations", but there are really two different types:
 * ''operators'', as I refer to them:
   * are parseable tokens, such as `+`, `-`, `*` or `/`;
   * generally map to a single machine instruction or part of a machine instruction;
   * have no side effects; and,
   * are represented in Haskell using the `MachOp` data type;
 * ''primitive operations'' are special, usually inlined, procedures, represented in Haskell using the `CallishMachOp` data type; primitive operations may have side effects.
     820The `MachOp` and `CallishMachOp` data types are defined in [[GhcFile(compiler/cmm/MachOp.hs)]].
     821
     822==== Operators ====
     823{{{
     824data MachOp
     825
     826  -- Integer operations
     827  = MO_Add    MachRep
     828  | MO_Sub    MachRep
     829  | MO_Eq     MachRep
     830  | MO_Ne     MachRep
     831  | MO_Mul    MachRep           -- low word of multiply
     832  | MO_S_MulMayOflo MachRep     -- nonzero if signed multiply overflows
     833  | MO_S_Quot MachRep           -- signed / (same semantics as IntQuotOp)
     834  | MO_S_Rem  MachRep           -- signed % (same semantics as IntRemOp)
     835  | MO_S_Neg  MachRep           -- unary -
     836  | MO_U_MulMayOflo MachRep     -- nonzero if unsigned multiply overflows
     837  | MO_U_Quot MachRep           -- unsigned / (same semantics as WordQuotOp)
     838  | MO_U_Rem  MachRep           -- unsigned % (same semantics as WordRemOp)
     839
     840  -- Signed comparisons (floating-point comparisons also use these)
     841  | MO_S_Ge MachRep
     842  | MO_S_Le MachRep
     843  | MO_S_Gt MachRep
     844  | MO_S_Lt MachRep
     845
     846  -- Unsigned comparisons
     847  | MO_U_Ge MachRep
     848  | MO_U_Le MachRep
     849  | MO_U_Gt MachRep
     850  | MO_U_Lt MachRep
     851
     852  -- Bitwise operations.  Not all of these may be supported at all sizes,
     853  -- and only integral MachReps are valid.
     854  | MO_And   MachRep
     855  | MO_Or    MachRep
     856  | MO_Xor   MachRep
     857  | MO_Not   MachRep
     858  | MO_Shl   MachRep
     859  | MO_U_Shr MachRep    -- unsigned shift right
     860  | MO_S_Shr MachRep    -- signed shift right
     861
     862  -- Conversions.  Some of these will be NOPs.
     863  -- Floating-point conversions use the signed variant.
     864  | MO_S_Conv MachRep{-from-} MachRep{-to-}     -- signed conversion
     865  | MO_U_Conv MachRep{-from-} MachRep{-to-}     -- unsigned conversion
     866}}}
     867Each `MachOp` generally corresponds to a machine instruction but may have its value precomputed in the Cmm, NCG or HC optimisers. 
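To show what precomputing a `MachOp` involves, here is a sketch of constant folding for a few of the constructors above, at `I32`.  It is not GHC's optimiser.  The `MachRep` argument is dropped, operands are `Word32`, and signed operations reinterpret the bits as `Int32`.  The sketch declines to fold three cases:
 * division by zero;
 * the overflowing `minBound` divided by `-1`; and,
 * shifts of 32 or more bits.

```haskell
import Data.Bits (shiftR, (.&.))
import Data.Int  (Int32)
import Data.Word (Word32)

-- A subset of MachOp with the MachRep argument dropped: every operand
-- and result is a 32-bit word (I32), stored as Word32.
data MachOp
  = MO_Add | MO_Sub | MO_Mul | MO_S_Neg
  | MO_S_Quot | MO_U_Quot
  | MO_S_Lt | MO_U_Lt
  | MO_And | MO_U_Shr | MO_S_Shr
  deriving (Show, Eq)

-- Fold an operation on constant operands, or return Nothing when this
-- sketch declines to (unknown op, division by zero, overflow, big shifts).
foldMachOp :: MachOp -> [Word32] -> Maybe Word32
foldMachOp op args = case (op, args) of
  (MO_Add,    [x, y]) -> Just (x + y)        -- wraps modulo 2^32
  (MO_Sub,    [x, y]) -> Just (x - y)
  (MO_Mul,    [x, y]) -> Just (x * y)        -- low word of the product
  (MO_S_Neg,  [x])    -> Just (negate x)
  (MO_U_Quot, [x, y]) | y /= 0 -> Just (x `quot` y)
  (MO_S_Quot, [x, y]) | y /= 0, not (s x == minBound && s y == -1)
                      -> Just (u (s x `quot` s y))
  (MO_S_Lt,   [x, y]) -> Just (flag (s x < s y))
  (MO_U_Lt,   [x, y]) -> Just (flag (x < y))
  (MO_And,    [x, y]) -> Just (x .&. y)
  (MO_U_Shr,  [x, n]) | n < 32 -> Just (x `shiftR` fromIntegral n)
  (MO_S_Shr,  [x, n]) | n < 32 -> Just (u (s x `shiftR` fromIntegral n))
  _                   -> Nothing
  where
    s :: Word32 -> Int32     -- reinterpret the bits as signed
    s = fromIntegral
    u :: Int32 -> Word32     -- and back again
    u = fromIntegral
    flag b = if b then 1 else 0
```

The same bit pattern folds differently under signed and unsigned operations.  For example, `0x80000000` shifted right by 4 gives `0xF8000000` with `MO_S_Shr` but `0x08000000` with `MO_U_Shr`.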
     868
     869==== Primitive Operations ====
     870{{{
     871-- These MachOps tend to be implemented by foreign calls in some backends,
     872-- so we separate them out.  In Cmm, these can only occur in a
     873-- statement position, in contrast to an ordinary MachOp which can occur
     874-- anywhere in an expression.
     875data CallishMachOp
     876  = MO_F64_Pwr
     877  | MO_F64_Sin
     878  | MO_F64_Cos
     879  | MO_F64_Tan
     880  | MO_F64_Sinh
     881  | MO_F64_Cosh
     882  | MO_F64_Tanh
     883  | MO_F64_Asin
     884  | MO_F64_Acos
     885  | MO_F64_Atan
     886  | MO_F64_Log
     887  | MO_F64_Exp
     888  | MO_F64_Sqrt
     889  | MO_F32_Pwr
     890  | MO_F32_Sin
     891  | MO_F32_Cos
     892  | MO_F32_Tan
     893  | MO_F32_Sinh
     894  | MO_F32_Cosh
     895  | MO_F32_Tanh
     896  | MO_F32_Asin
     897  | MO_F32_Acos
     898  | MO_F32_Atan
     899  | MO_F32_Log
     900  | MO_F32_Exp
     901  | MO_F32_Sqrt
     902  | MO_WriteBarrier
     903}}}
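These operations "tend to be implemented by foreign calls in some backends", so a backend needs some mapping from `CallishMachOp` to a C function.  The mapping below is a hypothetical sketch, not taken from GHC's code generators.  It assumes the standard C99 `<math.h>` names, for example `pow`/`powf` and `sqrt`/`sqrtf`.

```haskell
import Data.Char (toLower)
import Data.List (stripPrefix)

data CallishMachOp
  = MO_F64_Pwr | MO_F64_Sin | MO_F64_Cos | MO_F64_Tan
  | MO_F64_Sinh | MO_F64_Cosh | MO_F64_Tanh
  | MO_F64_Asin | MO_F64_Acos | MO_F64_Atan
  | MO_F64_Log | MO_F64_Exp | MO_F64_Sqrt
  | MO_F32_Pwr | MO_F32_Sin | MO_F32_Cos | MO_F32_Tan
  | MO_F32_Sinh | MO_F32_Cosh | MO_F32_Tanh
  | MO_F32_Asin | MO_F32_Acos | MO_F32_Atan
  | MO_F32_Log | MO_F32_Exp | MO_F32_Sqrt
  | MO_WriteBarrier
  deriving (Show, Eq, Enum, Bounded)

-- Hypothetical mapping to C99 <math.h> names: double versions are
-- unsuffixed, float versions take an "f" suffix.  MO_WriteBarrier is
-- not a maths routine, so it gets no name.
libmName :: CallishMachOp -> Maybe String
libmName op
  | Just fn <- stripPrefix "MO_F64_" name = Just (cName fn)
  | Just fn <- stripPrefix "MO_F32_" name = Just (cName fn ++ "f")
  | otherwise                             = Nothing
  where
    name = show op
    cName "Pwr" = "pow"
    cName fn    = map toLower fn
```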
     904
     905== Cmm Design: Observations and Areas for Potential Improvement  ==
     906
     907"If the application of a primitive operator causes a system exception, such as division by zero, this is an unchecked run-time error. (A future version of this specification may provide a way for a program to recover from such an exception.)" C-- spec, Section 7.4.  Cmm may be able to implement a partial solution to this problem, following the paper: [http://cminusminus.org/abstracts/c--pldi-00.html A Single Intermediate Language That Supports Multiple Implementations of Exceptions (2000)].  (TODO: write notes to wiki and test fix.)
     908
     909The IEEE 754 specification for floating point numbers defines exceptions for certain floating point operations, including:
     910 * range violation (overflow, underflow);
     911 * rounding errors (inexact);
 * invalid operation (an invalid operand, such as an ordered comparison with a `NaN` value, the square root of a negative number or division of zero by zero); and,
 * division by zero (a finite non-zero number divided by zero; `0/0` is an invalid operation instead).
Many architectures support floating point exceptions with a special status and control register, in addition to their other exception handling registers.  The IBM PPC includes the `FPSCR` ("Floating-Point Status and Control Register"); Intel x86 processors use the `MXCSR` register for SSE operations.  When the PPC performs a floating point operation, it checks for possible errors and sets the `FPSCR` accordingly.  On some processors, setting a flag in the Floating-Point Unit (FPU) status and control register disables some exceptions or the entire FPU exception handling facility.  Some processors disable the FPU after an exception has occurred, while others, notably Intel's x86 and x87 processors, continue to perform FPU operations.  An FPU exception may signal an interrupt for the software to pass to its own exception handler.  Whether it does depends on whether the software uses quiet !NaNs (QNaNs) or signaling !NaNs (SNaNs).
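The default, non-trapping behaviour can be observed from Haskell.  The sketch assumes the IEEE 754 default, in which floating point exceptions are masked.  Each operation below would set an exception flag, but it quietly returns a special value (a NaN, an infinity or zero) instead of trapping.  The sketch only inspects the results, because reading the status flags themselves would need facilities outside the Haskell Prelude.

```haskell
-- What a Double result turned into.
data Outcome = IsNaN | IsInfinite | IsZero | IsFinite
  deriving (Show, Eq)

classify :: Double -> Outcome
classify x
  | isNaN x      = IsNaN
  | isInfinite x = IsInfinite
  | x == 0       = IsZero
  | otherwise    = IsFinite

-- Operations that raise IEEE 754 exception flags; with exceptions masked
-- they return special values instead of trapping.
examples :: [(String, Double)]
examples =
  [ ("invalid: 0/0",              0 / 0)
  , ("invalid: sqrt (-1)",        sqrt (-1))
  , ("division by zero: 1/0",     1 / 0)
  , ("overflow: 1e308 * 10",      1e308 * 10)
  , ("underflow: 1e-308 / 1e300", 1e-308 / 1e300)
  ]
```

Once a NaN has been produced, it also makes comparisons misleading.  A NaN is not even equal to itself, which is one reason software may want to detect invalid operations rather than inspect results afterwards.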
     915
     916Some higher level languages provide facilities to handle these exceptions, including Ada, Fortran (F90 and later), C++ and C (C99, fenv.h, float.h on certain compilers); others may handle such exceptions without exposing a low-level interface.  There are three reasons to handle FPU exceptions, and these reasons apply similarly to other exceptions:
     917 * the facilities provide greater control;
 * the facilities are more efficient than a higher-level software solution; and,
     919 * FPU exceptions may be unavoidable, especially if several FPU operations are serially performed at the machine level so the higher level software has no opportunity to check the results in between operations.
     920
The C-- Language Specification mentions over 75 primitive operators.  The Specification lists separate operators for integral and floating point (signed) arithmetic (including carry, borrow and overflow checking), logical comparisons and conversions (from one size of float to another, from float to integral and vice versa, etc.).  C-- also includes special operators for floating point number values, such as `NaN`, `mzero`''k'' and `pzero`''k'', and for rounding modes.  Integral kinds also include bitwise operators, unsigned variants, and bit extraction for width changing and sign or zero-extension.  A C-- implementation may conveniently map each of these operators to a machine instruction.  On architectures that do not support a single instruction, it may map the operator to a simulated operation instead.  There seem to be two main problems with the current GHC implementation of Cmm:
 1. not enough operators
 1. no implementation of vector (SIMD) registers (though there is an `I128` `MachRep`)
     924
If a particular architecture supports them, assembler provides instructions such as mnemonics with the `.` ("dot") suffix (`add.`, `fsub.`).  These set the Condition Register (CR) and thereby save at least one instruction.  (Extended mnemonics can save even more.)  Extended mnemonics with side effects may be implemented as new `CallishMachOp`s; see [wiki:Commentary/Compiler/CmmType#PrimitiveOperations Primitive Operations] and [wiki:Commentary/Compiler/CmmType#CmmCalls Cmm Calls].  Assembler also supports machine exceptions, especially exceptions for floating-point operations, invalid storage access or misalignment (effective address alignment).  The current implementation of Cmm cannot model such exceptions through flow control, because it does not implement the C-- flow control statements; see [wiki:Commentary/Compiler/CmmType#CmmCalls Cmm Calls].
     926
     927Hiding the kinds of registers on a machine eliminates the ability to handle floating point exceptions at the Cmm level and to explicitly vectorize (use SIMD extensions).  The argument for exposing vector types may be a special case since such low-level operations are exposed at the C-level, as new types of variables or "intrinsics," that are C-language extensions provided by special header files and compiler support (`vector unsigned int` or `__m128i`, `vector float` or `__m128`) and operations (`vec_add()`, `+` (with at least one vector operand), `_mm_add_epi32()`). 
     928
     929