Changes between Version 14 and Version 15 of SafeHaskell


Ignore:
Timestamp:
Nov 22, 2010 5:32:45 AM (3 years ago)
Author:
David
Comment:

--

Legend:

Unmodified
Added
Removed
Modified
  • SafeHaskell

    v14 v15  
    99 * Module M is made available ''in source form'' to the user constructing A. 
    1010 * The user constructing A compiles M with a new flag, `-XSafe`. 
    11  * If compilation succeeds, A can safely import M, knowing that M cannot cause effects that are not visible in the types of M's functions. 
     11 * If compilation succeeds, A can import M knowing that M cannot cause effects that are not visible in the types of M's functions. 
    1212 * The user constructing A must trust: 
    1313   * GHC, its supporting tools, and 
     
    1717More specifically, there are two parts to the proposed extension: 
    1818 
    19  1. A new GHC option (`-XSafe`) enabling a "Safe" dialect of Haskell in which GHC rejects any source code that might produce unsafe effects. 
     19 1. A new GHC option (`-XSafe`) enabling a "Safe" dialect of Haskell in which GHC rejects any source code that might produce unsafe effects or otherwise subvert the type system. 
    2020 
    2121 2. Another new GHC option (`-XTrustworthy`) indicating that, though a module may invoke unsafe functions internally, the set of exported symbols cannot be used in an unsafe way.  The `-XTrustworthy` option makes a small extension to the syntax of import statements, adding a `safe` keyword: 
     
    2323  impdecl -> `import` [`safe`] [`qualified`] modid [`as` modid] [impspec] 
    2424 
    25 Roughly speaking, import declarations are either safe or unsafe, while modules are classified as either trusted or untrusted.  A safe import declaration causes compilation to fail if the imported module is not trusted.  In the Safe dialect, all import declarations are implicitly safe.  When compiling with `-XTrustworthy`, import declarations with the `safe` keyword are safe, and those without unsafe.  A module can be trusted only if it was compiled with `-XSafe` or `-XTrustworthy`, but there are additional restrictions as discussed below under '''module trust'''. 
     25Roughly speaking, import declarations are either safe or unsafe, while modules are classified as either trusted or untrusted.  A safe import declaration causes compilation to fail if the imported module is not trusted.  In the Safe dialect, all import declarations are implicitly safe.  When compiling with `-XTrustworthy`, import declarations with the `safe` keyword are safe, while those without the keyword are unsafe.  A module can be trusted only if it was compiled with `-XSafe` or `-XTrustworthy`, but there are additional restrictions as discussed below under '''module trust'''. 
    2626 
    2727== Safety Goal == 
    2828 
    29 As long as no module compiled with `-XTrustworthy` contains a vulnerability, the safe dialect's goal is to guarantee the following properties: 
     29As long as no module compiled with `-XTrustworthy` contains a vulnerability, the Safe dialect's goal is to guarantee the following properties: 
    3030 
    3131 * '''Referential transparency.'''  Functions in the Safe dialect must be deterministic.  Moreover, evaluating them should have no side effects, and should not halt the program (except by throwing uncaught exceptions or looping forever). 
     
    3737The Safe dialect is intended to be of use for both normal (trusted) and untrusted code.  Authors of trusted modules may wish to include `{-# LANGUAGE Safe #-}` pragmas to ensure they do not accidentally invoke unsafe actions (directly or indirectly), or to allow other Safe code to import their modules. 
    3838 
    39 Note that the safe dialect does not prevent use of the symbol `IO`. If an untrusted module exports an `IO` action, that action may have arbitrary side effects.  Simply compiling a module with `-XSafe` does not meaningfully restrict the effects of exported `IO` actions. Hence, an application A importing an untrusted but safe module M may safely invoke pure functions from M, but must avoid executing `IO` actions unless some other mechanism has been used to ensure M can only construct `IO` actions that conform to A's security goal. (See `-ultrasafe` below for one such mechanism.) 
     39The safe dialect does not prevent use of the symbol `IO`. If an untrusted module exports an `IO` action, that action may have arbitrary side effects, regardless of the `-XSafe` option.  Hence, while an application A importing a safe but possibly malicious module M may safely invoke pure functions from M, it must avoid executing `IO` actions exported by M unless some other mechanism ensures those actions conform to A's security goal. (See `-ultrasafe` below for one such mechanism.) 
    4040 
    4141== Module trust == 
    4242 
    43 Recall that a safe import of a module M fails unless M is trusted. M is trusted when two conditions hold: 
    44  
    45  1. M is compiled with `-XSafe` or `-XTrustWorthy`, in which case we say M is ''trustable'', and 
     43Recall that a safe import of a module M fails unless M is trusted.  M is trusted when two conditions hold: 
     44 
     45 1. M was compiled with `-XSafe` or `-XTrustWorthy`, in which case we say M is ''trustable'', and 
    4646 
    4747 2. The modules on which M depends (transitively) for safety all reside in packages that the application trusts. 
    4848 
    49 Determining trust requires two modifications to the way GHC manages modules.  First, the interface file format must change to record whether each module is trustable, and if so what its trust dependencies are.  Second, we need run-time options to specify which packages are trusted by an application. 
    50  
    51 We therefore change the interface file format to record for each module M whether or not M is trustable.  If M is trustable, the interface file contains a list of ''trust dependencies'', which are (package, module) pairs on which the current application depends for safety.  The trust dependencies of a module are determined as follows: 
     49Determining trust requires two modifications to the way GHC manages modules.  First, the interface file format must change to record whether each module is trustable, and if so what its trust dependencies are.  Second, we need compiler options to specify which packages are trusted by an application. 
     50 
     51We therefore extend the interface file format to record for each module M whether or not M is trustable.  If M is trustable, the interface file contains a list of ''trust dependencies'', which are (package, module) pairs on which the current application depends for safety.  The trust dependencies of a module are determined as follows: 
    5252 
    5353 * ''If a module is compiled with `-XTrustworthy`'', then its trust dependencies always include itself, plus the union of trust dependencies of all modules imported with the `safe` keyword. 
     
    5757 * If a module was compiled with neither `-XSafe` nor `-XTrustworthy`, or, regardless of these options, if a module contains a `{-# LANGUAGE Untrustworthy #-}` pragma, then the interface file marks the module as not trustable, so there are no trust dependencies. 
    5858 
    59 Currently, in any given run of the compiler, GHC classifies each package as either exposed or hidden.  To incorporate trust, we now allow a second bit specifying whether each package is trusted or untrusted.  This bit will be controllable by two new options to `ghc-pkg`, `trust` and `distrust`, which are analogous to `expose` and `hide`.  (The trust bit is meaningless while a package is hidden, but will stay in the package database for when packages are exposed.) 
     59Currently, in any given run of the compiler, GHC classifies each package as either exposed or hidden.  To incorporate trust, we add a second bit specifying whether each package is trusted or untrusted.  This bit will be controllable by two new options to `ghc-pkg`, `trust` and `distrust`, which are analogous to `expose` and `hide`.  (The trust bit is meaningless while a package is hidden, but will stay in the package database for when the package is exposed.) 
    6060 
    6161On the command line, several new options control which packages are trusted: 
     
    6767 * `-distrust-all-packages` - considers all packages untrusted unless they are explicitly trusted by subsequent command-line options. 
    6868 
    69  * A convenience option `-ultrasafe` is equivalent to `-distrust-all-packages` `-XNoForeignFunctionInterface`. 
     69 * A convenience option `-ultrasafe` is equivalent to `-distrust-all-packages` `-XNoForeignFunctionInterface` `-XNoImplicitPrelude`. 
    7070 
    7171None of these options can be specified or overwritten by `OPTIONS_GHC` pragmas in the Safe dialect. 
     
    7979 * A number of functions can be used to violate safety.  Many of these have names prefixed with `unsafe` (e.g., {{{unsafePerformIO}}}, {{{unsafeIterleaveIO}}}, {{{unsafeCoerce}}} {{{unsafeSTToIO}}}, ...). However, not all unsafe functions fit this pattern.  For instance, {{{inlinePerformIO}}} and {{{fromForeignPtr}}} from the `bytestring` package are unsafe. 
    8080 
    81  * Code that defines hand-crafted instances of {{{Typeable}}} can violate safety by causing {{{typeOf}}} to return identical results on two distinct types, then using {{{cast}}} to coerce between the two unsafely.  ''Are there other classes?  Perhaps {{{Data}}} should also be restricted?  Simon says {{{Ix}}} doesn't need to be protected any more.'' 
     81 * Code that defines hand-crafted instances of {{{Typeable}}} can violate safety by causing {{{typeOf}}} to return identical results on two distinct types, then using {{{cast}}} to coerce between the two unsafely.  ''Are there other classes?  Perhaps {{{Data}}} should also be restricted?  Simon says {{{Ix}}} doesn't need to be protected anymore.'' 
    8282 
    8383 * Certain exposed constructors of otherwise mostly safe data types allow unsafe actions.  For instance, the {{{PS}}} constructor of {{{Data.ByteString.ByteString}}} contains a pointer, offset, and length.  Code that can see the pointer value can act in a non-deterministic way by depending on the address rather than value of a {{{ByteString}}}.  Worse, code that can use {{{PS}}} to construct {{{ByteString}}}s can include bad lengths that will lead to stray pointer references. 
     
    8787 * {{{TemplateHaskell}}} is also particularly dangerous, as it can cause side effects even at compilation time. 
    8888 
    89  * The {{{OverlappingInstances}}} extension can be used to violate semantic consistency, because untrusted code can potentially redefine a type instance (by containing a more specific instance definition) in a way that changes the behaviour of code importing the untrusted module. 
     89 * The {{{OverlappingInstances}}} extension can be used to violate semantic consistency, because malicious code could redefine a type instance (by containing a more specific instance definition) in a way that changes the behaviour of code importing the untrusted module. 
    9090 
    9191 * Likewise, {{{RULES}}} and {{{SPECIALIZE}}} pragmas can change the behavior of trusted code in unanticipated ways, violating semantic consistency. 
    9292 
    93  * {{{OPTIONS_GHC}}} is dangerous in unfiltered form, as it could potentially make packages trusted that the invoking user doesn't trust. 
     93 * {{{OPTIONS_GHC}}} is dangerous in unfiltered form.  Among other things, it could use `-trust` to trust packages the invoking user doesn't in fact trust. 
    9494 
    9595 * The {{{StandaloneDeriving}}} extension can be used to violate constructor access control by defining instances of {{{Read}}} and {{{Show}}} to examine and construct data values with inaccessible constructors. 
    9696 
    97  * Similarly, {{{GeneralizedNewtypeDeriving}}} can also violate constructor access control, by allowing untrusted code to manipulate protected data types in ways the data type author did not intend. 
     97 * Similarly, {{{GeneralizedNewtypeDeriving}}} can violate constructor access control, by allowing untrusted code to manipulate protected data types in ways the data type author did not intend. 
    9898 
    9999== Implementation details == 
     
    103103 * {{{GHC.Prim}}} will need to be made (or just kept) unsafe. 
    104104 
    105  * {{{-XSafe}}} should disallow the {{{TemplateHaskell}}}, {{{StandaloneDeriving}}}, {{{GeneralizedNewtypeDeriving}}}, and {{{CPP}}} language extensions, as well as {{{RULES}}} and {{{SPECIALIZE}}} pragmas. 
     105 * {{{-XSafe}}} should disallow the {{{TemplateHaskell}}}, {{{StandaloneDeriving}}}, {{{GeneralizedNewtypeDeriving}}}, and {{{CPP}}} language extensions, as well as the {{{RULES}}} and {{{SPECIALIZE}}} pragmas. 
    106106 
    107107 * Overlapping instance declarations must either all reside in modules compiled without `-XSafe`, or else all reside in the same module.  It violates semantic consistency to allow Safe code to change the instance definition associated with a particular type. 
     
    109109 * {{{OPTIONS_GHC}}} pragmas will have to be filtered.  Some options, (e.g., `-fno-warn-unused-do-bind`) are totally fine, but many others are likely problematic (e.g., {{{-cpp}}}, which provides access to the local file system at compilation time, or {{{-F}}} which allows an arbitrary file to be executed, possibly even one named {{{/afs/}}}... and hence entirely under an attacker's control). 
    110110 
    111  * Libraries will progressively need to be updated to export safe interfaces, which may require moving unsafe functions into separate modules, or adding new {{{ {-# LANGUAGE Safe #-} }}} modules that re-export a safe subset of symbols.  Ideally, most modules in widely-used libraries would eventually contain either {{{ {-# LANGUAGE Safe -#} }}} or {{{ {-# LANGUAGE Trustworthy -#} }}} pragmas, except for internal modules or a few modules exporting unsafe symbols.  Maybe haddock could add some indicator to make it obvious which modules are safe. 
    112  
    113  * The {{{-XTrustworthy}}} command-line option and corresponding pragma do not increase what a module can do--the only effect is to mark the module's interface file as trustable and add the module to its own trust dependency set.  If both {{{-XTrustworthy}}} and {{{-XSafe}}} are supplied, then the language is still restricted to the Safe dialect and any unsafe actions will still cause a compilation error.  A plausible use for both pragmas simultaneously is to prune the list of trusted modules--for instance, if a module imports a bunch of trusted modules but does not use any of their trusted features, or only uses those features in a very limited way.  If the code happens also to be safe, the programmer may want to add {{{-XSafe}}} to catch accidental unsafe actions. 
    114  
    115  * The option `{-# LANGUAGE Untrustworthy -#}` is not incompatible with `{-# LANGUAGE Safe -#}`.  The former causes the interface file to be marked not trustable, while the latter causes the source code to be confined to the safe dialect.  `Untrustworthy` should be used in seemingly safe modules that export constructors that would cause other modules to do unsafe things.  (The `PS` constructor discussed above is an example of a dangerous constructor that could potentially be defined in a module that happily compiles with `-XSafe`.) 
     111 * Libraries will progressively need to be updated to export trustable interfaces, which may require moving unsafe functions into separate modules, or adding new `{-# LANGUAGE Trustworthy #-}` modules that re-export a safe subset of symbols.  Ideally, most modules in widely-used libraries will eventually contain either `{-# LANGUAGE Safe -#}` or `{-# LANGUAGE Trustworthy -#}` pragmas, except for internal modules or a few modules exporting unsafe symbols.  Maybe haddock can add some indicator to make it obvious which modules are trustable and show the trust dependencies. 
     112 
     113 * When `-XSafe` and `-XTrustworthy` are used together, the language is restricted to the Safe dialect.  The effect of `-XTrustworthy` is to change the trust dependency set.  Specifically, the trust dependency set will include the module itself.  However, rather than include the union of trust dependency sets of all imported modules, only dependencies of modules imported with the `safe` keyword are added to the current module's set.  A plausible use for both pragmas simultaneously is to prune the list of trusted modules--for instance, if a module imports a bunch of trusted modules but does not use any of their trusted features, or only uses those features in a very limited way.  If the code happens also to be safe, the programmer may want to add `-XSafe` to catch accidental unsafe actions. 
     114 
     115 * The option `{-# LANGUAGE Untrustworthy -#}` is also not incompatible with `{-# LANGUAGE Safe -#}`.  The former causes the interface file to be marked not trustable, while the latter causes the source code to be confined to the Safe dialect.  `Untrustworthy` should be used in seemingly safe modules that export constructors that would allow other modules to do unsafe things.  (The `PS` constructor discussed above is an example of a dangerous constructor that could potentially be defined in a module that happily compiles with `-XSafe`.) 
     116 
     117== Intended uses == 
     118 
     119We anticipate the Safe dialect and corresponding options being used in several ways. 
     120 
     121=== Enforcing good programming style === 
     122 
     123Over-reliance on magic functions such as `unsafePerformIO` or magic symbols such as `#realWorld` can lead to less elegant Haskell code. The Safe dialect formalizes this notion of magic and prohibits its use.  Thus, people may encourage their collaborators to use the Safe dialect, except when truly necessary, so as to promote better programming style. 
     124 
     125=== Restricted IO monads === 
     126 
     127When defining interfaces for possibly malicious plugin modules, the interface can require the plugin to provide a computation in a monad that allows only restricted IO actions.  For instance, consider defining an interface for a module `Danger` provided by an untrusted programmer.  `Danger` should be allowed to read and write particular files (by name), but should not be able do any other form of IO, even though we don't trust it's author not to try. 
     128 
     129We define the plugin interface so that it requires `Danger` to export a single computation, `Danger.runMe`, of type {{{RIO ()}}}, where `RIO` is a new monad defined as follows: 
     130 
     131{{{ 
     132-- Either or both of the following pragmas would do 
     133{-# LANGUAGE Trustworthy #-} 
     134{-# LANGUAGE Safe #-} 
     135 
     136module RIO (RIO(), runRIO, rioReadFile, rioWriteFile) where 
     137 
     138-- Notice that symbol UnsafeRIO is not exported from this module! 
     139 
     140newtype RIO a = UnsafeRIO { runRIO :: IO a } 
     141 
     142instance Monad RIO where 
     143    return = UnsafeRIO . return 
     144    (UnsafeRIO m) >>= k = UnsafeRIO $ m >>= runRIO . k 
     145 
     146-- Returns True iff access is allowed to file name 
     147pathOK :: FilePath -> IO Bool 
     148pathOK file = {- Implement some policy based on file name -} 
     149 
     150rioReadFile :: FilePath -> RIO String 
     151rioReadFile file = UnsafeRIO $ do 
     152  ok <- pathOK file 
     153  if ok then readFile file else return "" 
     154 
     155rioWriteFile :: FilePath -> String -> RIO () 
     156rioWriteFile file contents = UnsafeRIO $ do 
     157  ok <- pathOK file 
     158  if ok then writeFile file contents else return () 
     159}}} 
     160 
     161We compile `Danger` using the `-XSafe` flag.  `Danger` can import module `RIO` because `RIO` is marked `Trustworthy`.  Thus, `Danger` can make use of the `rioReadFile` and `rioWriteFile` functions to access permitted file names. 
     162 
     163The main application then imports both `RIO` and `Danger`.  To run the plugin, it calls {{{RIO.runRIO Danger.runMe}}} within the IO monad. The application is safe in the knowledge that the only IO to ensue will be to files whose paths were approved by the `pathOK` test.  We are relying on the fact that the type system and constructor privacy prevent `RIO` computations from executing `IO` actions directly.  Only functions with access to privileged symbol `UnsafeRIO` can lift `IO` computations into the `RIO` monad. 
     164 
     165[Note that as shown, RIO could fall victim to TOCTTOU bugs or symbolic links, but the same approach applies to more secure monads.] 
     166 
     167=== Restricted `IO` imports === 
     168 
     169An alternate approach to sandboxing possibly malicious plugins is to allow the code to execute `IO` actions, but to limit the primitive `IO` actions such code can import.  In this case, the plugin module `Danger` must be compiled with `-ultrasafe`.  Moreover, it will import a module such as the following: 
     170 
     171{{{ 
     172{-# LANGUAGE Trustworthy #-} 
     173 
     174module SafeIO (rioReadFile, rioWriteFile 
     175              , module RestrictedPrelude) where 
     176 
     177import RestrictedPrelude -- Subset of Prelude without IO actions 
     178 
     179-- Returns True iff access is allowed to file name 
     180pathOK :: FilePath -> IO Bool 
     181pathOK file = {- Implement some policy based on file name -} 
     182 
     183rioReadFile :: FilePath -> IO String 
     184rioReadFile file = do 
     185  ok <- pathOK file 
     186  if ok then readFile file else return "" 
     187 
     188rioWriteFile :: FilePath -> String -> IO () 
     189rioWriteFile file contents = do 
     190  ok <- pathOK file 
     191  if ok then writeFile file contents else return () 
     192}}} 
     193 
     194In this case, the type of `Danger.runMe` will be `IO ()`.  However, because `-ultrasafe` implies `-distrust-all-packages`, the only modules `Danger` can import are trustable modules whose entire trust dependency set lies in the current package.  Let's say that `SafeIO` and `Danger` are the only two such modules.  We then know that the only IO actions `Danger.runMe` can directly execute are `rioReadFile` and `rioWriteFile`. 
    116195 
    117196== References ==