Abstract FilePath Proposal (AFPP)

Herbert Valerio Riedel, Neil Mitchell & Michael Snoyman


The Haskell2010 Prelude defines

type FilePath = String

This representation has several shortcomings:

  • Inefficient in both time and space due to [Char]:
    • On 64bit GHC it typically allocates 24 bytes per FilePath-character.
    • Heap fragmentation as each character is represented by a separate heap object.
    • Manipulating FilePath values involves lots of costly pointer chasing.
  • Requires conversions/allocations for each system-call (e.g. POSIX this requires conversion to/from CString) as well as validations.
  • Round-tripping issues, where files have their underlying encoding lost on conversion to String.
  • Not type-safe by being a synonym of String.

One approach would be to create a new FilePath type, while leaving type FilePath = String in place. However, since the Prelude.FilePath type is ubiquitously used in many APIs, this would either amount to duplicate APIs or require conversions between the old and the new FilePath type.


Instead, we propose to phase out type FilePath = String in 3 phases:

Phase 1
Add type forward-compat conversion functions toFilePath & fromFilePath to the standard System.IO module, as that module is the official home of the FilePath type in the Haskell Report.

Also add these conversion functions for forward-compatibility to the filepath package:

module System.FilePath where

#if MIN_VERSION_base(4,9,0)
-- re-export {to,from}FilePath from System.IO
-- | Total Unicode-friendly encoding
toFilePath :: String -> FilePath
toFilePath = id

fromFilePath :: FilePath -> String
fromFilePath = id
Phase 2
Have GHC warn when a String-value is used where the FilePath synonym is expected

TODO needs investigation if it's feasible to implement

Phase 3
Make FilePath abstract in base, i.e. make FilePath into a newtype/data but don't export its constructor via non-internal modules. Prelude and System.IO would continue to export the type FilePath. System.IO will continue to export toFilePath/fromFilePath.

Since the internal details are not exposed, they can be changed later, but we envisage:

-- | Internal module exposing internals needed for add-on packages to
-- provide efficient operations involving `FilePath`s
module GHC.FilePath(FilePath(..),...) where

-- Using unpinned bytearrays to avoid Heap fragmentation and
-- which are reasonably cheap to pass to FFI calls
-- wrapped with typeclass-friendly types allowing to avoid CPP
-- Note that, while unpinned bytearrays incur a memcpy on each
-- FFI call, this overhead is generally much preferable to
-- the memory fragmentation of pinned bytearrays
data WindowsFilePath = WFP ByteArray# -- UTF16 data
data PosixFilePath   = PFP ByteArray# -- char[] data as passed to syscalls

#ifdef WINDOWS
type PlatformFilePath = WindowsFilePath
#elif POSIX
type PlatformFilePath = PosixFilePath
# error "no filepath representation available for this platform yet"

-- | Type representing filenames/pathnames
newtype FilePath = FilePath PlatformFilePath -- constructor not exported from Prelude

instance IsString FilePath where 
    fromString = toFilePath

-- | \"String-Concatenation\" for 'FilePaths'
-- This allows to write forward-compatible code for Haskell2010 'FilePath`s
-- E.g. code can be written (assuming `-XOverloadedStrings`) like
-- > tarfname = basedir </> "ghc-" <> ver <> "~" <> gitid <.> "tar.xz"
-- That has the same semantics with pre-AFPP and post-AFPP 'FilePath's
-- NB: 'mappend' is *not* the same as '(</>)', but rather matches the semantics for pre-AFPP 'FilePaths'
instance Monoid FilePath where 
    mempty      = <...>
    mappend a b = <...string-concat...>

Throughout the process, the System.FilePath module in the hackage:filepath package would continue to expose the same API as now, changing with the internal FilePath definition as necessary.

Issues not addressed by this Proposal

  • This proposal does not define the platform/implementation-specific representation of the new FilePath type.

Decisions assumed by this Proposal

  • This propsoal assumes there is no type-level distinction of path-subtypes, e.g. relative/absolute or directory/file.
Last modified 20 months ago Last modified on Jun 26, 2015 3:44:36 PM