wiki:Commentary/Compiler/NameType

Version 3 (modified by simonpj, 8 years ago)

The Name and OccName types

Every entity (type constructor, class, identifier, type variable) has a Name. The Name type is pervasive in GHC, and is defined in compiler/basicTypes/Name.lhs. Here is what a Name looks like, though it is private to the Name module:

data Name = Name {
	      n_sort :: NameSort,	-- What sort of name it is
	      n_occ  :: !OccName,	-- Its occurrence name
	      n_uniq :: Int#,		-- Its identity
	      n_loc  :: !SrcLoc		-- Definition site
	  }
  • The n_sort field says what sort of name this is: see "The NameSort of a Name" below.
  • The n_occ field gives the "occurrence name" of the Name; see "Occurrence names: OccName" below.
  • The n_uniq field allows fast tests for equality of Names.
  • The n_loc field gives some indication of where the name was bound.
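
To make the role of n_uniq concrete, here is a minimal sketch of a Name-like record whose equality and ordering look only at the unique. It is a simplified model, not GHC's definition: the field names nSort, nOcc, nUniq and nLoc, and the use of String and Int in place of OccName, Int# and SrcLoc, are assumptions made for illustration.

```haskell
-- Simplified model of Name; the field types are stand-ins, not GHC's own.
import Data.Function (on)

data NameSort = External String | Internal | System
  deriving Show

data Name = Name
  { nSort :: NameSort
  , nOcc  :: String   -- stand-in for OccName
  , nUniq :: Int      -- stand-in for the unboxed Int#
  , nLoc  :: String   -- stand-in for SrcLoc
  } deriving Show

-- Equality and ordering consult only the unique, so comparing two
-- names never has to compare strings.
instance Eq Name where
  (==) = (==) `on` nUniq

instance Ord Name where
  compare = compare `on` nUniq
```

With this model, two names that share the occurrence name "x" but carry different uniques compare as different, which is the behaviour the n_uniq bullet describes.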

The NameSort of a Name

There are four flavours of Name:

data NameSort
  = External Module (Maybe Name)
	-- (Just parent) => this Name is a subordinate name of 'parent'
	-- e.g. data constructor of a data type, method of a class
	-- Nothing => not a subordinate
 
  | WiredIn Module (Maybe Name) TyThing BuiltInSyntax
	-- A variant of External, for wired-in things

  | Internal		-- A user-defined Id or TyVar
			-- defined in the module being compiled

  | System		-- A system-defined Id or TyVar.  Typically the
			-- OccName is very uninformative (like 's')

Internal, System
An Internal Name has only an occurrence name. Distinct Internal Names may have the same occurrence name; the n_uniq distinguishes them.

There is only a tiny difference between Internal and System; the former simply remembers that the name was originally written by the programmer, which helps when generating error messages.

External
An External Name has a globally-unique (module, occurrence name) pair, namely the original name of the entity, that describes where the thing was originally defined. So for example, if we have
module M where
  f = e1
  g = e2

module A where
  import qualified M as Q
  import M
  a = Q.f + g
then in module A, the function Q.f has an External Name M.f.

During any invocation of GHC, each (module, occurrence-name) pair gets one, and only one, Unique, stored in the n_uniq field of the Name. This association remains fixed even when GHC finishes one module and starts to compile another. The mapping from (module, occurrence-name) pairs to the corresponding Name (with its n_uniq field) is maintained by the Name Cache.
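
The idea of a cache that hands out one unique per (module, occurrence-name) pair can be sketched as follows. This is only an illustration of the technique: the NameCache record, its fields, and lookupOrAllocate are hypothetical names, and strings and Ints stand in for GHC's Module, OccName and Unique types.

```haskell
import qualified Data.Map.Strict as Map

type ModuleName = String
type OccString  = String
type Unique     = Int

-- Hypothetical cache: the next fresh unique plus the assignments so far.
data NameCache = NameCache
  { ncNext  :: Unique
  , ncNames :: Map.Map (ModuleName, OccString) Unique
  }

emptyCache :: NameCache
emptyCache = NameCache 0 Map.empty

-- Return the unique already assigned to the pair, or allocate a fresh
-- one and record it so that later lookups see the same answer.
lookupOrAllocate :: ModuleName -> OccString -> NameCache -> (Unique, NameCache)
lookupOrAllocate m occ cache =
  case Map.lookup (m, occ) (ncNames cache) of
    Just u  -> (u, cache)
    Nothing ->
      let u = ncNext cache
      in (u, NameCache (u + 1) (Map.insert (m, occ) u (ncNames cache)))
```

Threading the same cache through the compilation of several modules is what keeps the unique for, say, ("M", "f") stable across them in this sketch.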

WiredIn
A WiredIn Name is a special sort of External Name, one that is completely known to the compiler (e.g. the Bool type constructor). In this case the Name contains the TyThing that it is bound to; no need for lookups here!

The BuiltInSyntax field is just a boolean yes/no flag that identifies entities that are denoted by built-in syntax, such as [] for the empty list. These Names aren't "in scope" as such, and we occasionally need to know that.
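
The difference between the four flavours can be sketched with a cut-down NameSort. The types below are simplified stand-ins (the parent field is dropped, Module is a String, TyThing has two made-up constructors, and a Bool models BuiltInSyntax), and definingModule and wiredInThing are hypothetical helper names rather than functions from the Name module.

```haskell
-- Simplified stand-ins; GHC's real types carry much more information.
type Module = String

data TyThing = ATyCon String | AnId String
  deriving (Eq, Show)

data NameSort
  = External Module
  | WiredIn Module TyThing Bool   -- the Bool models the BuiltInSyntax flag
  | Internal
  | System

-- External and WiredIn names record a defining module; the others do not.
definingModule :: NameSort -> Maybe Module
definingModule (External m)    = Just m
definingModule (WiredIn m _ _) = Just m
definingModule _               = Nothing

-- Only a WiredIn name carries the thing it is bound to, so no
-- environment lookup is needed to recover it.
wiredInThing :: NameSort -> Maybe TyThing
wiredInThing (WiredIn _ thing _) = Just thing
wiredInThing _                   = Nothing
```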

Entities and Names

Here are the sorts of Name an entity can have:

  • Class, TyCon: Always have External or WiredIn Names.
  • TyVar: can have Internal or System Names; the former are ones that arise from instantiating programmer-written type signatures.
  • Ids: can have External, Internal, or System Names.
    • Before CoreTidy, the Ids that were defined at top level in the original source program get External Names, whereas extra top-level bindings generated (say) by the type checker get Internal Names. This distinction is occasionally useful for filtering diagnostic output; e.g. for -ddump-types.
    • After CoreTidy: An Id with an External Name will generate symbols that appear as external symbols in the object file. An Id with an Internal Name cannot be referenced from outside the module, and so generates a local symbol in the object file. The CoreTidy pass makes the decision about which names should be External and which Internal.

Occurrence names: OccName

An OccName is more or less just a string, like "foo" or "Tree", giving the (unqualified) name of an entity. Well, not quite just a string, because in Haskell a name like "C" could mean a type constructor or data constructor, depending on context. So GHC defines a type OccName that is a pair of a FastString and a NameSpace indicating which name space the name is drawn from. The data type is defined (abstractly) in compiler/basicTypes/OccName.lhs:

data OccName = OccName NameSpace EncodedFS

The EncodedFS is a synonym for FastString indicating that the string is Z-encoded. (Details in compiler/basicTypes/OccName.lhs.) Z-encoding encodes funny characters like '+' and '$' into alphabetic characters, like "zp" and "zd", so that they can be used in object-file symbol tables without confusing linkers and suchlike.
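
A partial sketch of the encoding idea follows. It handles only a handful of characters and passes everything else through unchanged, so it is not GHC's full encoder; the function names zEncodeChar and zEncode are chosen here for illustration.

```haskell
-- Partial sketch of Z-encoding: only a few characters are handled.
zEncodeChar :: Char -> String
zEncodeChar 'z' = "zz"   -- escape the escape character itself
zEncodeChar 'Z' = "ZZ"
zEncodeChar '$' = "zd"
zEncodeChar '+' = "zp"
zEncodeChar '%' = "zv"
zEncodeChar '.' = "zi"
zEncodeChar '_' = "zu"
zEncodeChar c   = [c]    -- everything else passes through in this sketch

-- Encode a whole string one character at a time.
zEncode :: String -> String
zEncode = concatMap zEncodeChar
```

Doubling 'z' and 'Z' keeps the encoding unambiguous: an encoded "zd" can only have come from '$', never from the two letters 'z' and 'd'.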

The name spaces are:

data NameSpace = VarName	-- Variables, including "source" data constructors
	       | DataName	-- "Real" data constructors 
	       | TvName		-- Type variables
	       | TcClsName	-- Type constructors and classes; Haskell has them
				-- in the same name space for now.
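
To see why the name space has to be part of an OccName, consider this small model, in which a plain String stands in for the Z-encoded FastString. The values tyConC and dataConC are made up for the example.

```haskell
data NameSpace = VarName | DataName | TvName | TcClsName
  deriving (Eq, Ord, Show)

-- String stands in for the Z-encoded FastString.
data OccName = OccName NameSpace String
  deriving (Eq, Ord, Show)

-- The same spelling in two name spaces gives two distinct OccNames.
tyConC, dataConC :: OccName
tyConC   = OccName TcClsName "C"
dataConC = OccName DataName  "C"
```

Because the derived Eq compares the NameSpace as well as the string, tyConC and dataConC are unequal even though both are spelled "C".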