Opened 3 years ago

Last modified 12 months ago

#10068 new task

Make the runtime reflection API for names, modules, locations more systematic

Reported by: simonpj Owned by:
Priority: high Milestone: 8.4.1
Component: Compiler Version: 7.8.4
Keywords: Generics Cc: hvr, ekmett, gridaphobe, j.waldmann
Operating System: Unknown/Multiple Architecture: Unknown/Multiple
Type of failure: None/Unknown Test Case:
Blocked By: Blocking:
Related Tickets: Differential Rev(s):
Wiki Page:

Description (last modified by simonpj)

Currently in base we have

  • GHC.SrcLoc: the data type SrcLoc contains srcLocPackage and srcLocModule
  • Data.Typeable.Internals: the data type TyCon contains tyConPackage, tyConModule and tyConName.
  • GHC.Generics: the data type Datatype contains dataTypePackage, dataTypeModule and dataTypeName
  • Data.Data: the data type DataType (yes the capitalisation differs!) contains a field tycon :: String, and there are functions tyconModule :: String -> String, and tyconUQname :: String -> String, for parsing that string and extracting the module name and tycon name.
  • GHC.StaticPtr: the data type StaticPtrInfo contains spInfoPackageKey, spInfoModuleName, spInfoName (all Strings). Oh, and spInfoSrcLoc :: (Int,Int) too!
  • GHC.Stack.ccSrcSpan :: Ptr CostCentre -> IO CString stores a source location in a C structure, and returns it as a CString.

This is madness! Six different representations for the same information in one package!

Let's fix this by defining some shared data types, for

  • Module = ModuleName + package
  • Entity = Module + unqualified name

There would be a tiresome changeover period; but Typeable and StaticPtr are in flux anyway.

Would anyone be willing to lead on this?

Change History (20)

comment:1 Changed 3 years ago by hvr

Cc: hvr ekmett added

comment:2 Changed 3 years ago by simonpj

I'd appreciate guidance from the Core Libraries Committee here. I plan to execute the above changes for 7.12, but it'll clearly be a bit disruptive for people Typeable, Data etc.

Simon

comment:3 Changed 3 years ago by simonpj

Description: modified (diff)

comment:4 Changed 3 years ago by gridaphobe

Cc: gridaphobe added

comment:5 Changed 3 years ago by simonpj

See #10311 for why we might want to give further structure to the Package type too. Thus:

  • Module = ModuleName + Package
  • Package lets you get the package name (e.g. containers-0.5.5.1), and the package key (e.g. conta_47ajk3tbda43DFWyeF3oHQ)

comment:6 Changed 3 years ago by j.waldmann

Cc: j.waldmann added

comment:7 Changed 3 years ago by ezyang

See #10279 for another consequence of not having centralized Package structure: users keep using package names when they should use package keys, and the API naming (mkPkgName) doesn't help either!

comment:8 Changed 3 years ago by simonpj

Are you sure? This ticket refers to the runtime reflection API of Typeable, Data etc. Is that what #10279 is about? If so could you give a small example?

Simon

comment:9 Changed 3 years ago by ezyang

I suppose not; it is only related. An example is the NameG constructor:

Prelude Language.Haskell.TH Language.Haskell.TH.Syntax> :t NameG
NameG :: NameSpace -> PkgName -> ModName -> NameFlavour
Prelude Language.Haskell.TH Language.Haskell.TH.Syntax> :info PkgName
newtype PkgName = PkgName String
  	-- Defined in `Language.Haskell.TH.Syntax'
instance Eq PkgName -- Defined in `Language.Haskell.TH.Syntax'
instance Ord PkgName -- Defined in `Language.Haskell.TH.Syntax'
Prelude Language.Haskell.TH Language.Haskell.TH.Syntax> :info ModName
newtype ModName = ModName String
  	-- Defined in `Language.Haskell.TH.Syntax'
instance Eq ModName -- Defined in `Language.Haskell.TH.Syntax'
instance Ord ModName -- Defined in `Language.Haskell.TH.Syntax'

So, the suggestion is that with a unified reflection API, it would make sense to make TH also accept this information. But maybe this is just a separate concern!

comment:10 Changed 2 years ago by gridaphobe

GHC.Stack.CostCentre has yet another representation of source locations, as CStrings.

Last edited 2 years ago by gridaphobe (previous) (diff)

comment:11 Changed 2 years ago by simonpj

Description: modified (diff)

Well spotted. Do you feel like helping out with this?!

comment:12 Changed 2 years ago by gridaphobe

I'm afraid I don't have a lot of free cycles at the moment, I just happened to notice the extra representation while working on https://phabricator.haskell.org/D861.

comment:13 Changed 2 years ago by bgamari

I will try to follow this example in my DWARF stack tracing work.

To recap, it seems like we have the following types,

-- | A Haskell module belonging to a package
data Module
moduleName :: Module -> String
modulePackage :: Module -> Package

-- | A Haskell package
data Package
packageName :: Package -> String
packageKey :: Package -> PackageKey

-- | A named definition in a Haskell module
data Entity
entityModule :: Entity -> Module
entityName :: Entity -> String

Any preferences on where these should live? GHC.Reflection seems appropriate but perhaps a bit ambiguous given how overloaded "reflection" now is.

In the case of DWARF stacktraces this will require that we are able to produce a Module for any given loaded Haskell module. It seems the easiest way to do this would be to simply inject a binding with a predictable name into each module at compile time.

comment:14 in reply to:  13 Changed 2 years ago by simonpj

In the case of DWARF stacktraces this will require that we are able to produce a Module for any given loaded Haskell module. It seems the easiest way to do this would be to simply inject a binding with a predictable name into each module at compile time.

Yes -- that is exactly what I have done in my wip/T9858-typeable-spj branch. I really want to get that branch committed... let's discuss in our call today.

Simon

comment:15 Changed 2 years ago by thoughtpolice

Milestone: 7.12.18.0.1

Milestone renamed

comment:16 Changed 2 years ago by Ben Gamari <ben@…>

In bef2f03e/ghc:

Generate Typeable info at definition sites

This patch implements the idea floated in Trac #9858, namely that we
should generate type-representation information at the data type
declaration site, rather than when solving a Typeable constraint.

However, this turned out quite a bit harder than I expected. I still
think it's the right thing to do, and it's done now, but it was quite
a struggle.

See particularly

 * Note [Grand plan for Typeable] in TcTypeable (which is a new module)
 * Note [The overall promotion story] in DataCon (clarifies existing stuff)

The most painful bit was that to generate Typeable instances (ie
TyConRepName bindings) for every TyCon is tricky for types in ghc-prim
etc:

 * We need to have enough data types around to *define* a TyCon
 * Many of these types are wired-in

Also, to minimise the code generated for each data type, I wanted to
generate pure data, not CAFs with unpackCString# stuff floating about.

Performance
~~~~~~~~~~~
Three perf/compiler tests start to allocate quite a bit more. This isn't
surprising, because they all allocate zillions of data types, with
practically no other code, esp. T1969

 * T3294:   GHC allocates 110% more (filed #11030 to track this)
 * T1969:   GHC allocates 30% more
 * T4801:   GHC allocates 14% more
 * T5321FD: GHC allocates 13% more
 * T783:    GHC allocates 12% more
 * T9675:   GHC allocates 12% more
 * T5642:   GHC allocates 10% more
 * T9961:   GHC allocates 6% more

 * T9203:   Program allocates 54% less

I'm treating this as acceptable. The payoff comes in Typeable-heavy
code.

Remaining to do
~~~~~~~~~~~~~~~

 * I think that "TyCon" and "Module" are over-generic names to use for
   the runtime type representations used in GHC.Typeable. Better might be
   "TrTyCon" and "TrModule". But I have not yet done this

 * Add more info the the "TyCon" e.g. source location where it was
   defined

 * Use the new "Module" type to help with Trac Trac #10068

 * It would be possible to generate TyConRepName (ie Typeable
   instances) selectively rather than all the time. We'd need to persist
   the information in interface files. Lacking a motivating reason I have
   not done this, but it would not be difficult.

Refactoring
~~~~~~~~~~~
As is so often the case, I ended up refactoring more than I intended.
In particular

 * In TyCon, a type *family* (whether type or data) is repesented by a
   FamilyTyCon
     * a algebraic data type (including data/newtype instances) is
       represented by AlgTyCon This wasn't true before; a data family
       was represented as an AlgTyCon. There are some corresponding
       changes in IfaceSyn.

     * Also get rid of the (unhelpfully named) tyConParent.

 * In TyCon define 'Promoted', isomorphic to Maybe, used when things are
   optionally promoted; and use it elsewhere in GHC.

 * Cleanup handling of knownKeyNames

 * Each TyCon, including promoted TyCons, contains its TyConRepName, if
   it has one. This is, in effect, the name of its Typeable instance.

Requires update of the haddock submodule.

Differential Revision: https://phabricator.haskell.org/D757

comment:17 Changed 2 years ago by Ben Gamari <ben@…>

In 91c6b1f5/ghc:

Generate Typeable info at definition sites

This is the second attempt at merging D757.

This patch implements the idea floated in Trac #9858, namely that we
should generate type-representation information at the data type
declaration site, rather than when solving a Typeable constraint.

However, this turned out quite a bit harder than I expected. I still
think it's the right thing to do, and it's done now, but it was quite
a struggle.

See particularly

 * Note [Grand plan for Typeable] in TcTypeable (which is a new module)
 * Note [The overall promotion story] in DataCon (clarifies existing
stuff)

The most painful bit was that to generate Typeable instances (ie
TyConRepName bindings) for every TyCon is tricky for types in ghc-prim
etc:

 * We need to have enough data types around to *define* a TyCon
 * Many of these types are wired-in

Also, to minimise the code generated for each data type, I wanted to
generate pure data, not CAFs with unpackCString# stuff floating about.

Performance
~~~~~~~~~~~
Three perf/compiler tests start to allocate quite a bit more. This isn't
surprising, because they all allocate zillions of data types, with
practically no other code, esp. T1969

 * T1969:    GHC allocates 19% more
 * T4801:    GHC allocates 13% more
 * T5321FD:  GHC allocates 13% more
 * T9675:    GHC allocates 11% more
 * T783:     GHC allocates 11% more
 * T5642:    GHC allocates 10% more

I'm treating this as acceptable. The payoff comes in Typeable-heavy
code.

Remaining to do
~~~~~~~~~~~~~~~

 * I think that "TyCon" and "Module" are over-generic names to use for
   the runtime type representations used in GHC.Typeable. Better might
be
   "TrTyCon" and "TrModule". But I have not yet done this

 * Add more info the the "TyCon" e.g. source location where it was
   defined

 * Use the new "Module" type to help with Trac Trac #10068

 * It would be possible to generate TyConRepName (ie Typeable
   instances) selectively rather than all the time. We'd need to persist
   the information in interface files. Lacking a motivating reason I
have
   not done this, but it would not be difficult.

Refactoring
~~~~~~~~~~~
As is so often the case, I ended up refactoring more than I intended.
In particular

 * In TyCon, a type *family* (whether type or data) is repesented by a
   FamilyTyCon
     * a algebraic data type (including data/newtype instances) is
       represented by AlgTyCon This wasn't true before; a data family
       was represented as an AlgTyCon. There are some corresponding
       changes in IfaceSyn.

     * Also get rid of the (unhelpfully named) tyConParent.

 * In TyCon define 'Promoted', isomorphic to Maybe, used when things are
   optionally promoted; and use it elsewhere in GHC.

 * Cleanup handling of knownKeyNames

 * Each TyCon, including promoted TyCons, contains its TyConRepName, if
   it has one. This is, in effect, the name of its Typeable instance.

Updates haddock submodule

Test Plan: Let Harbormaster validate

Reviewers: austin, hvr, goldfire

Subscribers: goldfire, thomie

Differential Revision: https://phabricator.haskell.org/D1404

GHC Trac Issues: #9858

comment:18 Changed 21 months ago by bgamari

Milestone: 8.0.18.2.1
Type: bugtask

I'm very sorry to say that this won't be happening in 8.0.1. Hopefully we'll be able to make this happen for 8.2.

comment:19 Changed 20 months ago by RyanGlScott

Keywords: Generics added

comment:20 Changed 12 months ago by bgamari

Milestone: 8.2.18.4.1

I was hoping to address this after the Typeable rework (#11011) but it doesn't look like I'll get to it for 8.2.

Note: See TracTickets for help on using tickets.