Opened 4 years ago

Last modified 5 months ago

#7897 new bug

Make TypeRep fingerprints be proper, robust fingerprints

Reported by: simonpj
Owned by:
Priority: normal
Milestone:
Component: Compiler
Version: 7.6.3
Keywords:
Cc:
Operating System: Unknown/Multiple
Architecture: Unknown/Multiple
Type of failure: None/Unknown
Test Case:
Blocked By:
Blocking:
Related Tickets:
Differential Rev(s):
Wiki Page:

Description

A TypeRep currently looks like this:

data TypeRep = TypeRep Fingerprint TyCon [TypeRep]
data TyCon = TyCon {
   tyConHash    :: Fingerprint,
   tyConPackage :: String,
   tyConModule  :: String,
   tyConName    :: String }

If two TypeReps have the same fingerprint they should really describe identical types.

But that's not really true today, because today the fingerprint for a TyCon is obtained by hashing the name of the type constructor (e.g. base:Data.Maybe.Maybe), but not its structure. To see how this is non-robust, consider this module:

module M where
data T = MkT S deriving( Typeable )
data S = S1 Int | S2 Bool deriving( Typeable )

Now I do this:

  • Write a program that constructs a value v::T, and serialises into a file (a) the TypeRep for v, and (b) v itself.
  • Now I alter the data type declaration for S
  • Now I recompile and run the program again, which attempts to read the value back in from the file. It carefully compares TypeReps to be sure that the types are the same... yes, still "M.T".
  • But alas the de-serialisation fails because S has been changed.
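The name-only check in the steps above can be observed with a minimal sketch like the one below. It assumes GHC 7.10 or later, where Typeable instances are generated automatically and Data.Typeable exports typeRepFingerprint. The module header is omitted, so the module name printed will not be "M".

```haskell
import Data.Proxy (Proxy (..))
import Data.Typeable
  ( tyConModule, tyConName, tyConPackage
  , typeRep, typeRepFingerprint, typeRepTyCon )

-- The declarations from the example above (module header omitted).
data S = S1 Int | S2 Bool
data T = MkT S

main :: IO ()
main = do
  let rep = typeRep (Proxy :: Proxy T)
      con = typeRepTyCon rep
  -- Only the package, module and name of T are visible here; nothing about S.
  putStrLn (tyConPackage con ++ ":" ++ tyConModule con ++ "." ++ tyConName con)
  -- Editing the constructors of S and recompiling is expected to leave this
  -- fingerprint unchanged, which is the weakness described above.
  print (typeRepFingerprint rep)
```

Running this before and after altering S, and comparing the two printed fingerprints, is one way to confirm the behaviour on a particular GHC version.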

What we really want is for the fingerprint in a TypeRep to really be a hash of the definition of T (not just its name), including transitively the fingerprints of all the types mentioned in that definition.
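As a rough illustration of what such a hash could look like, here is a hand-written sketch built on GHC.Fingerprint. The class Structural, the helper declFp and the instances are all hypothetical and written by hand; this is not how GHC computes its interface-file fingerprints, and the naive recursion would loop on recursive data types.

```haskell
import Data.Proxy (Proxy (..))
import GHC.Fingerprint (Fingerprint, fingerprintFingerprints, fingerprintString)

-- Hypothetical class: a fingerprint of a type's definition, not just its name.
class Structural a where
  structuralFp :: proxy a -> Fingerprint

-- Hypothetical helper: combine the type name with one fingerprint per
-- constructor, each covering the constructor name and its field types.
declFp :: String -> [(String, [Fingerprint])] -> Fingerprint
declFp tyName cons =
  fingerprintFingerprints
    ( fingerprintString tyName
    : [ fingerprintFingerprints (fingerprintString c : fields)
      | (c, fields) <- cons ] )

-- Assumption: primitive types are treated as leaves and hashed by name only.
instance Structural Int  where structuralFp _ = declFp "Int" []
instance Structural Bool where structuralFp _ = declFp "Bool" []

data S = S1 Int | S2 Bool
data T = MkT S

instance Structural S where
  structuralFp _ = declFp "M.S"
    [ ("S1", [structuralFp (Proxy :: Proxy Int)])
    , ("S2", [structuralFp (Proxy :: Proxy Bool)]) ]

-- T's fingerprint includes S's, so changing S changes T's fingerprint too.
instance Structural T where
  structuralFp _ = declFp "M.T" [("MkT", [structuralFp (Proxy :: Proxy S)])]

main :: IO ()
main = print (structuralFp (Proxy :: Proxy T))
```

With this scheme, altering the constructors of S (and its instance accordingly) changes the value printed for T, which is the transitive property asked for above.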

In effect, a TypeRep is a dynamic type check, and it should jolly well be a robust dynamic type check. This might also matter in a Cloud Haskell application with different components upgraded at different times.

As it happens, GHC already computes these fingerprints, to put in interface files. But they aren't used when making the Typeable instances for T. I think it should be.

Change History (7)

comment:1 Changed 4 years ago by dreixel

Just to add two comments as to why this might not have a good cost/benefit ratio:

1) The problem is not easily seen in practice. In Cloud Haskell, all nodes are supposed to run the same binary (currently, at least). And even in the example given, the result is a failed deserialisation (possibly with a sensible runtime failure), not a segfault.

2) This will complicate giving Typeable instances for data families. Right now, since Typeable only depends on the LHS of a data declaration, we can give a Typeable instance as soon as the family is declared; this Typeable instance will work for all data instances, current and future. If we have to look at the RHS, though, we will need one separate Typeable instance per data family instance.
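A small sketch of point 2, assuming a GHC version that generates Typeable evidence automatically (7.10 or later). The family F and its instances are made up for illustration.

```haskell
{-# LANGUAGE TypeFamilies #-}
import Data.Proxy (Proxy (..))
import Data.Typeable (typeRep, typeRepTyCon)

-- Hypothetical data family with two instances whose right-hand sides differ.
data family F a
data instance F Int  = FInt Int
data instance F Bool = FTrue | FFalse

main :: IO ()
main = do
  let conInt  = typeRepTyCon (typeRep (Proxy :: Proxy (F Int)))
      conBool = typeRepTyCon (typeRep (Proxy :: Proxy (F Bool)))
  print conInt
  -- Both applications are expected to share the family's TyCon, since the
  -- representation does not look at the right-hand side of either instance.
  print (conInt == conBool)
```

A structural fingerprint would have to tell these two instances apart, which is where the need for one Typeable instance per data family instance comes from.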

comment:2 Changed 4 years ago by simonpj

Re data families, see #5863

comment:3 Changed 4 years ago by simonpj

Both Pedro's (dreixel) points above are good ones. The data-family question is particularly problematic; but as things stand it's very simple.

So the status quo, in which TypeReps are essentially compared by name, is looking more attractive. No one is arguing for this change. So I propose to park it for now. But I'll leave the ticket as a placeholder for discussion.

comment:4 Changed 4 years ago by simonmar

I'll point out that the package name contains the version, so in most cases the current scheme is safe. Of course this doesn't help for the main package, but if you're serializing data from a properly packaged library, it's fine. The other problem is that TypeReps are currently distinguished in some cases where it would be safe to equate them.

comment:5 in reply to:  3 Changed 3 years ago by thomie

Milestone:

Replying to simonpj:

So the status quo, in which TypeReps are essentially compared by name, is looking more attractive. No one is arguing for this change. So I propose to park it for now.

comment:6 Changed 5 months ago by cdupont

I have a real use case for this bug in Hint: https://github.com/mvdan/hint/issues/31

Having proper fingerprints would make it possible to check the run-time type representation and prevent the segfault. Is there a workaround? How can the structure of a type be checked at run-time?

comment:7 Changed 5 months ago by bgamari

We don't currently offer a way to view the runtime representation of a type at runtime (other than, I suppose, Generic). That said, we do generate Typeable evidence for promoted data constructors, so we already do much of the work necessary to do so. Even so, it's not clear that this would be a useful enough feature to justify the cost.
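For the Generic route, a minimal sketch might look like the following. The class ConNames and the function constructorsOf are hypothetical helpers, not part of GHC.Generics; the sketch assumes GHC 8.0 or later and only handles types with at least one constructor.

```haskell
{-# LANGUAGE DeriveGeneric, FlexibleContexts, FlexibleInstances,
             KindSignatures, ScopedTypeVariables, TypeOperators #-}
import Data.Kind (Type)
import Data.Proxy (Proxy (..))
import GHC.Generics (C1, Constructor (conName), D1, Generic, Rep, (:+:))

data S = S1 Int | S2 Bool deriving Generic

-- Hypothetical helper class: collect the constructor names found in a
-- generic representation type.
class ConNames (f :: Type -> Type) where
  conNames :: Proxy f -> [String]

-- Datatype wrapper: look inside.
instance ConNames f => ConNames (D1 c f) where
  conNames _ = conNames (Proxy :: Proxy f)

-- Sum of constructors: collect names from both sides.
instance (ConNames f, ConNames g) => ConNames (f :+: g) where
  conNames _ = conNames (Proxy :: Proxy f) ++ conNames (Proxy :: Proxy g)

-- Single constructor: conName never inspects its argument.
instance Constructor c => ConNames (C1 c f) where
  conNames _ = [conName (undefined :: C1 c f ())]

constructorsOf :: forall a. (Generic a, ConNames (Rep a)) => Proxy a -> [String]
constructorsOf _ = conNames (Proxy :: Proxy (Rep a))

main :: IO ()
main = print (constructorsOf (Proxy :: Proxy S))  -- prints ["S1","S2"]
```

A serialiser could store such a list next to the data and compare it on read. This only works for types that derive Generic, and it inspects constructor names rather than the full field structure.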

As far as making the hash more precise is concerned, the data family issue is imho enough of an argument against rocking the boat without very good reason.

Last edited 5 months ago by bgamari