Opened 7 years ago

Closed 6 months ago

Last modified 6 months ago

#1480 closed feature request (fixed)

Template Haskell should allow reification of modules

Reported by: igloo Owned by:
Priority: highest Milestone: 7.8.1
Component: Template Haskell Version: 7.7
Keywords: Cc: Bulat.Ziganshin@…, ezyang@…, v.dijk.bas@…, hackage.haskell.org@…, ddf1991@…, gergely@…
Operating System: Unknown/Multiple Architecture: Unknown/Multiple
Type of failure: None/Unknown Difficulty: Moderate (less than a day)
Test Case: Blocked By:
Blocking: #8398 Related Tickets:

Description

It should be possible to reify a module from a splice. If the module is thisModule then it would only tell you about things above the current splice.

This comes up quite often; for example:

In http://www.haskell.org/pipermail/haskell-cafe/2007-June/027205.html Brent Yorgey wants to get a list of top-level functions names prop_* in order to build a runQuickCheckProps function.

In http://www.haskell.org/pipermail/haskell-cafe/2007-June/026493.html Neil Mitchell wants to get all the data/newtype declarations in a module in order to declare Typeable and Data instances for them.

Attachments (2)

reifmod-ghc.patch (3.6 KB) - added by errge 6 months ago.
patch, GHC part
reifmod-th.patch (4.4 KB) - added by errge 6 months ago.
patch, TH part

Download all attachments as: .zip

Change History (33)

comment:1 Changed 7 years ago by guest

  • Cc Bulat.Ziganshin@… added

comment:2 Changed 7 years ago by igloo

#1444 is also somewhat related.

comment:3 Changed 6 years ago by igloo

  • Milestone changed from 6.8 branch to _|_

comment:4 Changed 6 years ago by simonmar

  • Architecture changed from Unknown to Unknown/Multiple

comment:5 Changed 6 years ago by simonmar

  • Operating System changed from Unknown to Unknown/Multiple

comment:6 Changed 2 years ago by ezyang

  • Cc ezyang@… added
  • Type of failure set to None/Unknown

Another use-case: creating libraries like lifted-base programmatically.

comment:7 Changed 2 years ago by basvandijk

  • Cc v.dijk.bas@… added

comment:8 Changed 13 months ago by liyang

  • Cc hackage.haskell.org@… added

+1 what ezyang said.

comment:9 Changed 13 months ago by ddfisher

  • Cc ddf1991@… added

I'm working on writing a low-boilerplate FFI from Python to Haskell and would find this very useful as well.

comment:10 Changed 6 months ago by errge

  • Difficulty changed from Unknown to Moderate (less than a day)
  • Milestone changed from _|_ to 7.8.1
  • Status changed from new to patch
  • Version changed from 6.6.1 to 7.7

The attached patch does very little, minimal module reification: only the import list is reified.

Adding exports as requested in the bug report should be handled in a separate patch and review cycle, I think this patch is already big enough.

Also, this infrastructure patch is quite trivial and straightforward, while the incremental export reification as requested in the bug report can be implemented maybe in multiple ways and maybe needs design discussion.

After this gets merged, so we know the API, I'm happy to work on adding the export list as a next step.

comment:11 Changed 6 months ago by errge

  • Cc gergely@… added

comment:12 Changed 6 months ago by errge

  • Blocking 8398 added

comment:13 Changed 6 months ago by errge

  • Priority changed from normal to high

Increasing priority to high: we now have a patch and this is the only patch that still needs to be applied so that we implement the story detailed in http://ghc.haskell.org/trac/ghc/wiki/TemplateHaskell/Annotations.

Changed 6 months ago by errge

patch, GHC part

Changed 6 months ago by errge

patch, TH part

comment:14 Changed 6 months ago by simonpj

OK, I'm happy with API, but not so happy with the implementation (though it may do for now). The issue is this: the mi_usages field of the ModIface refers to each module that contains the definition of something that is used in this module. Suppose I have

	module M where
	  import Big
	  f = p

	module Big where
	  import P( p )
	  import Q( q )

Then the mi_usages field of M will have an entry for P, but not Q, since Q.q isn't referred to in M. (I think this is right, but the comments need improving.)

What you really want here is a single list of the direct imports, so that you can walk the direct-import tree, isn't it?

Currently you are going to get more then you want (because you'll get not only the direct imports, but more besides); and perhaps less than you want (because Q may not have an entry).

I don't think the direct-import information is recorded as-such in the ModIface currently, but it easily could be.

I suppose we could commit the patch as-is (ie perhaps buggy in corner cases), but I want to be sure that you were going to push thorough the Right Thing in due course. As part of that, we need more precise commentary on the fields of HscTypes.Dependencies, TcRnTypes.ImportAvails, and mi_usages of ModIface. For example, I wonder whether the mi_usages field could be part of the Dependencies record. I can help with working this out.

Simon

comment:15 Changed 6 months ago by errge

  • Blocking 8489 added

comment:16 Changed 6 months ago by errge

  • Blocking 8489 removed

Thanks for the review, I'll look into how current usages is handled and what is the best way to fix these issues. I've created another ticket for this: #8489 and assigned it to myself.

In the meantime, by slightly modifying your example, I've confirmed that usages always contains the imports, even if they're not used. So I never get less than what I want. I confirmed this behavior also through the source code via this backtrace:

So for me this patch would already be very useful in 7.8.1, and yes, I'll work on fixing the remaining edge case that you've just showed. If everyone agrees, please apply this as-is.

comment:17 Changed 6 months ago by simonpj

  • Priority changed from high to highest

OK, let's do that. Austin can you go ahead?

Simon

comment:18 Changed 6 months ago by Austin Seipp <austin@…>

In 69fa2e558d56178d33950df815c3233606b0d44e/ghc:

Add support for module reification (#1480)

Authored-by: Gergely Risko <gergely@risko.hu>
Signed-off-by: Austin Seipp <austin@well-typed.com>

comment:19 Changed 6 months ago by thoughtpolice

  • Resolution set to fixed
  • Status changed from patch to closed

Merged, thanks!

comment:20 Changed 6 months ago by simonmar

Regarding mi_usages, it does contain all the direct imports, and you can tell which entries are direct imports by the usg_exports field, which is Just for a direct import and Nothing otherwise. There are more comments on the Usages type, but feel free to reorganise the comments.

comment:21 Changed 6 months ago by errge

I have to experiment a bit more around this, but at a first glance, I see this in HscTypes.lhs:

-- | Records modules that we depend on by making a direct import from
data Usage
  -- | Module from another package
  = UsagePackageModule {
        usg_mod      :: Module,
           -- ^ External package module depended on
        usg_mod_hash :: Fingerprint,
            -- ^ Cached module fingerprint
        usg_safe :: IsSafeImport
            -- ^ Was this module imported as a safe import
    }
  -- | Module from the current package
  | UsageHomeModule {
        usg_mod_name :: ModuleName,
            -- ^ Name of the module
        usg_mod_hash :: Fingerprint,
            -- ^ Cached module fingerprint
        usg_entities :: [(OccName,Fingerprint)],
            -- ^ Entities we depend on, sorted by occurrence name and fingerprinted.
            -- NB: usages are for parent names only, e.g. type constructors
            -- but not the associated data constructors.
        usg_exports  :: Maybe Fingerprint,
            -- ^ Fingerprint for the export list we used to depend on this module,
            -- if we depend on the export list
        usg_safe :: IsSafeImport
            -- ^ Was this module imported as a safe import
    }                                           -- ^ Module from the current package
  -- | A file upon which the module depends, e.g. a CPP #include, or using TH's
  -- 'addDependentFile'
  | UsageFile {
        usg_file_path  :: FilePath,
        -- ^ External file dependency. From a CPP #include or TH
        -- addDependentFile. Should be absolute.
        usg_file_hash  :: Fingerprint
        -- ^ 'Fingerprint' of the file contents.

        -- Note: We don't consider things like modification timestamps
        -- here, because there's no reason to recompile if the actual
        -- contents don't change.  This previously lead to odd
        -- recompilation behaviors; see #8114
  }
    deriving( Eq )
        -- The export list field is (Just v) if we depend on the export list:
        --      i.e. we imported the module directly, whether or not we
        --           enumerated the things we imported, or just imported
        --           everything
        -- We need to recompile if M's exports change, because
        -- if the import was    import M,       we might now have a name clash
        --                                      in the importing module.
        -- if the import was    import M(x)     M might no longer export x
        -- The only way we don't depend on the export list is if we have
        --                      import M()
        -- And of course, for modules that aren't imported directly we don't
        -- depend on their export lists

As you can see, UsagePackageModule? doesn't have a usg_exports field, only UsageHomeModule?. So I can't decide about directness of the import based on usg_exports alone.

Maybe we should make this info more explicit in Usages if we want to depend on it for TH reification.

comment:22 Changed 6 months ago by simonmar

The source of truth for the design around usages and recompilation is Commentary/Compiler/RecompilationAvoidance.

You're right that UsagePackageModule doesn't have a usg_exports field. For package modules we record a lower resolution dependency, just the hash of the whole module. The rationale is that package modules don't change very often, so we don't need to track fine-grained dependency information about them. This does mean that we can't tell the difference between a direct import and an indirect import of a packages module from the Usages. Probably if you need to get a list of direct imports you should get it from somewhere else, e.g. mg_dir_imps from ModGuts.

comment:23 Changed 6 months ago by simonpj

Gergely wants to be able walk the entire module-import tree, including deep into packages. (Correct?)

So given a module M (presumably including the current one) he'd like to produce a list of all modules that M directly imported. We can get from M to M's ModIface, but then we need to find its direct imports. But if M uses a module X, but X is from a package, we can't tell if X is directly imported by M or not.

I can see two approaches:

  • It would be easy enough to record a boolean flag in UsagePackageModule for this purpose.
  • Alternatively, but less satisfactorily, we could say that reifyModule returns a superset of the direct imports. That's what happens now I think. But that seems yukky.

I prefer the first.

Simon

comment:24 follow-up: Changed 6 months ago by simonpj

Simon, I don't understand the relationship between mi_deps and mi_usages fields of ModIface. Both seem implicated in the recompilation check. Can you clarify?

Simon

comment:25 Changed 6 months ago by errge

Yes, you're correct, I want to reify not just the current module, but any module, even in packages. Therefore ModGuts is out of the question, and we can only depend on ModIface, that is actually serialized to .hi files.

Your first option (to extend the serialization format with the Boolean) seems to be the correct way, that's what I will try first, when I sit down to do it. But first, I want to look around and get a better understanding about the whole recompilation checking stuff. This discussion and the wikipage will help a lot, I hope.

The second option is unsatisfactory: as you pointed out before, it's different between different haskell implementations (or even between different versions of cabal libraries).

comment:26 in reply to: ↑ 24 Changed 6 months ago by simonmar

Replying to simonpj:

Simon, I don't understand the relationship between mi_deps and mi_usages fields of ModIface. Both seem implicated in the recompilation check. Can you clarify?

I've just been poking around in the source tree and the wiki, and I think a concise summary is this: mi_deps lists everything below the module, whereas mi_usages lists the bits that the module actually depends on, and their fingerprints.

mi_usages is used only by the recompilation checker, whereas mi_deps is used by the renamer and other places (linking?), and by the recompilation checker. The mi_usages of modules that we are not compiling are completely redundant, we never even load them - the interface deserializer returns a thunk for that bit of the .hi file, and we never look at it.

There is some overlap between these two fields. As far as I know it's been this way for a long time - since before I implemented fingerprints I think. Perhaps the redundancy could be eliminated... but there are a lot of subtleties here.

comment:27 Changed 6 months ago by simonmar

Now that I think about it, I'm wary about using mi_usages for reification, because that widens its scope beyond the recompilation checker. It seems more correct to add the information to mi_deps.

comment:28 Changed 6 months ago by simonpj

Got it. I'll add comments.

I agree about not using mi_usages for Gergely's purposes. Maybe add dep_dir_imps :: [Module] to Dependencies to record the directly-imported modules?

Simon

comment:29 Changed 6 months ago by simonmar

An alternative is to split dep_mods into two; the direct imports and the rest. That would avoid growing the .hi file.

comment:30 Changed 6 months ago by simonpj

I don't think splitting mi_deps will work, because it lists only the home-package imports, so we would not thereby find the directly imported modules that happen to be in other packages.

comment:31 Changed 6 months ago by simonmar

ah yes, good point.

Note: See TracTickets for help on using tickets.