Opened 12 months ago

Closed 12 months ago

Last modified 12 months ago

#8945 closed bug (fixed)

GHC produces grouped declarations in a weird order

Reported by: Fuuzetsu Owned by:
Priority: normal Milestone:
Component: GHC API Version: 7.9
Keywords: Cc:
Operating System: Unknown/Multiple Architecture: Unknown/Multiple
Type of failure: None/Unknown Test Case:
Blocked By: Blocking:
Related Tickets: Differential Revisions:


Consider very simple module like

module J where

class A a where
  f, g, h, i :: a -> ()

Now when we ask GHC about declarations in the module and extract function signatures, we don't get f, g, h, i :: a -> () as we could expect. We instead get f, i, h, g :: a -> (). The pattern is that the first name is in a correct position and the rest is reversed. This leads to whatever uses the API get the names in order different than that in the source file. See for an example when this matters.

I have prepared an example using GHC API which you can run on your machine with your own test files and see the results for yourselves.

-- GHC.Paths requires the very small ghc-paths package.
-- if you don't want it, libdir = ghc --print-libdir
module Main where

import Control.Monad (ap, liftM2)
import Data.Functor ((<$>))
import System.Environment (getArgs)

import Digraph (flattenSCCs)
import GHC
import GHC.Paths (libdir)
import Outputable (text, ppr, showSDoc, (<>), (<+>))

main :: IO ()
main = do
  (dfs, modules) <- getArgs >>= withGhc Nothing . processModules
  let r = map (showSDoc dfs . (\(x,y) -> text x <> text ":" <+> ppr y)  . f)
  putStrLn $ unlines r
    f (s, t) = (ms_hspp_file s, (\(x,_,_,_) -> x) <$> tm_renamed_source t)

type ModuleName' = String

withGhc :: Maybe DynFlags -> Ghc a -> IO (DynFlags, a)
withGhc d act =
  runGhc (Just libdir) $ do
    dynflags <- case d of
      Nothing -> getSessionDynFlags
      Just d' -> return d'

    _ <- setSessionDynFlags dynflags
    liftM2 (,) getSessionDynFlags act

processModules :: [ModuleName']
               -> Ghc [(ModSummary, TypecheckedModule)]
processModules modules = do
  mg <- depAnalysis
  let sortedMods = flattenSCCs $ topSortModuleGraph False mg Nothing
  mapM (\x -> return (\y -> (x,y)) `ap`
              (parseModule x >>= typecheckModule >>= loadModule))
    depAnalysis :: Ghc ModuleGraph
    depAnalysis = do
      targets <- mapM (\f -> guessTarget f Nothing) modules
      setTargets targets
      depanal [] False

In this case, using the tiny module I posted at the beginning, I get:

*Main> :main "/tmp/J.hs"
/tmp/J.hs: Just class J.A a where
                  J.f, J.i, J.h, J.g :: a -> ()

It looks to me like it's just an oversight somewhere in the API and should be an easy fix for someone familiar with that part. I'd rather save myself many hours trying to find it on my own.

PS: Are the rules for when grouping happens documented somewhere? Grouping functions in class definitions is the only sure-fire way I can get a grouped signature but I'm sure there were others in the past that no longer work.

Change History (6)

comment:1 Changed 12 months ago by haasn

I think the issue might actually be in the parser, more specifically in ghc/compiler/parser/Parser.y.pp:

sig_vars :: { Located [Located RdrName] }
         : sig_vars ',' var             { LL ($3 : unLoc $1) }
         | var                          { L1 [$1] }


sigdecl :: { Located (OrdList (LHsDecl RdrName)) }
        | var ',' sig_vars '::' sigtypedoc
                                { LL $ toOL [ LL $ SigD (TypeSig ($1 : unLoc $3) $5) ] }

sig_vars is returning stuff in the opposite order. Seems to me like sig_vars should be changed to use an OrdList and snoc instead of consing.

comment:2 Changed 12 months ago by simonpj

Spot on. GHC's front end makes no particular effort to maintain the original declarations in the order they were given. While it was just a front end for the compiler that was fine, but now it's being used for Haddock and other back ends I can see that being more careful about this would be good.

I'd be happy to accept a patch. (It's not hard, as you observe.)

There may be other places in the parser where this kind of things happens, I'm not sure.


comment:3 Changed 12 months ago by haasn

  • Status changed from new to patch

Fix located at

Note: I'm not sure how to handle this submodule stuff etc. correctly. Applying this commit will make the Haddock test suite fail, unless we also apply

Would it be correct to push bbcf5d38 to Haddock's master, and then update GHC's master to include 0b51dcae which also changes the commit Haddock points to at the same time, to make sure the test suite is always consistent with the change?

I hope it's fine to just link to github commits here. If you need me to actually attach a patch file, just say so.

Last edited 12 months ago by haasn (previous) (diff)

comment:4 Changed 12 months ago by Simon Peyton Jones <simonpj@…>

In b20bc181d8ba496f866c44ae65d26118c2c502a1/ghc:

Parse the variables in a type signature in the order given (Trac #8945)

This is just making the parser behave more sensibly, and return
the list [x,y,z] from the signature
   x,y,z :: Int
rathe than [x,z,y] as now.

Turns out that the other use of sig_vars *did* do the right
thing already.

comment:5 Changed 12 months ago by simonpj

  • Resolution set to fixed
  • Status changed from patch to closed

Right I've done this, thank you.

I have not updated the Haddock test suite results. Perhaps someone can do that?

Nor have I added a regression test; it seems hardly worth it.


comment:6 Changed 12 months ago by Fuuzetsu

I have updated existing test cases in Haddock and added a new one. I have not touched any Haddock tests which are part of the GHC test suite but it should not be necessary anyway. Now this ticket is completely fixed for both parties.

Thanks for pushing in the patch.

Note: See TracTickets for help on using tickets.