Opened 20 months ago

Last modified 12 months ago

#8944 new feature request

Warn instead of stopping on misplaced Haddock comments

Reported by: Fuuzetsu Owned by:
Priority: normal Milestone:
Component: Compiler (Parser) Version: 7.9
Keywords: Cc:
Operating System: Unknown/Multiple Architecture: Unknown/Multiple
Type of failure: None/Unknown Test Case:
Blocked By: Blocking:
Related Tickets: Differential Rev(s): Phab:D452
Wiki Page:


Given a simple module like

module H where

data F = F () -- ^ Comment for first type argument

and trying to run Haddock on it, we'll get back

/tmp/H.hs:4:12: parse error on input ‘(’

The error message is rather sub-par in this scenario. It'd be great if we could instead print a warning saying that a Haddock comment is unexpected in that position and then continue on parsing.

Best case scenario would be a more informative message such as “Documenting each constructor argument is not supported” but that might be quite a bit harder.

Filing on GHC Trac as it's the parser that needs changing I believe.

Change History (8)

comment:1 Changed 13 months ago by rodlogic

The following break GHC's validate (see

-- * base
import Control.Monad    ( unless, liftM )
import GHC.Exts
import Data.Char
import Control.Monad    ( mplus )

-- * compiler/hsSyn
import HsSyn

I ended up having to remove '*' from the comment for it to work:

-- base
import Control.Monad    ( unless, liftM )
import GHC.Exts
import Data.Char
import Control.Monad    ( mplus )

-- compiler/hsSyn
import HsSyn

This seems to be the relevant part in Lexer.x:

"-- " ~[$docsym \#] .* { lineCommentToken }
"--" [^$symbol \ ] .* { lineCommentToken }

-- Next, match Haddock comments if no -haddock flag

"-- " [$docsym \#] .* / { ifExtension (not . haddockEnabled) } { lineCommentToken }

-- Now, when we've matched comments that begin with 2 dashes and continue
-- with a different character, we need to match comments that begin with three
-- or more dashes (which clearly can't be Haddock comments). We only need to
-- make sure that the first non-dash character isn't a symbol, and munch the
-- rest of the line.

"---"\-* ~$symbol .* { lineCommentToken }

-- Since the previous rules all match dashes followed by at least one
-- character, we also need to match a whole line filled with just dashes.

"--"\-* / { atEOL } { lineCommentToken }

-- We need this rule since none of the other single line comment rules
-- actually match this case.

"-- " / { atEOL } { lineCommentToken }

And it seems that there is no case for when haddockEnabled is true. What should be the action for such rule?

comment:2 follow-up: Changed 13 months ago by rodlogic

I have a fix for this in However, I just realized that I forgot to create a specific option for this warning. But does this even need an option??

comment:3 Changed 13 months ago by thomie

  • Differential Rev(s) set to Phab:D452

comment:4 Changed 13 months ago by goldfire

I think this should be enabled with -Wall, even if haddock is not enabled. That way, when I'm developing, but before trying to release and build docs, I can learn about bad Haddock comments.

comment:5 in reply to: ↑ 2 Changed 13 months ago by rodlogic

Replying to rodlogic:

I have a fix for this in However, I just realized that I forgot to create a specific option for this warning. But does this even need an option??

I did spend some time trying to make the following case work, which is a similar to the one described in the summary:

import Data.Maybe
-- * whatever
import Data.Functor

Parsing the above with the haddock flag turned on will lead to the following error:

a1.hs:3:1: parse error on input ‘import’

When the haddock flag is turned on, the comment line is scanned as a ITdocSection token instead of a ITlineComment. However, instead of being ignored by the lexer an ITdocSection is returned to the parser as a token. The problem is that the import rules are not expecting any documentation tokens and the parsing fails.

So fixing this requires modifying the import rule to be similar to how the export rules are implemented, i.e. explicitly taking into account the different haddock tokens.

I just find it a bit strange that both the parser and lexer need to know about haddock comments. Any changes to haddock (e.g. introducing a new haddock comment type) would require changing the GHC's lexer and the parser! I was expecting to see the lexer to generate a generic single or multiline comment token that would be incorporated into the AST by the parser, and then haddock would be able to further process these comments based on it's own rules.

Later this week I will see if I can address the summary case since it may be simpler, but it seems that it would be a good idea to generalize the parsing of comments and make GHC's parser/lexer a bit more independent of Haddock (in a different ticket). Alanz changes to the AST may be a good first step here and I will take a closer look there too.

comment:6 Changed 12 months ago by rodlogic

After a some time getting my feet wet with the Lexer.x and Parser.y, I am coming to the conclusion that this is not a trivial change and involves more changes to the parser than I would like to or am able to handle at this point. I also noticed that #8822 may also be stuck based on the same/similar difficulty. The difficulty in debugging the parser or the somewhat long change/compile cycles also don't help.

Last edited 12 months ago by rodlogic (previous) (diff)

comment:7 Changed 12 months ago by goldfire

I agree that this is a harder nut to crack than it initially appears.

But, I'm surprised about the long build times. Were you using make 2 in the ghc directory?

comment:8 Changed 12 months ago by rodlogic

I didn't know that make 2 existed, thanks for the tip.

There is one thing that bothers me, though. It seems that it is going to be quite hard to get the parser behaving the same with/without the haddock flag (granted it may not be such an important problem). We'll need to change happy rules everywhere to make it accept the same source files. Wouldn't it make more sense to have comments in general (as opposed to haddock specific documentation attached to specific declarations) as first class AST nodes that can be included/excluded based on a general parse-comments flag? This way a parser plugin could be added to the pipeline, consume the AST with comments, and parse the comments however it wants based on whether it is before or after a declaration. And haddock could be completely independent from the GHC parser.

I understand that this may be a big change that is not worth pursuing at this point in time, but I am curious about what your opinion is.

Note: See TracTickets for help on using tickets.