Opened 3 years ago

Last modified 10 months ago

#4836 new bug

literate markdown not handled correctly by unlit

Reported by: guest Owned by:
Priority: low Milestone: 7.6.2
Component: Compiler Version: 7.0.1
Keywords: Cc: dagitj@…, jmg@…, trevor@…
Operating System: Unknown/Multiple Architecture: Unknown/Multiple
Type of failure: GHC rejects valid program Difficulty:
Test Case: Blocked By:
Blocking: Related Tickets: #7120


This simple literate Haskell program, which uses Markdown in its comments, gives unlit problems:

    ### Ok so lets try this again.

    ### A page that loads and compiles:

    > myfact 0 = 1
    > myfact n = n * n-1

    Lets see if it works!

If I run unlit by hand and collect the output, I can see where it goes wrong:

    $ ~/lib/ghc-7.0.1/unlit Main.lhs Main.lpp
    $ cat Main.lpp
    ### Ok so lets try this again.

    ### A page that loads and compiles:

      myfact 0 = 1
      myfact n = n * n-1

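Both "###" headings are passed through untouched, whereas the plain comment line "Lets see if it works!" does not appear in the output. Looking through the source code of unlit.c, I think the place to check for this is here: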

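    if ( c == '#' ) {
      if ( ignore_shebang ) {
         c1 = egetc(istream);
         if ( c1 == '!' ) {
           /* "#!" line: swallow the rest of it and report SHEBANG */
           while (c=egetc(istream), !isLineTerm(c)) ;
           return SHEBANG;
         }
         myputc(c, ostream);
         /* ... (lines omitted from this excerpt) ... */
      }
      if ( leavecpp ) {
        /* Any other line starting with '#' is assumed to be a CPP
           directive and is copied through unchanged; this is what
           lets the Markdown "###" headings through. */
        myputc(c, ostream);
        while (c=egetc(istream), !isLineTerm(c))
          myputc(c, ostream);
        return HASH;
      }
    }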
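
To make the failure mode concrete, here is a minimal C sketch of the first-character test in the excerpt above, reduced to a pure function over a single line. It is a hypothetical, simplified model and not the real unlit.c: the names classify_line, line_kind and the LINE_* values are made up (only SHEBANG and HASH echo the excerpt), and everything else unlit.c does is left out.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical model of how one input line is classified.
 * LINE_HASH means "copy through verbatim for CPP". */
typedef enum { LINE_BLANK, LINE_BIRD, LINE_SHEBANG, LINE_HASH, LINE_TEXT } line_kind;

static line_kind classify_line(const char *line, int ignore_shebang, int leavecpp)
{
    assert(line != NULL);
    if (line[0] == '#') {
        if (ignore_shebang && line[1] == '!')
            return LINE_SHEBANG;   /* e.g. "#!/usr/bin/env runghc" */
        if (leavecpp)
            return LINE_HASH;      /* any other '#' line, Markdown headings included */
    }
    if (line[0] == '>')
        return LINE_BIRD;          /* bird-track code line */
    if (line[strspn(line, " \t")] == '\0')
        return LINE_BLANK;         /* only spaces/tabs */
    return LINE_TEXT;              /* ordinary literate comment */
}
```

With passthrough on, which is how the unlit run above behaves with no extra flags, classify_line("### Ok so lets try this again.", 1, 1) returns LINE_HASH, so the heading is copied through; with passthrough off the same line would be ordinary comment text (LINE_TEXT).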

It seems that Cabal has a similar unlit function:

I haven't tested it, but I think the Cabal version would handle this case correctly (or be easier to fix than a C program from 1990). Would it be possible/wise/feasible to extract the Cabal version and make it a permanent replacement for the current unlit.c code?

Change History (11)

comment:1 Changed 3 years ago by nalaurethsulfate

In addition to the Cabal version, perhaps the Perl script mentioned in the obscure unlit.c README reference (, lit2stuff) could be called with the right options to strip the comments from literate Haskell files.

comment:2 Changed 3 years ago by nalaurethsulfate

While it might be easier to fix, the Cabal version also handles the same test case incorrectly:

    GHCi, version 6.12.1: :? for help
    Loading package ghc-prim ... linking ... done.
    Loading package integer-gmp ... linking ... done.
    Loading package base ... linking ... done.
    Prelude> :m Distribution.Simple.PreProcess.Unlit
    Prelude Distribution.Simple.PreProcess.Unlit> f <- readFile "test.lhs"
    Prelude Distribution.Simple.PreProcess.Unlit> f
    "### Ok so lets try this again.\n\n### A page that loads and compiles:\n\n> myfact 0 = 1 \n> myfact n = n * n-1\n\nLets see if it works!\n"
    Prelude Distribution.Simple.PreProcess.Unlit> unlit "log.txt" f
    Loading package array- ... linking ... done.
    Loading package containers- ... linking ... done.
    Loading package filepath- ... linking ... done.
    Loading package old-locale- ... linking ... done.
    Loading package old-time- ... linking ... done.
    Loading package unix- ... linking ... done.
    Loading package directory- ... linking ... done.
    Loading package pretty- ... linking ... done.
    Loading package process- ... linking ... done.
    Loading package Cabal- ... linking ... done.
    Left "### Ok so lets try this again.\n\n### A page that loads and compiles:\n\n myfact 0 = 1 \n myfact n = n * n-1\n\n -- Lets see if it works!\n\n"
    Prelude Distribution.Simple.PreProcess.Unlit>

I don't think this is terribly surprising, though, and it shouldn't be too difficult to fix. If someone could explain why CPP lines are not required to be inside code blocks (however those blocks are delimited), that would help a lot.

comment:3 Changed 3 years ago by duncan

So the problem here is that GHC runs unlit before CPP, and so unlit has to pass the # CPP directives through. It has to run unlit before CPP because, in the worst case, the only time GHC finds out that CPP is needed is when it encounters a {-# LANGUAGE CPP #-} pragma.

In principle, I suppose GHC could unlit with CPP passthrough only for the pass where it reads the module head to find pragmas, and then, if CPP turns out not to be required, re-unlit the file without the CPP passthrough mode.
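
The two-pass idea can be sketched as a decision taken after the first (passthrough) unlit run: scan the head of its output for a LANGUAGE pragma that enables CPP, and only if there is none, unlit the file a second time without passthrough. This is a hypothetical C sketch; enables_cpp and needs_second_unlit_pass are invented names, and the substring scan is deliberately crude (it ignores other ways of enabling CPP, such as -XCPP on the command line).

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Crude check: does some {-# ... #-} pragma in `head` contain both
 * "LANGUAGE" and, after it, "CPP"?  Assumes the pragma text appears
 * verbatim in the passthrough unlit output. */
static bool enables_cpp(const char *head)
{
    assert(head != NULL);
    const char *p = head;
    while ((p = strstr(p, "{-#")) != NULL) {
        const char *end = strstr(p, "#-}");
        if (end == NULL)
            return false;                 /* unterminated pragma */
        const char *lang = strstr(p, "LANGUAGE");
        if (lang != NULL && lang < end) {
            const char *cpp = strstr(lang, "CPP");
            if (cpp != NULL && cpp < end) /* must be inside this pragma */
                return true;
        }
        p = end + 3;                      /* continue after "#-}" */
    }
    return false;
}

/* Pass 1 always keeps '#' lines so the pragmas can be read; a second
 * unlit run without passthrough is needed only when CPP is off. */
static bool needs_second_unlit_pass(const char *pass1_head)
{
    return !enables_cpp(pass1_head);
}
```

For the ticket's test file the head contains no pragma at all, so needs_second_unlit_pass returns true and the "###" lines would be treated as ordinary comment text on the second run.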

Technically, this probably does count as Haskell 98 non-compliance: the CPP extension interferes with the use of # in ordinary (non-CPP) .lhs files.

comment:4 Changed 3 years ago by simonmar

See also #4073 and #3719

Thanks Duncan for pointing out one good reason why we need to do unlit before CPP.

comment:5 Changed 3 years ago by igloo

  • Milestone set to 7.2.1

comment:6 Changed 2 years ago by jmg

  • Cc jmg@… added

comment:7 Changed 2 years ago by jmg

I've run into this problem when trying to use org-mode markup in an .lhs file. This bug prevents me from using quite a lot of org-mode-specific in-file settings, which all start with a '#' in the first column.

comment:8 Changed 2 years ago by igloo

  • Milestone changed from 7.4.1 to 7.6.1
  • Priority changed from normal to low

comment:9 Changed 21 months ago by holzensp

comment:10 Changed 20 months ago by igloo

  • Milestone changed from 7.6.1 to 7.6.2

comment:11 Changed 10 months ago by elliottt

  • Cc trevor@… added

I've implemented Markdown processing in a branch, and just discovered this ticket. Is there a reason that CPP is preserved in the comment part of a literate file? Shouldn't CPP be preserved only when it shows up in either a birdtrack block or a \begin{code} ... \end{code} block? Preserving it in the comment portion of a literate program seems akin to processing CPP that appears in the comments of a non-literate file; I would normally expect commented-out CPP not to be run.
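
The rule proposed above can be sketched as follows. This is a hypothetical C sketch, not code from the branch mentioned: a '#' line is passed through only if it lies inside \begin{code} ... \end{code}, or if the nearest non-'#' line above or below it is a bird-track line. That second condition is just one possible reading of "in a birdtrack block"; the names mark_cpp_lines, is_bird and is_hash are invented, and \begin{code}/\end{code} are assumed to start in column 1.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

static bool is_bird(const char *s) { return s[0] == '>'; }
static bool is_hash(const char *s) { return s[0] == '#'; }

/* keep[i] ends up true exactly for the '#' lines that should be passed
 * through to CPP; every other entry (including non-'#' lines) is false. */
static void mark_cpp_lines(const char *const lines[], size_t n, bool keep[])
{
    assert(n == 0 || (lines != NULL && keep != NULL));
    bool in_code = false;
    for (size_t i = 0; i < n; i++) {
        keep[i] = false;
        if (strncmp(lines[i], "\\begin{code}", 12) == 0) { in_code = true;  continue; }
        if (strncmp(lines[i], "\\end{code}", 10) == 0)   { in_code = false; continue; }
        if (!is_hash(lines[i]))
            continue;
        if (in_code) { keep[i] = true; continue; }

        /* Skip over neighbouring '#' lines to find the nearest other line. */
        size_t j = i;
        while (j > 0 && is_hash(lines[j - 1])) j--;
        bool bird_above = j > 0 && is_bird(lines[j - 1]);

        size_t k = i + 1;
        while (k < n && is_hash(lines[k])) k++;
        bool bird_below = k < n && is_bird(lines[k]);

        keep[i] = bird_above || bird_below;
    }
}
```

Under this rule neither "###" heading in the ticket's test file is kept, and an org-mode setting such as "#+TITLE:" outside a code block would likewise be treated as comment text, while "#ifdef"/"#endif" lines wrapped around bird-track lines still reach CPP.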
