Opened 8 years ago

# literate markdown not handled correctly by unlit

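| Field | Value |
|---|---|
| Reported by | guest |
| Owned by | |
| Priority | low |
| Component | Compiler |
| Version | 7.0.1 |
| Cc | dagitj@…, jmg@…, trevor@…, bjp@… |
| Operating System | Unknown/Multiple |
| Architecture | Unknown/Multiple |
| Type of failure | GHC rejects valid program |
| Related Tickets | #7120 |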

### Description

This simple literate Haskell program, which uses markdown in its comments, causes problems for unlit:

    ### Ok so lets try this again.

    ### A page that loads and compiles:

    > myfact 0 = 1
    > myfact n = n * n-1

    Lets see if it works!


If I run unlit and collect the output I can see where it went wrong:

    $ ~/lib/ghc-7.0.1/unlit Main.lhs Main.lpp
    $ cat Main.lpp
    ### Ok so lets try this again.

    ### A page that loads and compiles:

    myfact 0 = 1
    myfact n = n * n-1



When I look through the source code of unlit.c I think the place to check for this would be here:

    if ( c == '#' ) {
        if ( ignore_shebang ) {
            c1 = egetc(istream);
            if ( c1 == '!' ) {
                while (c=egetc(istream), !isLineTerm(c)) ;
                return SHEBANG;
            }
            myputc(c, ostream);
            c=c1;
        }
        if ( leavecpp ) {
            myputc(c, ostream);
            while (c=egetc(istream), !isLineTerm(c))
                myputc(c,ostream);
            myputc('\n',ostream);
            return HASH;
        }
    }
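
To make the suspected behaviour concrete, here is a minimal line-based sketch of what the excerpt appears to do when `leavecpp` is set. It is not code from unlit.c: `line_class`, `classify_line`, and the `LINE_*` names are hypothetical, and the real program reads character by character rather than line by line.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical line classes, loosely modelled on the SHEBANG/HASH tokens
   in the excerpt above; these names are not taken from unlit.c. */
typedef enum { LINE_TEXT, LINE_BIRD, LINE_HASH } line_class;

/* Classify one input line by its first character only.
   Assumption: with leavecpp set, any line starting with '#' is copied
   through, whether or not it sits inside a code block. */
line_class classify_line(const char *line, int leavecpp)
{
    assert(line != NULL);
    if (line[0] == '>')
        return LINE_BIRD;   /* bird-track code line */
    if (line[0] == '#' && leavecpp)
        return LINE_HASH;   /* passed to the output verbatim */
    return LINE_TEXT;       /* literate comment, not emitted as code */
}
```

Under this reading, `classify_line("### Ok so lets try this again.", 1)` yields `LINE_HASH`, so the markdown heading ends up in the output just like a real `#ifdef` would.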


It seems that cabal has a similar unlit function: http://www.haskell.org/ghc/docs/latest/html/libraries/Cabal-1.10.0.0/src/Distribution-Simple-PreProcess-Unlit.html#unlit

I haven't tested it, but I think the Cabal version would handle this case correctly (or at least be easier to fix than a C program from 1990). Would it be possible, wise, and feasible to extract the Cabal version and make it a permanent replacement for the current unlit.c code?

### comment:1 Changed 8 years ago by nalaurethsulfate

In addition to the Cabal version, perhaps the Perl script mentioned in the obscure unlit.c README reference (http://www.desy.de/user/projects/LitProg/glasgow/programs-and-options.html, lit2stuff) could be called with the correct options to remove the comments from literate Haskell files.

### comment:2 Changed 8 years ago by nalaurethsulfate

While it might be easier to fix, the Cabal version also handles the same test case incorrectly:

    GHCi, version 6.12.1: http://www.haskell.org/ghc/  :? for help
    Prelude> :m Distribution.Simple.PreProcess.Unlit
    Prelude Distribution.Simple.PreProcess.Unlit> f <- readFile "test.lhs"
    Prelude Distribution.Simple.PreProcess.Unlit> f
    "### Ok so lets try this again.\n\n### A page that loads and compiles:\n\n> myfact 0 = 1 \n> myfact n = n * n-1\n\nLets see if it works!\n"
    Prelude Distribution.Simple.PreProcess.Unlit> unlit "log.txt" f
    Left "### Ok so lets try this again.\n\n### A page that loads and compiles:\n\n myfact 0 = 1 \n myfact n = n * n-1\n\n -- Lets see if it works!\n\n"
    Prelude Distribution.Simple.PreProcess.Unlit>

I don't think that this is terribly surprising though, and it shouldn't be too difficult to fix. If someone could please explain why CPP lines wouldn't be in code blocks (no matter how they are delimited), that would help a lot.

### comment:3 Changed 8 years ago by duncan

So the problem here is that ghc does unlit before cpp and so it has to pass the #cpp directives through. It has to do unlit before cpp because in the worst case the only time ghc finds out cpp is needed is when it encounters a {-# LANGUAGE CPP #-} pragma.

In principle I suppose that ghc could unlit with cpp passthrough only for the pass where it reads the module head to find pragmas, and then, if cpp is not required, re-unlit the file without the cpp passthrough mode.
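
As a rough illustration of the first pass in that two-pass idea, the sketch below scans source text for a `LANGUAGE` pragma that names `CPP`. The function `needs_cpp` is hypothetical and is not how ghc locates pragmas: it looks at every `{-# ... #-}` in the text rather than only the module head, and it matches the keyword case-sensitively.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Return 1 if src contains a LANGUAGE pragma that lists CPP, else 0.
   Simplification: every {-# ... #-} pragma in the text is inspected,
   not just those in the module head. */
int needs_cpp(const char *src)
{
    assert(src != NULL);
    const char *p = src;
    while ((p = strstr(p, "{-#")) != NULL) {
        const char *end = strstr(p + 3, "#-}");
        if (end == NULL)
            break;                      /* unterminated pragma: stop */
        const char *q = p + 3;
        while (q < end && isspace((unsigned char)*q))
            q++;                        /* skip blanks before the keyword */
        if (end - q >= 8 && strncmp(q, "LANGUAGE", 8) == 0) {
            /* look for CPP as a whole word inside the pragma body */
            for (const char *r = q + 8; r + 3 <= end; r++) {
                if (strncmp(r, "CPP", 3) == 0
                    && !isalnum((unsigned char)r[-1])
                    && !isalnum((unsigned char)r[3]))
                    return 1;
            }
        }
        p = end + 3;                    /* continue after this pragma */
    }
    return 0;
}
```

A driver could call this on the passthrough output and, when it returns 0, run unlit again with the `#` passthrough disabled.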

Technically this probably does count as H98 non-compliance. The CPP extension interferes with the use of # in ordinary (non-cpp) lhs files.

### comment:4 Changed 8 years ago by simonmar

Thanks Duncan for pointing out one good reason why we need to do unlit before CPP.

### comment:5 Changed 8 years ago by igloo

Milestone: → 7.2.1

### comment:7 Changed 7 years ago by jmg

I've run into this problem when trying to use org-mode markup in a lhs file. This bug prevents me from using quite a lot of org-mode specific in-file settings. They all start with a '#' in the first column.

### comment:8 Changed 7 years ago by igloo

Milestone: 7.4.1 → 7.6.1

Priority: normal → low

### comment:9 Changed 7 years ago by holzensp

Related Tickets: → #7120

### comment:10 Changed 6 years ago by igloo

Milestone: 7.6.1 → 7.6.2

### comment:11 Changed 6 years ago by elliottt

I've implemented markdown processing in a branch, and just discovered this ticket. Is there a reason that CPP is preserved in the comment part of a literate file? Shouldn't it be that CPP is only preserved when it shows up in either a birdtrack or \begin{code} ... \end{code} block? Preserving it in the comment portion of a literate program seems akin to processing CPP that is in the comments of a non-literate file; I would normally expect commented-out CPP to not be run.
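
A minimal sketch of the rule suggested here, assuming a line-at-a-time filter: a `#` line survives only when it is inside a `\begin{code}` ... `\end{code}` block or on a bird-track line. The function `unlit_line` and its signature are hypothetical, not taken from the branch or from unlit.c, and replacing `>` with a space is an assumption about the desired output.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Write the unlit form of one line (no trailing newline) into out.
   *in_code tracks whether we are between \begin{code} and \end{code}. */
void unlit_line(const char *line, int *in_code, char *out, size_t outsz)
{
    assert(line != NULL && in_code != NULL && out != NULL && outsz > 0);
    out[0] = '\0';                      /* default: emit a blank line */
    if (strncmp(line, "\\begin{code}", 12) == 0) { *in_code = 1; return; }
    if (strncmp(line, "\\end{code}", 10) == 0)   { *in_code = 0; return; }
    if (*in_code) {
        /* inside a code block everything is kept, CPP lines included */
        snprintf(out, outsz, "%s", line);
    } else if (line[0] == '>') {
        /* bird track: replace '>' with a space, keep the rest */
        snprintf(out, outsz, " %s", line + 1);
    }
    /* anything else is literate text, including lines starting with '#' */
}
```

With this rule a markdown heading such as `### A page that loads and compiles:` becomes a blank line, while `#ifdef DEBUG` inside a code block is still handed on to cpp.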

### comment:12 Changed 5 years ago by bjp

Cc: bjp@… added → Unknown

### comment:13 Changed 5 years ago by elliottt

I've updated my branch, and it's building successfully against master. I plan on testing a few corner cases, then submitting a patch. For reference, here's the branch:

### comment:14 Changed 5 years ago by dominic

I have probably misunderstood but I don't think this solves the problem. I have cherry-picked your commits. For the code example above

    ### Ok so lets try this again.

    ### A page that loads and compiles:

    > myfact 0 = 1
    > myfact n = n * n-1

    Lets see if it works!


saved as a .lhs file I get

    ~/ghc $ ./inplace/bin/ghc-stage2 TheLitTest.lhs
    TheLitTest.lhs:1:2: lexical error at character '#'

Saving it as a .md file, mutatis mutandis,

    ### Ok so lets try this again.

    ### A page that loads and compiles:

    module Main ( main ) where

    myfact 0 = 1
    myfact n = n * n-1

    main = undefined

    Lets see if it works!

it compiles successfully, but then other tools, e.g. BlogLiterately, don't work. They rely on the bird tracks. So I think this may solve a problem, but it doesn't solve this problem.

### comment:15 Changed 5 years ago by elliottt

I've had success using this with octopress, with this page as an example:

I used a `haskell`-tagged fence as the starting block, so that GitHub would highlight the Haskell blocks, though an untagged fence also works.

Do you know if BlogLiterately will process the fenced code blocks? I know that pandoc is happy to process them, so I figured that it should just work with anything that depends on that. Additionally, bird tracks in markdown are for quoted blocks, not code, so if that's how BlogLiterately is expecting to find the code, that could be a problem.

Does BlogLiterately process .lhs files? If so, that could also be a problem, as markdown processing requires the .md extension instead of .lhs. The reason for this is that .lhs processing allows CPP macros, but in markdown the # means a section header. The different extension signals different flags to unlit.

### comment:16 Changed 5 years ago by dominic

With your patches for markdown and a .md file

    ### Ok so lets try this again.

    ### A page that loads and compiles:

    module Main where

    myfact 0 = 1
    myfact n = n * n-1

    main = putStrLn (show (myfact 5))

    Lets see if it works!

    [ghci]
    :!which ghc
    myfact 5

this compiles, so hurrah! But BlogLiterately does not a) do syntax highlighting or b) evaluate "myfact 5".

    ~/ghc $ BlogLiteratelyD --ghci TheLitTest.md
    <h3 id="ok-so-lets-try-this-again.">Ok so lets try this again.</h3>
    <h3 id="a-page-that-loads-and-compiles">A page that loads and compiles:</h3>
    <pre><code>module Main where

    myfact 0 = 1
    myfact n = n * n-1

    main = putStrLn (show (myfact 5))</code></pre>
    <p>Lets see if it works!</p>
    <pre><code><span style="color: gray;">ghci&gt; </span>:!which ghc
    /usr/local/bin/ghc

    <span style="color: gray;">ghci&gt; </span>myfact 5</code></pre>
    <div class="references">

    </div>


On the other hand with a .lhs file

    ### Ok so lets try this again.

    ### A page that loads and compiles:

    module Main where

    > myfact 0 = 1
    > myfact n = n * n-1

    > main = putStrLn (show (myfact 5))

    Lets see if it works!

    [ghci]
    :!which ghc
    myfact 5


This does not compile

    ~/ghc $ ./inplace/bin/ghc-stage2 TheLitTest.lhs
    TheLitTest.lhs:1:2: lexical error at character '#'

But BlogLiterately produces code which is syntax highlighted and evaluates "myfact 5".

    ~/ghc $ BlogLiteratelyD --ghci TheLitTest.lhs
    <h3 id="ok-so-lets-try-this-again.">Ok so lets try this again.</h3>
    <h3 id="a-page-that-loads-and-compiles">A page that loads and compiles:</h3>
    <p>module Main where</p>
    <pre class="sourceCode haskell"><code class="sourceCode haskell"><span style="">&gt;</span> <span style="">myfact</span> <span class="hs-num">0</span> <span style="color: red;">=</span> <span class="hs-num">1</span>
    <span style="">&gt;</span> <span style="">myfact</span> <span style="">n</span> <span style="color: red;">=</span> <span style="">n</span> <span style="">*</span> <span style="">n</span><span style="color: green;">-</span><span class="hs-num">1</span>
    </code></pre>
    <pre class="sourceCode haskell"><code class="sourceCode haskell"><span style="">&gt;</span> <span style="">main</span> <span style="color: red;">=</span> <span style="">putStrLn</span> <span style="color: red;">(</span><span style="">show</span> <span style="color: red;">(</span><span style="">myfact</span> <span class="hs-num">5</span><span style="color: red;">)</span><span style="color: red;">)</span>
    </code></pre>
    <p>Lets see if it works!</p>
    <pre><code><span style="color: gray;">ghci&gt; </span>:!which ghc
    /usr/local/bin/ghc

    <span style="color: gray;">ghci&gt; </span>myfact 5
    24
    </code></pre>
    <div class="references">

    </div>


I've discussed this ticket (not BlogLiterately) with Simon Marlow, and we concluded that we should try changing the order of unlit and cpp. There may be literate programs which have e.g. #ifdef in their literate (non-code / not in chevrons) block, and these will now fail, but we agreed that this is the correct behaviour (I hope I am not misquoting Simon here).

If this works then I think supporting .md files becomes more straightforward and BlogLiterately will work (and I will remove the workaround that Brent put in to make it handle # correctly when it calls ghci). Does that make sense?

### comment:17 Changed 5 years ago by elliottt

Moving CPP after unlit solves the problem with the sections defined by #, but it doesn't change the fact that in markdown, bird track blocks are actually quotations, not code [1].

What about this as a compromise (assuming my patch gets accepted): move unlit before CPP, to avoid problems with #, and keep the separate processing for .md/.markdown, allowing the distinction between bird tracks and code blocks in markdown. This way, you can write markdown in a .lhs file and use bird tracks for code blocks, and I can write haskell in a .md file using fenced code blocks, and still be able to write quotation blocks.
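
One way to picture the extension-based split in this compromise is a small mode selector. This is only a sketch under the assumptions stated in the comment (`.md` and `.markdown` select markdown processing, everything else is treated as classic literate Haskell); `unlit_mode`, `mode_for_path`, and `hash_is_cpp` are hypothetical names, not part of the patch.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical processing modes for the literate preprocessor. */
typedef enum { MODE_LHS, MODE_MARKDOWN } unlit_mode;

/* Pick a mode from the file extension: .md and .markdown select
   markdown processing, anything else falls back to .lhs rules. */
unlit_mode mode_for_path(const char *path)
{
    assert(path != NULL);
    const char *dot = strrchr(path, '.');
    if (dot != NULL
        && (strcmp(dot, ".md") == 0 || strcmp(dot, ".markdown") == 0))
        return MODE_MARKDOWN;
    return MODE_LHS;
}

/* In markdown mode a leading '#' is a heading, never a CPP directive. */
int hash_is_cpp(unlit_mode mode)
{
    return mode == MODE_LHS;
}
```

Keeping the decision in one place means the bird-track versus fenced-block distinction can hang off the same mode value.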

### comment:18 Changed 5 years ago by thoughtpolice

Milestone: 7.6.2 → 7.10.1

Moving to 7.10.1.

### comment:19 Changed 4 years ago by thoughtpolice

Milestone: 7.10.1 → 7.12.1

Moving to 7.12.1 milestone; if you feel this is an error and should be addressed sooner, please move it back to the 7.10.1 milestone.

### comment:20 Changed 3 years ago by thoughtpolice

Milestone: 7.12.1 → 8.0.1

Milestone renamed

### comment:21 Changed 3 years ago by thomie

Milestone: 8.0.1