literate markdown not handled correctly by unlit

changed weight to 5

In addition to the Cabal version, perhaps the Perl script mentioned in the obscure unlit.c README reference (http://www.desy.de/user/projects/LitProg/glasgow/programs-and-options.html, lit2stuff) could be called with the correct options to remove the comments from literate Haskell files.

While it might be easier to fix, the Cabal program also handles the same test case incorrectly:

    GHCi, version 6.12.1: http://www.haskell.org/ghc/ :? for help
    Loading package ghc-prim ... linking ... done.
    Loading package integer-gmp ... linking ... done.
    Loading package base ... linking ... done.
    Prelude> :m Distribution.Simple.PreProcess.Unlit
    Prelude Distribution.Simple.PreProcess.Unlit> f <- readFile "test.lhs"
    Prelude Distribution.Simple.PreProcess.Unlit> f
    "### Ok so lets try this again.\n\n### A page that loads and compiles:\n\n> myfact 0 = 1 \n> myfact n = n * n-1\n\nLets see if it works!\n"
    Prelude Distribution.Simple.PreProcess.Unlit> unlit "log.txt" f
    Loading package array-0.3.0.0 ... linking ... done.
    Loading package containers-0.3.0.0 ... linking ... done.
    Loading package filepath-1.1.0.3 ... linking ... done.
    Loading package old-locale-1.0.0.2 ... linking ... done.
    Loading package old-time-1.0.0.3 ... linking ... done.
    Loading package unix-2.4.0.0 ... linking ... done.
    Loading package directory-1.0.1.0 ... linking ... done.
    Loading package pretty-1.0.1.1 ... linking ... done.
    Loading package process-1.0.1.2 ... linking ... done.
    Loading package Cabal-1.8.0.2 ... linking ... done.
    Left "### Ok so lets try this again.\n\n### A page that loads and compiles:\n\n myfact 0 = 1 \n myfact n = n * n-1\n\n -- Lets see if it works!\n\n"
    Prelude Distribution.Simple.PreProcess.Unlit>

I don't think that this is terribly surprising though, and it shouldn't be too difficult to fix. If someone could please explain why CPP lines wouldn't be in code blocks (no matter how they are delimited), that would help a lot.

So the problem here is that ghc does unlit before cpp and so it has to pass the #cpp directives through. It has to do unlit before cpp because in the worst case the only time ghc finds out cpp is needed is when it encounters a {-# LANGUAGE CPP #-} pragma.

In principle I suppose that ghc could unlit with cpp passthrough only for the pass where it reads the module head to find pragmas, and then, if cpp is not required, re-unlit the file without the cpp passthrough mode.

Technically this probably does count as H98 non-compliance. The CPP extension interferes with the use of # in ordinary (non-cpp) lhs files.
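
The passthrough behaviour described above can be illustrated with a toy bird-track unlit. This is only a sketch and is not GHC's unlit.c or Cabal's Unlit module: the names `CppMode`, `PassThrough`, `DropInProse` and `unlitBird` are invented here, and only the bird-track style is handled. It shows why a markdown heading in column one survives into the output when `#` lines are passed through for cpp.

```haskell
module Main where

-- | Hypothetical switch: are '#' lines outside code handed on to cpp?
data CppMode = PassThrough | DropInProse
  deriving (Eq, Show)

-- | Toy unlit for bird tracks only (an assumption-laden sketch).
-- Code lines lose their '>' marker, prose becomes a blank line so that
-- line numbers stay aligned, and '#' lines are kept only in PassThrough mode.
unlitBird :: CppMode -> String -> String
unlitBird mode = unlines . map step . lines
  where
    step ('>' : rest) = ' ' : rest               -- code: blank out the bird track
    step l@('#' : _) | mode == PassThrough = l   -- might be a cpp directive, keep it
    step _ = ""                                  -- prose (or a dropped '#' line)

sample :: String
sample = unlines
  [ "### A page that loads and compiles:"
  , ""
  , "> myfact 0 = 1"
  , "> myfact n = n * n-1"
  , ""
  , "Lets see if it works!"
  ]

main :: IO ()
main = do
  putStrLn "-- with cpp passthrough (heading leaks into the Haskell source):"
  putStr (unlitBird PassThrough sample)
  putStrLn "-- without passthrough (heading is treated as prose):"
  putStr (unlitBird DropInProse sample)
```

In `PassThrough` mode the heading line reaches the lexer unchanged, which is consistent with the "lexical error at character '#'" reported for .lhs files in this ticket.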

See also #4073 (closed) and #3719 (closed)

Thanks Duncan for pointing out one good reason why we need to do unlit before CPP.

changed milestone to %7.2.1

I've run into this problem when trying to use org-mode markup in a lhs file. This bug prevents me from using quite a lot of org-mode specific in-file settings. They all start with a '#' in the first column.

changed milestone to %7.6.1

changed weight to 3

Trac metadata

Trac field	Value
Priority	normal → low

mentioned in issue #7120 (closed)

Trac metadata

Trac field	Value
Related	- → #7120 (closed)

changed milestone to %7.6.2

I've implemented markdown processing in a branch, and just discovered this ticket. Is there a reason that CPP is preserved in the comment part of a literate file? Shouldn't it be that CPP is only preserved when it shows up in either a birdtrack or \begin{code} ... \end{code} block? Preserving it in the comment portion of a literate program seems akin to processing CPP that is in the comments of a non-literate file; I would normally expect commented-out CPP to not be run.
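
A minimal sketch of the rule proposed in this comment, where `#` lines are kept only when they sit inside a bird-track line or a \begin{code} ... \end{code} block. The names `Region` and `codeOnly` are hypothetical and this is not the code from the branch; it only illustrates the idea of tracking whether the current line is code before deciding what to keep.

```haskell
module Main where

import Data.List (isPrefixOf)

-- | Where the scanner currently is (names are illustrative only).
data Region = Prose | InCodeEnv
  deriving (Eq, Show)

-- | Keep code lines, replace everything else with a blank line.
-- A '#' line therefore survives only inside a \begin{code} block;
-- in prose it is dropped like any other comment text.
codeOnly :: String -> [String]
codeOnly = go Prose . lines
  where
    go _ [] = []
    go Prose (l : ls)
      | "\\begin{code}" `isPrefixOf` l = "" : go InCodeEnv ls
      | '>' : rest <- l                = (' ' : rest) : go Prose ls
      | otherwise                      = "" : go Prose ls
    go InCodeEnv (l : ls)
      | "\\end{code}" `isPrefixOf` l   = "" : go Prose ls
      | otherwise                      = l : go InCodeEnv ls

main :: IO ()
main = mapM_ putStrLn (codeOnly sample)
  where
    sample = unlines
      [ "# A heading in the comment part"
      , "\\begin{code}"
      , "#ifdef FOO"
      , "x = 1"
      , "#endif"
      , "\\end{code}"
      , "> y = 2"
      , "#define BAR"
      ]
```

With this rule the `#ifdef` inside the code environment is preserved for cpp, while the heading and the stray `#define` in the prose are blanked out.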

I've updated my branch, and it's building successfully against master. I plan on testing a few corner cases, then submitting a patch. For reference, here's the branch:

https://github.com/elliottt/ghc/tree/literate-markdown

I have probably misunderstood but I don't think this solves the problem. I have cherry-picked your commits. For the code example above

### Ok so lets try this again.

### A page that loads and compiles:

> myfact 0 = 1  
> myfact n = n * n-1

Lets see if it works!

saved as a .lhs file I get

~/ghc $ ./inplace/bin/ghc-stage2 TheLitTest.lhs

TheLitTest.lhs:1:2: lexical error at character '#'

Saving it as a .md file mutatis mutandis

### Ok so lets try this again.

### A page that loads and compiles:

module Main ( main ) where

myfact 0 = 1
myfact n = n * n-1

main = undefined


Lets see if it works!

it compiles successfully but then other tools e.g. BlogLiterately don't work. They rely on the bird tracks. So I think this may solve a problem but it doesn't solve this problem.

I've had success using this with octopress, with this page as an example:

https://github.com/elliottt/elliottt.github.com/blob/source/source/_posts/2013-02-19-serenade-in-haskell.markdown

I used "```haskell" as the opening fence, so that github would highlight the haskell blocks, though a bare "```" also works.

Do you know if BlogLiterately will process the fenced code blocks? I know that pandoc is happy to process them, so I figured that it should just work with anything that depends on that. Additionally, bird tracks in markdown are for quoted blocks, not code, so if that's how BlogLiterately is expecting to find the code, that could be a problem.

Does BlogLiterately process .lhs files? If so, that could also be a problem, as markdown processing requires the .md extension, instead of .lhs. The reason for this is that .lhs processing allows CPP macros, but in markdown the # means a section header. The different extension signals different flags to unlit.

With your patches for markdown and a .md file

### Ok so lets try this again.

### A page that loads and compiles:

module Main where

myfact 0 = 1
myfact n = n * n-1

main = putStrLn (show (myfact 5))


Lets see if it works!

    [ghci]
    :!which ghc
    myfact 5

this compiles so hurrah!

But BlogLiterately does not a) do syntax highlighting or b) evaluate "myfact 5"

~/ghc $ BlogLiteratelyD --ghci TheLitTest.md
<h3 id="ok-so-lets-try-this-again.">Ok so lets try this again.</h3>
<h3 id="a-page-that-loads-and-compiles">A page that loads and compiles:</h3>
<pre><code>module Main where

myfact 0 = 1
myfact n = n * n-1

main = putStrLn (show (myfact 5))</code></pre>
<p>Lets see if it works!</p>
<pre><code><span style="color: gray;">ghci&gt; </span>:!which ghc
  /usr/local/bin/ghc

<span style="color: gray;">ghci&gt; </span>myfact 5</code></pre>
<div class="references">

</div>

On the other hand with a .lhs file

### Ok so lets try this again.

### A page that loads and compiles:

module Main where

> myfact 0 = 1
> myfact n = n * n-1

> main = putStrLn (show (myfact 5))

Lets see if it works!

    [ghci]
    :!which ghc
    myfact 5

This does not compile

~/ghc $ ./inplace/bin/ghc-stage2 TheLitTest.lhs

TheLitTest.lhs:1:2: lexical error at character '#'

But BlogLiterately produces code which is syntax highlighted and evaluates "myfact 5".

~/ghc $ BlogLiteratelyD --ghci TheLitTest.lhs
<h3 id="ok-so-lets-try-this-again.">Ok so lets try this again.</h3>
<h3 id="a-page-that-loads-and-compiles">A page that loads and compiles:</h3>
<p>module Main where</p>
<pre class="sourceCode haskell"><code class="sourceCode haskell"><span style="">&gt;</span> <span style="">myfact</span> <span class="hs-num">0</span> <span style="color: red;">=</span> <span class="hs-num">1</span>
<span style="">&gt;</span> <span style="">myfact</span> <span style="">n</span> <span style="color: red;">=</span> <span style="">n</span> <span style="">*</span> <span style="">n</span><span style="color: green;">-</span><span class="hs-num">1</span>
</code></pre>
<pre class="sourceCode haskell"><code class="sourceCode haskell"><span style="">&gt;</span> <span style="">main</span> <span style="color: red;">=</span> <span style="">putStrLn</span> <span style="color: red;">(</span><span style="">show</span> <span style="color: red;">(</span><span style="">myfact</span> <span class="hs-num">5</span><span style="color: red;">)</span><span style="color: red;">)</span>
</code></pre>
<p>Lets see if it works!</p>
<pre><code><span style="color: gray;">ghci&gt; </span>:!which ghc
  /usr/local/bin/ghc

<span style="color: gray;">ghci&gt; </span>myfact 5
  24
</code></pre>
<div class="references">

</div>

I've discussed this ticket (not BlogLiterately) with Simon Marlow and we concluded that we should try changing the order of unlit and cpp. There may be literate programs which have e.g. #ifdef in their literate (non-code / not in chevrons) block and these will now fail but we concluded that this is the correct behaviour (I hope I am not misquoting Simon here).

If this works then I think supporting .md files becomes more straightforward and BlogLiterately will work (and I will remove the workaround that Brent put in to make it handle # correctly when it calls ghci). Does that make sense?

Moving CPP after unlit solves the problem with the sections defined by #, but it doesn't change the fact that in markdown, bird track blocks are actually quotations, not code [1].

What about this as a compromise (assuming my patch gets accepted): move unlit before CPP, to avoid problems with #, and keep the separate processing for .md/.markdown, allowing the distinction between bird tracks and code blocks in markdown. This way, you can write markdown in a .lhs file and use bird tracks for code blocks, and I can write haskell in a .md file using fenced code blocks, and still be able to write quotation blocks.

[1] http://daringfireball.net/projects/markdown/syntax#blockquote
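
The compromise above could be sketched roughly as follows: pick the literate style from the file extension, so that bird tracks are code in .lhs files while in .md/.markdown files only fenced blocks are code and `>` lines remain quotations. All names here (`Style`, `styleFor`, `unlitBird`, `unlitFenced`, `unlitFile`) are made up for illustration; the actual patch passes flags to unlit and may work quite differently.

```haskell
module Main where

import Data.List (isPrefixOf)
import System.FilePath (takeExtension)

-- | Which lines count as code (illustrative names, not GHC's).
data Style = BirdTracks | FencedBlocks
  deriving (Eq, Show)

-- | Choose the style from the extension, as the comment suggests.
styleFor :: FilePath -> Maybe Style
styleFor path = case takeExtension path of
  ".lhs"      -> Just BirdTracks
  ".md"       -> Just FencedBlocks
  ".markdown" -> Just FencedBlocks
  _           -> Nothing

-- | A markdown code fence: three backticks.
fence :: String
fence = replicate 3 '`'

-- | .lhs: '>' lines are code, everything else is blanked.
unlitBird :: String -> String
unlitBird = unlines . map step . lines
  where
    step ('>' : rest) = ' ' : rest
    step _            = ""

-- | .md: only lines between fences are code; '>' lines are quotations.
unlitFenced :: String -> String
unlitFenced = unlines . go False . lines
  where
    go _ [] = []
    go inCode (l : ls)
      | fence `isPrefixOf` l = "" : go (not inCode) ls  -- fence line itself is dropped
      | inCode               = l : go True ls
      | otherwise            = "" : go False ls

-- | Dispatch on the file name; Nothing for unknown extensions.
unlitFile :: FilePath -> String -> Maybe String
unlitFile path src = fmap run (styleFor path)
  where
    run BirdTracks   = unlitBird src
    run FencedBlocks = unlitFenced src

main :: IO ()
main = do
  let md = unlines ["### Title", fence ++ "haskell", "main = undefined", fence, "> a quotation"]
  mapM_ putStr (unlitFile "TheLitTest.md" md)
```

Here the `###` heading and the `>` quotation in the markdown file are both treated as prose, so neither reaches the Haskell lexer.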

changed milestone to %7.10.1

Moving to 7.10.1.

removed milestone

Moving to 7.12.1 milestone; if you feel this is an error and should be addressed sooner, please move it back to the 7.10.1 milestone.

changed milestone to %8.0.1

Milestone renamed

added Plow label

Trac field	Value
Version	7.0.1
Type	Bug
TypeOfFailure	OtherFailure
Priority	normal
Resolution	Unresolved
Component	Compiler
Test case
Differential revisions
BlockedBy
Related
Blocking
CC	dagitj@gmail.com
Operating system
Architecture
