This simple program in literate Haskell, using markdown in the comments, gives unlit problems:

    ### Ok so lets try this again.

    ### A page that loads and compiles:

    > myfact 0 = 1
    > myfact n = n * n-1

    Lets see if it works!
If I run unlit and collect the output I can see where it went wrong:
    $ ~/lib/ghc-7.0.1/unlit Main.lhs Main.lpp
    $ cat Main.lpp
    ### Ok so lets try this again.
    ### A page that loads and compiles:
     myfact 0 = 1
     myfact n = n * n-1
When I look through the source code of unlit.c I think the place to check for this would be here:
    if ( c == '#' ) {
        if ( ignore_shebang ) {
            c1 = egetc(istream);
            if ( c1 == '!' ) {
                while (c=egetc(istream), !isLineTerm(c)) ;
                return SHEBANG;
            }
            myputc(c, ostream);
            c=c1;
        }
        if ( leavecpp ) {
            myputc(c, ostream);
            while (c=egetc(istream), !isLineTerm(c))
                myputc(c,ostream);
            myputc('\n',ostream);
            return HASH;
        }
    }
I haven't tested it, but I think the Cabal version would handle this case correctly (or be easier to fix than a C program from 1990). Would it be possible/wise/feasible to extract the Cabal version and make it a permanent replacement for the current unlit.c code?
    Left "### Ok so lets try this again.\n\n### A page that loads and compiles:\n\n myfact 0 = 1 \n myfact n = n * n-1\n\n -- Lets see if it works!\n\n"
I don't think that this is terribly surprising though, and it shouldn't be too difficult to fix. If someone could please explain why CPP lines wouldn't be in code blocks (no matter how they are delimited), that would help a lot.
So the problem here is that ghc does unlit before cpp and so it has to pass the #cpp directives through. It has to do unlit before cpp because in the worst case the only time ghc finds out cpp is needed is when it encounters a {-# LANGUAGE CPP #-} pragma.
In principle I suppose that ghc could unlit with cpp passthrough only for the pass where it reads the module head to find pragmas, and then, if cpp is not required, re-unlit the file without the cpp passthrough mode.
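A rough sketch of that two-pass decision, in C to match unlit.c. Everything here is hypothetical: `head_requests_cpp` and `final_pass_leavecpp` are illustrative names, and the substring scan is a crude stand-in for GHC's real pragma parsing (it is case-sensitive and does not check word boundaries).

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: does the module head, as produced by a first
   unlit pass with cpp passthrough, name CPP in a LANGUAGE pragma?
   Crude substring scan, not GHC's actual pragma parser. */
bool head_requests_cpp(const char *head)
{
    const char *p = strstr(head, "{-# LANGUAGE");
    while (p != NULL) {
        const char *end = strstr(p, "#-}");
        if (end == NULL)
            break;                      /* unterminated pragma: give up */
        /* look for "CPP" between the pragma opener and its "#-}" */
        for (const char *q = p; q + 3 <= end; q++) {
            if (strncmp(q, "CPP", 3) == 0)
                return true;
        }
        p = strstr(end, "{-# LANGUAGE");
    }
    return false;
}

/* Should the second (final) unlit pass keep '#' lines?
   cpp_on_command_line stands for cpp having been requested by a flag. */
bool final_pass_leavecpp(bool cpp_on_command_line, const char *head)
{
    return cpp_on_command_line || head_requests_cpp(head);
}
```

With this, a file whose head has no CPP pragma would be re-unlit with passthrough off, so a markdown `###` heading would never reach the lexer.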
Technically this probably does count as H98 non-compliance. The CPP extension interferes with the use of # in ordinary (non-cpp) lhs files.
I've run into this problem when trying to use org-mode markup in an .lhs file. This bug prevents me from using quite a lot of org-mode-specific in-file settings, since they all start with a '#' in the first column.
I've implemented markdown processing in a branch, and just discovered this ticket. Is there a reason that CPP is preserved in the comment part of a literate file? Shouldn't it be that CPP is only preserved when it shows up in either a birdtrack or \begin{code} ... \end{code} block? Preserving it in the comment portion of a literate program seems akin to processing CPP that is in the comments of a non-literate file; I would normally expect commented-out CPP to not be run.
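To make that suggestion concrete, here is a minimal sketch in C of an unlit pass that keeps `#` lines only inside code. It is not the code from unlit.c or from the branch: `unlit_sketch` and its buffer-based interface are hypothetical, and it skips the error checking, shebang handling and line pragmas that the real program has.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Sketch only: line-based unlit that preserves '#' lines solely in code.
   Comment lines are replaced by blank lines so line numbers still match. */
void unlit_sketch(const char *in, char *out, size_t outsz)
{
    bool in_code_env = false;   /* between \begin{code} and \end{code} */
    size_t used = 0;

    if (outsz == 0)
        return;
    out[0] = '\0';

    while (*in != '\0') {
        size_t len = strcspn(in, "\n");   /* length of the current line */
        bool keep = false;                /* copy this line to the output? */
        bool bird = false;                /* line starts with a bird track */

        if (strncmp(in, "\\begin{code}", 12) == 0) {
            in_code_env = true;
        } else if (strncmp(in, "\\end{code}", 10) == 0) {
            in_code_env = false;
        } else if (in_code_env) {
            keep = true;                  /* code, including any #directive */
        } else if (len > 0 && in[0] == '>') {
            keep = true;
            bird = true;
        }
        /* A '#' line in the comment part reaches here with keep == false. */

        if (used + len + 2 > outsz)
            break;                        /* out of room: stop (sketch) */
        if (keep) {
            memcpy(out + used, in, len);
            if (bird)
                out[used] = ' ';          /* blank out the '>' */
            used += len;
        }
        out[used++] = '\n';
        out[used] = '\0';

        in += len;
        if (*in == '\n')
            in++;
    }
}
```

Under this scheme a markdown heading such as `### A page that loads and compiles:` becomes a blank line, while an `#ifdef` inside a `\begin{code}` block still reaches cpp.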
I've updated my branch, and it's building successfully against master. I plan on testing a few corner cases, then submitting a patch. For reference, here's the branch:
I have probably misunderstood but I don't think this solves the problem. I have cherry-picked your commits. For the code example above
    ### Ok so lets try this again.

    ### A page that loads and compiles:

    > myfact 0 = 1
    > myfact n = n * n-1

    Lets see if it works!
saved as a .lhs file I get
    ~/ghc $ ./inplace/bin/ghc-stage2 TheLitTest.lhs
    TheLitTest.lhs:1:2: lexical error at character '#'
Saving it as a .md file mutatis mutandis
    ### Ok so lets try this again.

    ### A page that loads and compiles:

    module Main ( main ) where

    myfact 0 = 1
    myfact n = n * n-1

    main = undefined

    Lets see if it works!
it compiles successfully, but then other tools, e.g. BlogLiterately, don't work. They rely on the bird tracks. So I think this may solve a problem, but it doesn't solve this problem.
I used ```haskell as the opening fence, so that GitHub would highlight the Haskell blocks, though a plain ``` also works.
Do you know if BlogLiterately will process the fenced code blocks? I know that pandoc is happy to process them, so I figured that it should just work with anything that depends on that. Additionally, bird tracks in markdown are for quoted blocks, not code, so if that's how BlogLiterately is expecting to find the code, that could be a problem.
Does BlogLiterately process .lhs files? If so, that could also be a problem, as markdown processing requires the .md extension instead of .lhs. The reason for this is that .lhs processing allows CPP macros, but in markdown the # means a section header. The different extension signals different flags to unlit.
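As an illustration of how the extension could select the flags, here is a small sketch in C. The `unlit_mode` struct, its field names and `mode_for_file` are made up for this example and are not the interface of unlit.c or of the branch. Treating `.markdown` like `.md` is an assumption here.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical option set; the field names are illustrative only. */
struct unlit_mode {
    bool leavecpp;      /* pass '#' lines through for a later cpp run */
    bool fenced_code;   /* fenced blocks are code (markdown) */
    bool bird_is_code;  /* '>' lines are code (classic .lhs) */
};

static bool has_suffix(const char *s, const char *suffix)
{
    size_t n = strlen(s), m = strlen(suffix);
    return n >= m && strcmp(s + n - m, suffix) == 0;
}

/* Choose unlit behaviour from the file name alone. */
struct unlit_mode mode_for_file(const char *path)
{
    struct unlit_mode m = { true, false, true };    /* .lhs behaviour */

    /* Assumption: .markdown is handled the same way as .md. */
    if (has_suffix(path, ".md") || has_suffix(path, ".markdown")) {
        m.leavecpp = false;      /* '#' starts a heading, not a directive */
        m.fenced_code = true;
        m.bird_is_code = false;  /* '>' is a block quote in markdown */
    }
    return m;
}
```

For example, `mode_for_file("TheLitTest.md")` turns off cpp passthrough and bird-track code, while `mode_for_file("TheLitTest.lhs")` keeps the existing behaviour.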
    ### Ok so lets try this again.

    ### A page that loads and compiles:

    module Main where

    myfact 0 = 1
    myfact n = n * n-1

    main = putStrLn (show (myfact 5))

    Lets see if it works!

        [ghci]
        :!which ghc
        myfact 5
this compiles so hurrah!
But BlogLiterately does not (a) do syntax highlighting or (b) evaluate "myfact 5":

    ~/ghc $ BlogLiteratelyD --ghci TheLitTest.md
    <h3 id="ok-so-lets-try-this-again.">Ok so lets try this again.</h3>
    <h3 id="a-page-that-loads-and-compiles">A page that loads and compiles:</h3>
    <pre><code>module Main where
    myfact 0 = 1
    myfact n = n * n-1
    main = putStrLn (show (myfact 5))</code></pre>
    <p>Lets see if it works!</p>
    <pre><code><span style="color: gray;">ghci> </span>:!which ghc
    /usr/local/bin/ghc
    <span style="color: gray;">ghci> </span>myfact 5</code></pre>
    <div class="references"></div>
On the other hand, with a .lhs file

    ### Ok so lets try this again.

    ### A page that loads and compiles:

    module Main where

    > myfact 0 = 1
    > myfact n = n * n-1
    > main = putStrLn (show (myfact 5))

    Lets see if it works!

        [ghci]
        :!which ghc
        myfact 5
This does not compile
    ~/ghc $ ./inplace/bin/ghc-stage2 TheLitTest.lhs
    TheLitTest.lhs:1:2: lexical error at character '#'
But BlogLiterately produces code which is syntax highlighted and evaluates "myfact 5".
I've discussed this ticket (not BlogLiterately) with Simon Marlow and we concluded that we should try changing the order of unlit and cpp. There may be literate programs which have e.g. #ifdef in their literate (non-code / not in chevrons) block and these will now fail but we concluded that this is the correct behaviour (I hope I am not misquoting Simon here).
If this works then I think supporting .md files becomes more straightforward and BlogLiterately will work (and I will remove the workaround that Brent put in to make it handle # correctly when it calls ghci). Does that make sense?
Moving CPP after unlit solves the problem with the sections defined by #, but it doesn't change the fact that in markdown, bird track blocks are actually quotations, not code [1].
What about this as a compromise (assuming my patch gets accepted): move unlit before CPP, to avoid problems with #, and keep the separate processing for .md/.markdown, allowing the distinction between bird tracks and code blocks in markdown. This way, you can write markdown in a .lhs file and use bird tracks for code blocks, and I can write Haskell in a .md file using fenced code blocks and still be able to write quotation blocks.